Well-Managed, Visible Caching - The Ultimate Way To Improve Performance
Often, a main governing factor in web application performance is where the data a user needs to produce a view in their browser resides, and, of course, how large that data is. This article explains why caching is so critical to maintaining acceptable service levels without restricting business and academic progress. There is no magic or mystery in any of this; in fact, I often think of plumbing as an example.
A slow shower at home is not improved by a bigger water tank or a bigger showerhead; if the valves and pipes between the tank and the showerhead are a bottleneck because they are corroded or too small, the result is still a weak stream. Back in January of this year I introduced the concept of data distance. My contention in that article was that data of all kinds needs to be as near to the user as possible, all the time. Typically, data sitting on a hard drive in a database server is as far from the user as it gets, and data on the user's own system is as close as it gets.

There is another scenario that can make things even worse: many web sites communicate with affiliates in one way or another, via web services, REST and so on. In that case, user experience can be affected not only by data distance on the site they are visiting but also by data residing on other sites that they did not choose to visit. Sometimes this is compounded because the content the user did not request is some sort of spammish ad content; that is a different subject, though. In this article we are concerned with data distance and how it impacts performance.
Let us consider the effects and risks of data at its maximum distance from the requesting user: residing on a hard drive on, or attached to, a database server in a NAS or a SAN. In my 16 years of working with data and 10 years of helping clients construct efficient web sites, or troubleshoot why their web sites are slow, I have seen one thing hurt performance badly over and over again: the more "hops" across networks that data has to travel, the less efficient the transmission of that data is. There are two main reasons for this. First, at each communication point between devices or servers there is an Ethernet or fibre-optic port, and in most cases I have seen, all data, coming and going, passes through the same port (a NIC, for instance). Second, networks must be divided into segments, connected by hardware and hardware/software devices (routers and switches), and those connecting devices can and often do cause problems. Although it is easy to forget in these mythical days of virtualization and its more attractive cousin, cloud computing, we are all on one gigantic network; that is exactly what the Internet is, an immense Wide Area Network.
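The cumulative cost of those hops can be sketched with some rough arithmetic. The latency figures below are assumptions chosen purely to illustrate the point (not measurements from any real network), but the shape of the result holds: every hop and every disk read compounds, so data served from nearby memory wins by a wide margin.

```python
# Illustrative only: the latency figures are assumed values, not measurements.
# Each network hop (port, switch, router) adds latency, and reading from a
# hard drive on the database tier adds far more.

HOP_LATENCY_MS = 0.5   # assumed cost per network hop, each way
DISK_SEEK_MS = 8.0     # assumed cost of one read from a database hard drive

def request_cost_ms(hops: int, disk_reads: int) -> float:
    """Rough cost of one request: hops there and back, plus disk seeks."""
    return 2 * hops * HOP_LATENCY_MS + disk_reads * DISK_SEEK_MS

# A request crossing four network segments to a database doing three reads:
far = request_cost_ms(hops=4, disk_reads=3)    # 28.0 ms
# The same data served from an in-memory cache one hop away:
near = request_cost_ms(hops=1, disk_reads=0)   # 1.0 ms
print(far, near)
```

Under these assumed numbers the "far" request costs 28 times the "near" one, which is the data-distance argument in miniature.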
After this perhaps long-winded exposition, one that I felt was necessary, we return to the main thrust of this post. There are circumstances I encounter where, even after a good deal of garbage collection tuning, we hit a ceiling and performance is still not optimal; this is often the case in ColdFusion applications that make heavy use of CFCs, and there have been several good blog posts around that subject. Beyond these issues, the really large web sites such as Facebook and Twitter, with upwards of 100 million unique visitors a day, hit a plateau where no number of physical servers is enough; Twitter alone has over 1,000 physical servers. They partly solved their immense traffic problems by using large distributed caching mechanisms, based mainly on memcached. This is certainly a powerful core mechanism, but it requires a large amount of in-house API and reporting code to benefit from fully.

So after the O'Reilly Velocity conference in San Jose, which was an eye-opener for Dan Wilson and me in terms of improving and maintaining performance on very large web sites, we started investigating ways of bringing really sophisticated caching to ColdFusion applications in particular. The main thing we were looking for was some way to reduce the load on the web and ColdFusion servers, which is exactly what memcached does. We found that one of the attendee companies at Velocity has a product called "aiCache", commenced a dialogue with them, and have now become their subcontracted engineering team for CFML/ColdFusion. This was after several weeks of deep delving into their offering, and the last thing I want to do is make this blog piece sound like a sales pitch. The aiCache product runs on many 64-bit Linux distributions, and I believe our years of experience with ColdFusion applications will ensure that those needing help to use aiCache with ColdFusion will have a team available that knows ColdFusion well.
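The core pattern behind memcached-style caching is cache-aside: check the cache first, and only fall through to the expensive origin (the application or database tier) on a miss. A minimal sketch of that pattern, using a plain Python dict with TTL expiry to stand in for a real memcached client (all names here are illustrative, not any product's API):

```python
import time

class SimpleCache:
    """A dict-based stand-in for a memcached-style cache, with TTL expiry."""
    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() >= expires_at:
            del self._store[key]  # lazily evict expired entries
            return None
        return value

    def set(self, key, value, ttl_seconds=60):
        self._store[key] = (value, time.time() + ttl_seconds)

cache = SimpleCache()

def fetch_page(key, render):
    """Cache-aside: serve from cache on a hit; render and store on a miss."""
    page = cache.get(key)
    if page is None:
        page = render()  # the expensive trip to the app/database tier
        cache.set(key, page, ttl_seconds=30)
    return page

# First call renders; the second is served from cache without calling render().
calls = []
def render():
    calls.append(1)
    return "<html>hello</html>"

print(fetch_page("/home", render), len(calls))  # renders once
print(fetch_page("/home", render), len(calls))  # cache hit, still once
```

Every cache hit is one request that never reaches the application servers at all, which is exactly the load reduction described above; the hard part, as the rest of this post argues, is managing and reporting on what is in the cache.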
This is the first of two blog posts on this subject; the next will go specifically through external caching methodologies as they pertain to ColdFusion and CFML in particular. Rob Brooks-Bilson and Ray Camden have written some good pieces on what I will call internal caching in ColdFusion, which of course is also relevant; however, if we can reduce the number of requests that pass through to ColdFusion at all, we will improve overall performance quite dramatically. Management of, and reporting on, any cache is the key to using it effectively; more to follow on this subject.