Aug 12, 2008
Server Monitoring - Take Care Of Server Intrusion
Posted by Mike Brunt at 8:29 AM
14 comments - Categories: ColdFusion | JRun-J2EE
I have had several clients recently with issues that appear to be related to inopportune use of server monitoring. We were all delighted when Adobe announced their server monitoring capabilities for ColdFusion 8, yet it became apparent that leaving "memory tracking" on in production could cause significant issues. I use this as an example of inopportune use of server monitoring. Recently I had a client whose servers would suddenly go off line once they reached 10 threads in use, even though there appeared to be no great state of distress overall; free memory was fine. This was not related to the server monitor in ColdFusion 8 but to a crash protection capability in another product.
The main point of my post here is that, in my opinion and experience with many, many clients, server monitors should do nothing more than just that, monitor the server. There is no magic bullet in any product that I have ever seen that fixes underlying issues. In fact, the more data a product offers, the more obfuscated the actual cause of problems often becomes. If we are allowing something to shut-down our servers that is not a solution but a band-aid and I personally would not be trusting something to shut-down my production instances.
The perfect product for me would give me the ability to watch request times, threading and memory use, along with the SQL passed through the JDBC driver, in real time. It would also let me set warning thresholds that notify me of high thread or low memory situations, and it could save this information, along with slow request information and the query string details, to a log and a database. Lastly, it would give me a plain, understandable translation of the meaning of critical stack traces.
That is it for me; that is enough to see and analyze any problem I have ever encountered for all the clients I have ever helped.
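As a rough illustration of the warning-threshold idea, here is a minimal Java sketch of a check that only reports and never stops or restarts anything. The class name `ThresholdMonitor`, the `evaluate` method, and the threshold values (10 threads, 64 MB of free heap) are all hypothetical choices for this example and do not come from any of the monitoring products. It reads JVM-wide figures through the standard `java.lang.management` API, whereas a real tool would more likely watch the application server's request threads.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical monitor-only checker: it produces warnings and takes no action.
class ThresholdMonitor {

    // Pure comparison against the thresholds, so it can be tried without a live server.
    static List<String> evaluate(int liveThreads, long freeBytes,
                                 int maxThreads, long minFreeBytes) {
        List<String> warnings = new ArrayList<>();
        if (liveThreads >= maxThreads) {
            warnings.add("HIGH_THREADS: " + liveThreads
                    + " live, threshold " + maxThreads);
        }
        if (freeBytes <= minFreeBytes) {
            warnings.add("LOW_MEMORY: " + freeBytes
                    + " bytes free, threshold " + minFreeBytes);
        }
        return warnings;
    }

    public static void main(String[] args) {
        // JVM-wide live thread count (assumption: a stand-in for request threads).
        int liveThreads = ManagementFactory.getThreadMXBean().getThreadCount();

        // Heap headroom: maximum heap minus what is currently in use.
        Runtime rt = Runtime.getRuntime();
        long freeBytes = rt.maxMemory() - (rt.totalMemory() - rt.freeMemory());

        // Assumed thresholds: 10 threads, 64 MB free. Warnings are logged only.
        for (String warning : evaluate(liveThreads, freeBytes, 10, 64L * 1024 * 1024)) {
            System.err.println(warning);
        }
    }
}
```

The point of keeping `evaluate` free of side effects is that the only thing it can ever produce is a list of messages; what to do about them stays with a person.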
charlie arehart wrote on 08/12/08 5:13 PM
Mike, rather than leave people to guess, can you detail what you mean about the problem of "crash protection capability in another product" causing a server to "suddenly go off line once they reached 10 threads in use"? Since you're not talking about CF8's alerts (a feature they have akin to crash protection), and I don't think SeeFusion offers something akin to it, I suspect you're referring to FusionReactor.
But I can't think what CP feature could cause the server to go offline once 10 threads are used. It could send a notice, or it could abort threads, or it could queue future threads while the problem remained. I suppose if someone did the latter and didn't expect it, the queuing of requests could seem like "going offline". But hey, powerful tools come with responsibility. Let's not blame the tools for being misused.
Now, the "memory tracking" of CF8's monitor might be in a different class: it really can kill a server under even moderate load--but then judicious use of oft-missed filtering features could temper that.
But you conclude "If we are allowing something to shut-down our servers that is not a solution but a band-aid and I personally would not be trusting something to shut-down my production instances." Again, what are you referring to that's "shutting down" the server?
I wonder if instead you're referring to FusionReactor's Enterprise Scripting feature, which allows a monitoring server to control a monitored server if it detects it becoming unresponsive. [Some may say, "but, Charlie, you sound defensive, jumping to the conclusion that it's FR". I'm just using my familiarity with the features of the 3 tools to sort through what wasn't said. :-) ]
In that case, yes, it can shut down a server that it deems has become unresponsive (more at http://www.fusion-reactor.com/fr/featurefocus/monitoredServerScripts.cfm). Again, though, it's just a tool. Many have long used home-grown or third-party features like this (ping tools, which will restart the CF service if it fails to respond). As with the CP abort/queue features, all such tools do need to be used carefully.
Let me be clear: I, too, tend to stay away from the features that either abort or queue requests, or restart the server, at least when first using a tool. I'd much rather use them the way you describe, to gather diagnostic info. And indeed, the main points of your perfect tool are what all three of the major tools do.
I just fear that it's possible someone reading your note could conclude that the "other unnamed" monitor tool is something to fear and avoid. That would be an unfortunate conclusion. All 3 of the major CF monitoring tools have their strengths and weaknesses, and all can be configured to have very low overhead and impact, just serving to provide info to help make better diagnosis and analysis.
But you go on to say that such tools "should do nothing more than just monitor the server". Well, that's certainly a philosophical difference among the tools. Even CF8's monitor can do more than just monitor, in its alerts features. People should be careful with them, yes, but it seems a bit much to assert that the tools shouldn't offer the feature at all.
Still I agree that it's good and fair to warn people (especially newcomers to a tool) to be careful of the more powerful features, and for that I'm glad to see you offer your thoughts here. Always thought-provoking! :-)