8 October 2007
Part 1 of this four-part series introduced the Adobe ColdFusion 8 Server Monitor, with a focus on valuable features for CFML code development. In this second part, I'll focus on using the tool in production. The Server Monitor brings unique benefits and solutions to challenges related to performance and configuration. It's worth repeating: Whether you're a CFML developer or a ColdFusion Server administrator, you can find tremendous value in the new ColdFusion 8 Server Monitor. I hope to show you in this series that the Server Monitor (and its related features) is much more than just something to leave turned on in a network operations center to view availability charts and graphs. It provides unprecedented insight into the underlying operations of the ColdFusion server.
In part 1, I introduced you to the Monitor, showing such basics as how to start the Monitor and how to use the Monitor interface. I also discussed the three Start buttons (monitoring, profiling, and memory tracking), and I made the very important point that you can get value from the Monitor even if you enable none of these buttons. Even so, each option enables additional functionality and information reported in the Monitor, and I highlighted several of the reports (and drill-downs and charts) that provide info that could help during CFML development, such as tracking shared scope memory utilization, slowest tags or function calls, largest variables in a request, JVM memory usage, cached queries, and finally large, slow, and frequent queries.
In this article, I'll introduce additional features of the Monitor, mostly focused on additional monitoring reports that can be especially valuable in managing a production ColdFusion environment. Finally, I'll discuss a powerful new ability to abort troublesome requests or threads to recoup valuable system resources.
Let me reiterate as well: The split of functionality I offer between development and production functionality in the Server Monitor is rather arbitrary, as features discussed here apply just as usefully to development, and vice versa. The next part of this series will cover additional features of value to production administrators: Alerts and Snapshots. The final part will cover the MultiServer Monitor and a few miscellaneous topics related to the Monitor.
Check out the other parts of the ColdFusion 8 server monitoring series:
If you're responsible for a production ColdFusion server, you're well aware of how various circumstances can negatively affect performance. These circumstances could include an increase in traffic (good news, which must be managed) or the adverse impact of a failure in some related service your ColdFusion application is calling upon, such as a database or a web service being called (bad news, which must be handled). The circumstances could also be due to problems with the configuration of your server (whether ColdFusion itself, the underlying JVM, or other related services like a database or the web server). The problem may also be caused by CFML coding practices, and in the previous article I showed how to use the Monitor to assist during development, giving developers more insight into the impact of their practices before code is deployed into production.
Still, if you fail to adequately load test your code in development (or better, against a testing server configured similarly to your production server and with a load commensurate with your real-world traffic), it's almost inevitable that problems will occur in the production environment that you never saw happen in development/test environments. This is where the Server Monitor provides a valuable weapon in your arsenal to find and defeat the traffic, configuration, or coding foes that may vex your server. First, let's look at a number of reports available in the Server Monitor that can help identify and analyze the cause of production performance or other problems.
Just as I showed you in part 1, some of the available reports (and charts) incur zero overhead, meaning you don't need to enable any of the three Start buttons atop the monitor, for monitoring, profiling, or memory tracking. I had discussed how the first two of these have relatively minor overhead, while the memory tracking option is quite intensive and really should be used only in development. But several reports work even if none of these are enabled, so they're effectively "free." There's absolutely no reason to fear using them in production, as they're simply reporting information already being gathered by ColdFusion 8, so let's take a look at them first.
I mentioned in part 1 that we finally have a "Query Cache Status" page, to give you insight into how many cached queries are in use (also as a zero-overhead report). I mentioned it in part 1 because developers could use it to determine how best to code their applications to use cached queries, though certainly administrators will want to monitor the report as well, to help configure the ColdFusion Administrator console setting for the "Maximum Number of Cached Queries" (see Figure 1).
Along the same lines, another cache that's configurable on that same page is the "Maximum Number of Cached Templates." ColdFusion administrators have long struggled with conflicting recommendations about how best to set this value. We simply had no insight in ColdFusion MX as to how much of the template cache was really being used, nor, perhaps as important, how much memory was being used to hold those cached templates. The Server Monitor finally gives us that insight, via Statistics > Request Statistics > Template Cache Status (Figure 2), which provides both the count and the total size of the cache. Again, it comes with zero overhead.
Since ColdFusion 7, the Administrator console has offered a button to "Clear the Template Cache Now" (see bottom of Figure 1), and you can click it to see the impact on the Server Monitor report (there may be a slight delay in reporting the information). Of course, clearing the cache may be inappropriate to do in production, as it will force a recompile of all templates as they're next requested. My point here is that it simply demonstrates the connection between clearing the cache and the reporting on it.
(Here's a bonus tip: experienced administrators may be familiar with that button from using the Trusted Cache feature, also on that Admin console page. If they have some newly updated source code files to push to production, they can use the button to empty the template cache and force ColdFusion to reload and recompile these (and all) templates from disk. Some may know there is also an Admin API method call available to clear the template cache, but did you know that in ColdFusion 8 that method now takes an optional file name or list to clear one or more specific files rather than the entire cache? For more information on managing the template cache, refer to the ColdFusion 8 documentation on the CFIDE.adminapi.runtime.cfc and its
I mentioned in part 1 that you can now view all currently active sessions through the Statistics > Request Statistics > Active Sessions report. The report even lets you drill down to each session to see the variables set in each session.
But I neglected to mention a real hidden gem: this report also has a chart icon (the rightmost of the icons displayed at the top right of the report), which displays a chart of how many current sessions are active over time, which can be very helpful to a server administrator (see Figure 3).
The chart is very useful, as it could help a production (or development) administrator detect when something is causing a very high number of sessions to be created. It's easy for folks to assume that the number created would be approximately similar to the number of users typically visiting the site, but there are requests that can create new sessions unexpectedly.
Consider this potentially dangerous situation, wherein you have code that's being visited heavily by any of several kinds of requests that are not typical users: search bots, RSS readers, scheduled tasks, web service calls, CFHTTP requests, load testing tools (with cookie support disabled), and so on. If any of these run code that creates sessions (meaning simply that the SessionManagement attribute is set to "yes" in the CFApplication tag or the Application.cfc), there's a potentially serious problem: These kinds of requests don't typically remember any cookie you may have set on their first visit (as a regular browser would). As such, if they come back to get another page right away (or even in a few minutes), each visit will create a new session (and a new set of entries in the client variables repository if ClientManagement="yes"). If such requests are made often within your session variable timeout timeframe, you'll likely find that there are many more sessions being created than you ever dreamed. This could be one explanation if your server's been running out of memory and dying, and this report helps spot such trouble.
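To get a feel for the scale of this problem, here is a rough back-of-the-envelope sketch in Python (not CFML, and not anything the Server Monitor itself runs). It assumes the worst case described above: every request arrives without a session cookie, so each one creates a new session that stays in memory until the session timeout expires. The function name and the traffic figures are hypothetical.

```python
def estimated_live_sessions(requests_per_minute, session_timeout_minutes):
    """Rough upper bound on sessions held in memory by cookie-less clients.

    Assumption: no request presents a session cookie, so every request
    creates a session that lives for the full session timeout.
    """
    return round(requests_per_minute * session_timeout_minutes)


# Hypothetical traffic: a crawler fetching 30 pages a minute
# against a 20-minute session timeout.
print(estimated_live_sessions(30, 20))  # 600
```

A regular browser making the same 30 requests a minute would hold a single session, which is why a sudden climb in the Active Sessions chart is worth investigating.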
Speaking of errors, the Monitor provides us with additional error information that was previously available only through the ColdFusion error logs. The Statistics > Errors > Requests with Errors report lists requests that have generated ColdFusion errors and provides detailed information about those errors (Figure 4). If you have enabled "Profiling," you can see more about the line of code in error, how often the error has occurred, and the full CFML stack trace that led to the error (see Figure 5). This is especially useful in production, where you may otherwise have no idea that errors are happening (if you're not monitoring the logs), and the additional information provided when profiling is enabled makes up considerably for the (typical) lack of debugging info in production.
The information on this page also feeds a pod on the front Overview page. In the lower right (as shown in Figure 1 in part 1) is a Last Error pod.
Before moving on to other reports that are enabled using one of the Start buttons, I refer you again to Part 1 for a discussion of the (JVM) Memory Usage Summary, Application Scope Memory Usage, Server Scope Memory Usage, and Query Cache Status reports, all available with zero overhead.
Perhaps the most traditional measures of server performance are the average response time and requests per second being processed by the server. These are indeed the very first graphs shown on the front page of the Monitor (the Overview page). If you have "Start Monitoring" enabled, then those first two charts will be populated with a line graph depicting each of these vital statistics over time (Figure 6):
On my lightly loaded development system, the reports aren't terribly interesting, but in your production environment this information will be critical. (As on many reports in the Server Monitor, note the small icon to choose the reporting timeframe in the top right corner, where it's labeled "All Data" by default, as shown in the figure.)
Another classic monitoring report (available in similar monitoring tools) is one to show you all currently executing requests. You can get that in the Server Monitor with Statistics > Request Statistics > Active Requests. You will need to enable “Start Monitoring” to see any requests here. This information can be valuable in understanding why your server is not responding or responding poorly, as well as in helping configure ColdFusion's simultaneous request settings. Again, on my lightly loaded development system, it was even hard to catch a request in the act of being executed (as some would note ColdFusion 8 is so fast!), but see Figure 7:
Each line in the report indicates the template file/path, as well as the type of request, the IP address, web server thread name, and time taken so far to complete the request. You can learn still more about the request by double-clicking to drill down on it. As I mentioned in part 1 (about the Slowest Requests report, another useful report enabled by "start monitoring"), you can drill down into such request lines to see details of the variable values in all the scopes of the request (see Figure 3 of Part 1). Note that this works without "start memory tracking."
There are also corresponding Active Threads and Slowest Active Threads reports, which show activity from threads generated by the new ColdFusion 8 tag, CFThread. I think it's worth clarifying that the Slowest Requests and Slowest Threads reports do not reflect requests or threads that are currently running, but instead show statistics on completed requests and threads. Finally, speaking of the "slowest active requests," I'll point out that there is also a pod on the Overview page that reflects this data, though note that it only shows those in a relatively brief recent timeframe, whereas the "slowest requests" report is much more comprehensive.
Before moving on, I also want to point out the small red "x" icon on Figure 7 (the Active Requests report), to the left of the request. That's a powerful feature enabling you to abort a request, which I'll discuss in the next to last section of this article.
Here's one more hidden gem: just as the Active Sessions report offers a chart button to view active sessions over time, so too does the Active Requests report (Figure 8). This is different, though. It's not really tracking currently "active" requests so much as it depicts how many (and of what type) were active at points of time in the past. The report also breaks down whether the requests were queued or running at the time. The "types" referred to are template requests, web service requests, flash remoting requests, and remote CFC requests (from a URL).
Note that this report, like all the charts shown so far, also has popup data: If you mouse over datapoints on the chart, it will show the values for that datapoint (the time and number of requests running or queued of the selected type). Since this report offers multiple series of data layered on top of one another, this popup data may be more valuable here than in previous charts.
This is another report that doesn't require any of the start buttons to be enabled.
Some may wonder if there is a report in the Server Monitor that simply shows a historical view of all past requests or threads (with their request URL, the time they executed, their duration, and so forth). There is not. The closest to that is the Slowest Requests report.
While we can't get historical detail about every request, it's usually more useful to focus on the requests that are the "heavy hitters" than to learn the details of every request. The Server Monitor steps up to the plate to provide two such reports. The first report displays requests using up a lot of time (in terms of response time) over all the executions of that page in the life of the ColdFusion server: the Statistics > Request Statistics > Cumulative Server Usage report (Figure 9).
From the screen itself, this report "lists requests that have cumulatively consumed the most server time. Cumulative server time is computed as the average response time for a request multiplied by its hit count. Even if a request executes rapidly, it may consume a high proportion of server time due to a high hit count." Remember Pareto's principle: You may find that 20% of requests use 80% of resources. Go attack these heavy hitters. Note that there is also a chart button to present a graphical depiction of the heaviest hitters by that measure.
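The formula quoted above is easy to work through by hand. The following Python sketch (an illustration only, not the Monitor's implementation) applies "average response time multiplied by hit count" to three made-up requests and ranks them; the template names, timings, and hit counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RequestStats:
    template: str
    avg_response_ms: float
    hit_count: int

    @property
    def cumulative_ms(self):
        # Cumulative server time = average response time x hit count.
        return self.avg_response_ms * self.hit_count


def heaviest_hitters(stats):
    """Return the requests ordered by cumulative server time, largest first."""
    return sorted(stats, key=lambda s: s.cumulative_ms, reverse=True)


# Hypothetical figures for illustration only.
sample = [
    RequestStats("/reports/monthly.cfm", 4000, 10),  # slow but rare: 40,000 ms
    RequestStats("/index.cfm", 50, 5000),            # fast but frequent: 250,000 ms
    RequestStats("/search.cfm", 300, 200),           # in between: 60,000 ms
]

total_ms = sum(s.cumulative_ms for s in sample)
for s in heaviest_hitters(sample):
    share = s.cumulative_ms / total_ms
    print(f"{s.template}: {s.cumulative_ms:.0f} ms ({share:.0%} of total)")
```

In this made-up data the 50 ms page tops the list, well ahead of the 4-second report, which is exactly the situation the on-screen description warns about: a fast request can still dominate server time through sheer hit count.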
Or, from a different perspective, you may care more about which requests simply have the highest hit count. That's available with the Statistics > Request Statistics > Highest Hit Count report (Figure 10).
If you enabled "start memory tracking," the "Average Request Size" column on the right of the report will be populated.
Also, this report has an available chart button, but like the Active Requests report, it doesn't really chart the same data as the report. It presents a cumulative chart tracking the percent of requests by type (again, template, flash remoting, web service, remote cfc, and adding gateway). Both the Cumulative Server Usage and Highest Hit Count reports require that "monitoring" be enabled.
So far, all these reports have been either "free" or needed only "start monitoring" enabled. Two more reports that may interest you in production require "profiling" to be enabled. In Part 1, I discussed three such database-oriented reports: Slowest Queries, Cached Queries, and Most Frequently Run Queries.
The final report I want to discuss is the Statistics > Database > Active Queries report. Like the "Active Requests" report discussed above, it tracks queries that are currently executing (or maybe a better way to put it is: the queries ColdFusion is waiting on the database to execute). Certainly, in a heavy production environment, this will be an essential report (along with the "Slowest Queries" and "Most Frequently Run Queries" reports). Even when I've disabled caching in my application, ColdFusion and the database simply execute my queries too fast to ever show on the report, but I want to let you know it's there. Finally, there is also a Database Pool Status report (which requires no "start" buttons be enabled), for those interested in tracking database connection pool status and information.
I mentioned earlier regarding the Active Requests report that there was an option, to the left of a running request, to choose to abort it (see Figure 7). ColdFusion Administrators finally have the power (and commensurate responsibility) to be able to abort (or "kill") a running request. When a given request is inappropriately tying up resources, or a large number of them have caused the server to hit its maximum simultaneous requests limit (as defined in the ColdFusion Administrator) and they are taking a long time to process, the server may begin to respond very sluggishly or stop serving new requests at all. With the ability (using all the previously discussed reports) to target where problems may lie, you may decide it's time to take matters into your own hands and abort a request manually.
You'll be prompted to confirm that you really want to abort the request. The request will be terminated and the user will be presented with whatever portion of the response had been generated prior to the abort.
This is a major paradigm shift in how we manage ColdFusion servers. In previous versions, if the system lost resources or requests were infinitely paused, the only recourse was to restart the entire ColdFusion service. These service restarts take your application completely offline and users are inconvenienced by the loss of session data. The ColdFusion 8 Server Monitor provides system administrators with the information they need to pinpoint problems and target individual requests for possible termination. ColdFusion will regain the resources from the terminated request without affecting the overall application or experience of your users.
As powerful as this is, you may wonder if you have to babysit the monitor to watch for and respond to such troublesome requests. No, you do not. In the next installment of the series, I'll show you the Alerts feature, which you can enable to watch for and detect various error situations (like too many hung threads, or requests taking too long). More than just sending an alert, it can abort offending requests (and/or reject new ones, and more).
In this article, I've discussed features of the Monitor generally related to production monitoring, including more reports (both those with zero overhead and those that require the lightweight monitoring or profiling) and the ability to abort unresponsive or troublesome threads.
As discussed in part 1, the Server Monitor provides online help and it's also documented in the ColdFusion 8 manual, Configuring and Administering ColdFusion.
In the next part of the series, I'll focus on additional aspects of using the monitor in production, namely the very useful Alerts and Snapshots features (which help with off-line monitoring of your servers). In the fourth and final part, I'll conclude with discussions of the MultiServer monitor (key if you need to manage more than one server), the Admin API (enabling you to access all this monitoring data programmatically), and various Monitor configuration settings (including how better to monitor requests for frameworks or other front controllers where all requests go through a single index.cfm).