3 December 2007
The first two parts of this four-part series have introduced the charts and graphs of the ColdFusion 8 Server Monitor, first focusing on those of value during development (Part 1: Using the Server Monitor in development), then those most useful in production (Part 2: Using the Server Monitor in production). The end of Part 2 also introduced the ability to abort troublesome requests.
Part 3 supplements the ability to manually abort requests with a more automated approach, called the Alerts feature. Alerts aren't just for terminating troublesome requests; you can also use them for diagnostics, by having them inform you of potentially troublesome requests. Another useful diagnostic tool in the Server Monitor is its Snapshots feature, which helps with off-line monitoring and analysis of your servers.
In Part 4, the final part of this series, I will cover the MultiServer monitor (which is key if you need to manage more than one server), the Admin API, and a few miscellaneous topics related to the Monitor.
Check out the other parts of the ColdFusion 8 server monitoring series:
Many people think of a system monitor in terms of its graphical interface, with charts, graphs, and reports that reflect the current status of the system and its components. As valuable as those are, you'd need to be watching that interface to know when a problem situation occurs.
What if you could instead receive notification of a problem situation by e-mail or, perhaps, execute a ColdFusion component (CFC) to do any sort of processing (write data to a log file or database, and the like)? And what if you could also specify that the system should generate additional details about the current state of the system (a new feature called “Snapshots” in ColdFusion 8), to help better diagnose the problem? The Alerts feature of the ColdFusion 8 Server Monitor offers this and more.
More than just alerting you about problems, the tool also offers the means to manage request processing, including options to terminate (kill) running requests, reject any new requests during the problem state, or even perform a Java Virtual Machine (JVM) garbage collection operation. While the ColdFusion Administrator has long offered a means to "timeout requests" that ran too long, the alert mechanism takes the functionality that much further, bringing a substantial new dimension of "unattended" monitoring.
Alerts can be created to detect, report on, and respond to four kinds of problem states:
- Unresponsive Server (too many requests taking too long)
- Slow Server (average response time too high)
- JVM Memory (too much memory used)
- Timeouts (too many requests timing out)
Each of these problem states will be detailed later in this article.
Note: Alerts are triggered only if they are enabled (discussed in the next section) and you have selected the Start Monitoring option in the Server Monitor, as discussed in Part 1 of this series. Although one of the alerts relates to JVM memory, that alert does not require the Start Memory Tracking option to be enabled.
Alerts present quite a significant paradigm shift in the management of ColdFusion servers. If they are set up properly, ColdFusion could conceivably never go offline. Previously, when something went wrong, you may have been forced to restart ColdFusion. Now, alerts can notify administrators of a problem, create a snapshot of the environment to help determine the source, and even automatically fix the problem by killing threads, requesting garbage collection, rejecting new requests, and/or executing custom code. Each of these features is discussed in more detail later in this article.
Unlike the features covered in the previous two articles, which were accessible through the Overview (main page) or Statistics tab of the monitor interface, the Alerts feature is configured through its own tab (the Alerts tab), shown at the top of the Monitor. Once selected, this tab offers two links in the left navigation bar. The first page shows any current alert notifications (discussed later). Clicking the second link, Alerts Configuration, shows a page that allows you to create or edit alerts (see Figure 1).
Figure 1. The Alerts Configuration page
On the Alerts Configuration page, there is a tab for each type of alert that you can set. On each tab, the first option is a check box to enable the alert. Until you select this check box, you will be unable to select any of the other options for that alert (see Figure 1). Once you enable the alert, you can set the threshold at which the alert will be triggered, indicate the actions that ColdFusion should take during a problem state, and so forth.
The actions are the same for all the alert types with one variation. You can choose to:
- Send e-mail (to one or more e-mail addresses specified in the last tab, Email Settings)
- Dump a snapshot (Snapshots are a depiction of the current state of the system)
- Kill threads running longer than a specified number of seconds
- Reject any new requests
- Run a processing CFC
- Perform garbage collection (only with the JVM Memory alert)
I’ll discuss each of these actions later, after discussing the types of alerts. Changes made on this screen take effect immediately. You don't need to restart the ColdFusion server.
The following are the types of alerts and their available threshold settings.
The Unresponsive Server alert detects too many requests taking too long. It offers two threshold values: Hung Thread Count and Busy Thread Time (in seconds). If the number of requests running longer than the Busy Thread Time reaches the Hung Thread Count, the server is considered unresponsive and this alert is triggered. While the Request Timeout setting in the ColdFusion Administrator (Server Settings > Settings) sets the maximum time any single request may run, this alert triggers when some number of simultaneous requests exceed a given response time, giving finer control.
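The threshold logic can be modeled in a few lines. The following Python sketch is an illustration, not ColdFusion's implementation: the function name and the list of per-request elapsed times are hypothetical, and it assumes the alert fires once the number of requests running longer than Busy Thread Time reaches Hung Thread Count.

```python
from typing import Iterable


def is_unresponsive(elapsed_seconds: Iterable[float],
                    hung_thread_count: int,
                    busy_thread_time: float) -> bool:
    """Hypothetical model of the Unresponsive Server threshold.

    elapsed_seconds: how long each currently running request has been executing.
    Returns True once the number of requests running longer than
    busy_thread_time reaches hung_thread_count.
    """
    # Count only the requests that have exceeded the Busy Thread Time.
    busy = sum(1 for t in elapsed_seconds if t > busy_thread_time)
    return busy >= hung_thread_count


# Three running requests, two of them past 30 seconds: the alert fires.
print(is_unresponsive([45.0, 31.5, 2.0], hung_thread_count=2, busy_thread_time=30))  # True
# Only one request past 30 seconds: no alert, even though that request is slow.
print(is_unresponsive([45.0, 12.0, 2.0], hung_thread_count=2, busy_thread_time=30))  # False
```

Note the contrast with the Request Timeout setting: in this model a single very slow request does not trip the alert on its own.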
Also, there are some operations that can’t be immediately interrupted by the Request Timeout feature (discussed later), so this alert can also serve to back up that setting to notify you of requests exceeding that expected timeout.
You want to avoid having so many threads become unavailable for so long that eventually the server becomes unable to process new requests. This alert can warn you (by e-mail) when you have reached this state, or you can choose to attempt to terminate threads, reject new requests, take a snapshot, or run a CFC. If you take a snapshot, it lists the threads that are detected to be running too long, in addition to showing a stack trace (or “thread dump”) of all running threads. Snapshots (and stack traces) and other alert actions are discussed later.
The Slow Server alert detects when the average response time for processing requests reaches a specified threshold. It offers a single Response Time Threshold (in seconds). This is compared to the average response time for all requests, as computed over an interval configured in the Server Monitor’s settings page (at the top right of the Server Monitor, as discussed further in Part 4 of this series). The current average response time is displayed in the Average Response Time chart on the Server Monitor’s Overview page.
If the average response time of currently running requests is greater than the threshold time, the alert is triggered, with the same available actions as for the Unresponsive Server alert. (Note that if a snapshot is taken, it does not list the threads running, though it does show the stack trace.)
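As a rough sketch, the Slow Server comparison amounts to an average checked against a threshold. This Python model is hypothetical (ColdFusion computes the average itself, over the interval set on the Server Monitor's settings page), and it assumes you already have the response times for that interval.

```python
from statistics import mean
from typing import Sequence


def is_slow(response_times: Sequence[float], threshold_seconds: float) -> bool:
    """Hypothetical model of the Slow Server check.

    response_times: response times (in seconds) of the requests that fall
    within the configured averaging interval.
    """
    if not response_times:
        return False  # nothing to average, so no alert
    return mean(response_times) > threshold_seconds


print(is_slow([1.0, 2.0, 9.0], threshold_seconds=3))  # average 4.0 -> True
print(is_slow([1.0, 2.0, 3.0], threshold_seconds=3))  # average 2.0 -> False
```

Because the check uses an average, a few very slow requests can raise it past the threshold even when most requests are fast.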
The JVM Memory alert detects when ColdFusion is using a certain amount of RAM. If the JVM memory used by ColdFusion is greater than the threshold value (in megabytes), a JVM Memory alert is activated. Choose a value that is suitable relative to the Maximum JVM Heap Size, which you can set in stand-alone deployments of ColdFusion through the Administrator (on the Server Settings > Java and JVM page), or in the jvm.config file for multiserver and J2EE deployments. You want to avoid a situation where JVM memory use grows so large that you reach an out-of-memory condition. This alert can warn you when you are approaching this state. When triggered, this alert can take the actions described so far and can also be configured to perform garbage collection, as discussed later.
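One way to pick the threshold is as a fraction of the maximum heap. The Python sketch below assumes that approach; the 80% default is an arbitrary illustrative value, not an Adobe recommendation, and both function names are hypothetical.

```python
MB = 1024 * 1024


def suggest_threshold_mb(max_heap_mb: int, fraction: float = 0.8) -> int:
    """Pick an alert threshold as a fraction of the Maximum JVM Heap Size.

    The 0.8 default is only an example value.
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    return int(max_heap_mb * fraction)


def memory_alert_active(used_bytes: int, threshold_mb: int) -> bool:
    """Hypothetical model: the alert is active when used JVM memory exceeds the threshold."""
    return used_bytes > threshold_mb * MB


threshold = suggest_threshold_mb(512)            # 409 MB for a 512 MB heap
print(threshold)                                 # 409
print(memory_alert_active(450 * MB, threshold))  # True
```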
The Timeouts alert detects when too many requests are timing out. It offers two threshold values: Timeout Counts and Time Interval (in seconds). If the number of requests specified in Timeout Counts time out within the time interval specified by Time Interval, a Timeout alert is triggered. These timeouts are triggered by the Request Timeout feature in the Administrator (Server Settings > Settings). While it's helpful that ColdFusion can time out requests, you can use this alert to let you know when it's happening too often, as well as to take most of the aforementioned actions to maintain server stability. (A snapshot, if taken as an action, does not list the requests that timed out, but you can find more information on the timed out request(s) in the logs in ColdFusion’s runtime/logs directory.)
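The Timeouts condition is essentially a count of events within a time window. The Python class below models it with a sliding window; ColdFusion may count timeouts differently (for example, in fixed intervals), so treat the class and its names as a hypothetical illustration.

```python
from collections import deque
from typing import Deque


class TimeoutWindow:
    """Hypothetical model of the Timeouts alert: Timeout Counts within Time Interval."""

    def __init__(self, timeout_count: int, interval_seconds: float) -> None:
        self.timeout_count = timeout_count
        self.interval_seconds = interval_seconds
        self._times: Deque[float] = deque()

    def record_timeout(self, now: float) -> bool:
        """Record a timeout at time `now` (seconds); return True if the alert should fire."""
        self._times.append(now)
        # Discard timeouts that are older than the interval.
        while self._times and now - self._times[0] > self.interval_seconds:
            self._times.popleft()
        return len(self._times) >= self.timeout_count


window = TimeoutWindow(timeout_count=3, interval_seconds=60)
print([window.record_timeout(t) for t in (0, 20, 100, 130, 150)])
# [False, False, False, False, True]
```

The first two timeouts age out before a third arrives, so only the burst at 100, 130, and 150 seconds reaches three within 60 seconds.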
If one of the enabled alerts is triggered due to exceeding the threshold value, there are three ways that you can observe the alert notifications.
First, any alerts triggered will be displayed in the first page of the Alerts tab (see Figure 2).
Figure 2. The Alerts Notification page
Each alert message should eventually be followed by another alert message indicating when the server has recovered from the problem state. The recovery message will indicate whether any actions were taken during the alert, including how many requests were killed, whether requests were rejected, and so forth.
If an alert has caused the creation of a snapshot, discussed in the next section, an icon will display to the left of the alert, as shown in two instances in Figure 2. Notice that you can also delete either an individual alert notification or all of them by using the buttons at the top of the page.
This display of Alert information will only remain in the Server Monitor as long as ColdFusion is running. Upon restart, the information is cleared. But it's not entirely lost, as the very same information is tracked in available log files.
Another way to view alert messages is in ColdFusion's log files. Note that I say "files," because alerts are actually tracked in two different files (though the same alert information is offered in each).
First, the information is written to a monitor.log file in the traditional ColdFusion logs directory, such as C:\ColdFusion8\logs in the standalone edition, or C:\JRun4\servers\cfusion\cfusion-ear\cfusion-war\WEB-INF\cfusion\logs for the default server in a multiserver deployment.
The same alert logging data is also written to the ColdFusion -out log, along with a considerable amount of other logging information traditionally written to that file. In the standalone deployment of ColdFusion, the -out log file is located in C:\ColdFusion8\runtime\logs\ (as coldfusion-out.log). In the multiserver deployment, it is in C:\JRun4\logs\ as cfusion-out.log for the default server; for any other instances you have created, replace the "cfusion" portion with the instance name.
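To keep the two deployments straight, here is a small Python helper that builds the -out log paths given above. It assumes the default install roots cited in this article; the function name and the instance name "instance2" are hypothetical.

```python
from pathlib import PureWindowsPath


def out_log_path(deployment: str, instance: str = "cfusion") -> PureWindowsPath:
    """Return the -out log path for a default Windows install (hypothetical helper).

    deployment is "standalone" or "multiserver"; instance only matters for
    multiserver, where each instance writes its own <instance>-out.log.
    """
    if deployment == "standalone":
        return PureWindowsPath(r"C:\ColdFusion8\runtime\logs") / "coldfusion-out.log"
    if deployment == "multiserver":
        return PureWindowsPath(r"C:\JRun4\logs") / f"{instance}-out.log"
    raise ValueError(f"unknown deployment: {deployment!r}")


print(out_log_path("standalone"))               # C:\ColdFusion8\runtime\logs\coldfusion-out.log
print(out_log_path("multiserver"))              # C:\JRun4\logs\cfusion-out.log
print(out_log_path("multiserver", "instance2")) # C:\JRun4\logs\instance2-out.log
```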
Still another way to see alert notifications is by way of e-mail, discussed in the next section on available actions.
For each kind of alert, there are several available actions to take when the alert is triggered. Each of these is described below.
Note: You can enable these actions either before an alert has been triggered or while it is active (in other words, before it has recovered).
If you select the "Send E-mail" action for any of the alerts, an e-mail will be sent when an alert is triggered and when it recovers, if you have configured an e-mail address in the Email Settings tab of the Alert Configuration page.
Note: The Email Settings page requires that you have configured the mail server settings in the ColdFusion Administrator (in the Server Settings > Mail page) to set an SMTP server for sending e-mails from ColdFusion.
You may specify multiple e-mail addresses by separating them with commas. Semicolons will not work, though you won't receive an error message.
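The following Python sketch shows why a semicolon-separated list fails quietly, under the assumption that the Email Settings value is split on commas only: the whole string comes back as a single, malformed address. The function name is hypothetical.

```python
from typing import List


def split_recipients(field: str) -> List[str]:
    """Split an address list on commas only, trimming whitespace (hypothetical model)."""
    return [addr.strip() for addr in field.split(",") if addr.strip()]


print(split_recipients("ops@example.com, dba@example.com"))
# ['ops@example.com', 'dba@example.com']
print(split_recipients("ops@example.com; dba@example.com"))
# ['ops@example.com; dba@example.com']  (one malformed address, no error)
```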
You can confirm that e-mails are being sent by viewing the aforementioned log entry in the monitor.log file. Each triggered alert that is set to send e-mail reports whether or not it sent an e-mail (by adding "Email notification sent" or "Failed to send email notification" to the log entry).
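If you want to check many alerts at once, a short script can tally the two phrases. This Python sketch relies only on the phrases quoted above; it makes no assumption about the rest of a monitor.log entry, and the sample lines use invented placeholder text.

```python
from typing import Dict, Iterable

SENT = "Email notification sent"
FAILED = "Failed to send email notification"


def email_status_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Tally alert e-mail outcomes by searching log lines for the two phrases."""
    counts = {"sent": 0, "failed": 0}
    for line in lines:
        if FAILED in line:
            counts["failed"] += 1
        elif SENT in line:
            counts["sent"] += 1
    return counts


# Against a real file (standalone default location cited in this article):
# with open(r"C:\ColdFusion8\logs\monitor.log", encoding="utf-8", errors="replace") as f:
#     print(email_status_counts(f))
sample = ["[placeholder entry] Email notification sent",
          "[placeholder entry] Failed to send email notification",
          "[placeholder entry] unrelated"]
print(email_status_counts(sample))  # {'sent': 1, 'failed': 1}
```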
The alert notification e-mails come from an address of cfadmin@[servername], where servername is the name of your server. This is not configurable. If you find you are not receiving the notification e-mails, and you've confirmed that the monitor.log shows it did successfully send an e-mail, your mail server may preclude sending out e-mails with a From address that has a domain name other than that of the mail server.
If an alert is set to kill requests taking longer than a given number of seconds, the e-mail notification for that alert will also list the requests that were killed. (This is not displayed in the notifications page, nor in the snapshot or monitor.log file.) Sadly, the e-mail will not report which running requests triggered other alerts (such as requests running too long or an average response time that is too high). But you can capture that information, and a lot more, with the Snapshots feature.
If you select the Dump snapshot action for any of the alerts, ColdFusion will generate a “snapshot” file when the alert is triggered. This is a readable text file containing considerable information on the status of the ColdFusion server and currently running requests, threads, and queries. The snapshot file can be viewed on the Alerts page (discussed earlier), which displays any alerts that are triggered. Note also that if you have chosen the Send E-mail action as well, the snapshot file will be included as an attachment to the e-mail.
Besides requesting a snapshot with an alert, you can also request one manually using the Snapshots tab within the monitor. Since that’s discussed later in this article, I’ll save further discussion of snapshots for that section.
If you select the “Kill Threads running longer than x seconds” action, then while an alert is triggered, ColdFusion will attempt to kill any requests whose response time exceeds the number of seconds specified. In most cases, any such requests will be terminated. (Note that this time after which long-running threads will be killed is separate from the time for triggering the alert, if the alert is time-related, and it also overrides the request timeout in the ColdFusion Administrator.)
The user will generally see whatever text was being generated prior to the point in the code where the termination occurred. They may also get an error message, which will vary depending on the operation that was interrupted.
Note that there are some kinds of operations within requests that ColdFusion can’t interrupt immediately, such as requests to databases (called from CFSTOREDPROC, and so forth), CFHTTP operations, or invocations of a web service, to name a few. In such cases, the request will be terminated, but only after the blocked operation completes. This means that if the remote service (database, web service, and so on) is what’s causing the delay, the attempt to kill the request will have to wait at least as long as that remote operation takes to complete (or until the termination time set for the specific operation, such as when the TIMEOUT attribute is used on a CFINVOKE of a web service). This applies as well to the manual kill feature discussed in Part 2 of this series, and to the Request Timeout feature in the ColdFusion Administrator.
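The relationship between the alert and the kill action can be sketched as follows. This Python model is hypothetical (the function name and request paths are invented): it selects requests only while the alert is active, and it uses its own kill threshold, independent of the alert's trigger threshold.

```python
from typing import Dict, List


def requests_to_kill(elapsed_by_request: Dict[str, float],
                     kill_after_seconds: float,
                     alert_active: bool) -> List[str]:
    """Hypothetical model of "Kill Threads running longer than x seconds"."""
    if not alert_active:
        return []  # the action only applies while the alert is triggered
    return sorted(r for r, t in elapsed_by_request.items() if t > kill_after_seconds)


running = {"/report.cfm": 95.0, "/index.cfm": 0.4, "/export.cfm": 61.0}
print(requests_to_kill(running, kill_after_seconds=60, alert_active=True))
# ['/export.cfm', '/report.cfm']
print(requests_to_kill(running, kill_after_seconds=60, alert_active=False))
# []
```

As explained above, a request selected this way may not actually stop until any blocking operation it is waiting on returns.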
If you select "Reject new requests," then from the time an alert is triggered until it recovers, new requests will be rejected immediately upon arrival. They will receive a 503 status code, and users may see the message “The server is unable to process your request. Please try again later.” This does not affect requests that were already running when the alert was triggered.
The ability to reject any new incoming request is a pretty significant change in operations. Otherwise, especially on a high-traffic site, requests keep coming at their normal pace when a problem occurs. These requests normally get queued up and will execute when ColdFusion has the necessary resources. This can create a vicious circle: when ColdFusion is again able to recover after a problem state (whether by finishing or timing out requests, allocating more memory, and so forth), it's flooded by all the requests waiting to execute. ColdFusion is stuck playing catch-up, and the server could appear to still be offline or sluggish.
So, rather than try to service all the requests that come in during an alert, ColdFusion can instead reject the new requests. Again, the users experience what appears to be an error message, but at least their requests do not pile up, thus preventing a worsened error state on the server.
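A minimal Python model of this behavior, assuming a simple active/inactive flag, is shown below. The class name is hypothetical, and the message is the one quoted above; requests that were already running are outside what this model represents.

```python
from typing import Tuple

BUSY_MESSAGE = "The server is unable to process your request. Please try again later."


class RequestGate:
    """Hypothetical model of the "Reject new requests" action."""

    def __init__(self) -> None:
        self.alert_active = False

    def admit(self, path: str) -> Tuple[int, str]:
        # While the alert is active, new arrivals are refused at once with a 503
        # instead of being queued behind the requests that are already running.
        if self.alert_active:
            return 503, BUSY_MESSAGE
        return 200, f"processing {path}"


gate = RequestGate()
gate.alert_active = True            # alert triggered
print(gate.admit("/index.cfm")[0])  # 503
gate.alert_active = False           # alert recovered
print(gate.admit("/index.cfm")[0])  # 200
```

Refusing immediately, rather than queueing, is what keeps a backlog from building up while the alert is active.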
Still another powerful new feature for ensuring longer uptimes is the alert action to Perform garbage collection. Available for the JVM Memory alert only, this will cause ColdFusion to make a request to the underlying JVM that garbage collection be performed on memory. A discussion of garbage collection is beyond the scope of this article, but briefly, when CFML requests run, the memory used to perform their processing will be allocated and then generally be marked for reclamation at the end of the request. The underlying JVM should automatically remove (“collect”) that no longer used memory (“garbage”), but sometimes it may not do so until a garbage collection is requested.
You can view the amount of memory used by ColdFusion in the graph shown in the Server Monitor’s Statistics > Memory Usage > Memory Usage Summary page (where you can also run a garbage collection manually), as discussed in Part 1 of this series. If the garbage collection request is successful, the amount of used memory may drop enough to allow ColdFusion to recover from this alert.
When triggered by the alert, garbage collection will be attempted every minute until the alert recovers. This is not configurable in the Server Monitor interface. The Alerts page will not indicate how many garbage collections are requested, but you can view this in the available monitor.log.
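The retry behavior can be modeled as a loop that requests a collection and waits, repeating while the alert is still active. In this Python sketch, Python's own gc.collect() stands in for the JVM garbage collection request; the function name is hypothetical, and max_attempts is a safety cap added for the sketch, not a Server Monitor setting.

```python
import gc
import time
from typing import Callable


def collect_until_recovered(alert_active: Callable[[], bool],
                            interval_seconds: float = 60.0,
                            max_attempts: int = 10) -> int:
    """Request a collection every interval_seconds while the alert is active.

    Returns the number of collections requested.
    """
    attempts = 0
    while alert_active() and attempts < max_attempts:
        gc.collect()              # stand-in for the JVM garbage collection request
        attempts += 1
        time.sleep(interval_seconds)
    return attempts


# The alert stays active for two checks, then recovers.
states = iter([True, True, False])
print(collect_until_recovered(lambda: next(states), interval_seconds=0))  # 2
```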
The final alert action is "Processing a CFC." With this option, you can arrange to perform any ColdFusion Markup Language (CFML) operation on the triggering (or resolution) of an alert state, to include storing data in a database, sending an instant message or SMS notification (if you've enabled the ColdFusion event gateways to support that), and so on.
In the alert configuration form, the option for a CFC is a field that expects a CFC file name (including the .cfc extension). By default, ColdFusion looks for the CFC in its runtime\bin directory (C:\ColdFusion8\runtime\bin in the standalone edition).
Note: At the time of this writing, I've not found any way to indicate that the CFC is located in any other directory. (I tried both webroot relative and absolute paths, but neither worked. I do note that in the ColdFusion 8 Release Notes, this inability to use relative paths is listed as a known issue, though it suggests that absolute paths should work. Perhaps by the time you read this the problem will have been resolved.)
The CFC you create for this purpose must have two functions, onAlertStart() and onAlertEnd(), both of which accept a structure as an argument and return no values. The onAlertStart() function is executed when an alert becomes active, and onAlertEnd() is executed when the server recovers from the alert or the alert is invalidated.
In both methods, a structure is passed in, containing information about the alert, such as when it was activated (or recovered or was disabled). The following is a sample alert.cfc that simply captures a dump of the incoming structure (passed by ColdFusion) with a <cfsavecontent> tag; the result is then passed to the <cflog> tag to be written to the application.log file (in the [coldfusion]\logs directory). Note the use of format="text" on the <cfdump> tag to make the dump more readable within the log file.
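<cfcomponent>
    <!--- Called when the alert becomes active --->
    <cffunction name="onAlertStart" access="public" returntype="void">
        <cfargument name="instruct" type="struct" required="No">
        <cfset var dumpText = "">
        <!--- Capture a plain-text dump of the structure passed in by ColdFusion --->
        <cfsavecontent variable="dumpText">
            <cfdump format="text" var="#arguments.instruct#">
        </cfsavecontent>
        <!--- Write the dump to application.log --->
        <cflog log="APPLICATION" text="#dumpText#">
    </cffunction>
    <!--- Called when the server recovers or the alert is invalidated --->
    <cffunction name="onAlertEnd" access="public" returntype="void">
        <cfargument name="instruct" type="struct" required="No">
        <cfset var dumpText = "">
        <cfsavecontent variable="dumpText">
            <cfdump format="text" var="#arguments.instruct#">
        </cfsavecontent>
        <cflog log="APPLICATION" text="#dumpText#">
    </cffunction>
</cfcomponent>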
The keys in the structure include:
ALERTACTIVATE: the date and time that the alert was triggered
ALERTMESSAGE: the message corresponding to the type of alert triggered
ALERTSNAPSHOTFILE: the filename, if any, of a generated snapshot file
ALERTTYPE: a textual reference to the type of alert triggered
Some other keys in the structure that help in determining the alert's state include ISRECOVERED, all of which hold boolean values. When the alert recovers, other useful keys include ALERTRECOVEREDAT; each of these holds a date/time value (or an empty string).
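To illustrate when the two methods run and what they receive, here is a Python model of the alert lifecycle. It is hypothetical: the class name is invented, only a few of the keys listed above are filled in, and the ALERTTYPE value "Slow Server" is a placeholder rather than the exact string ColdFusion passes.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, Optional

AlertInfo = Dict[str, object]


class AlertLifecycle:
    """Hypothetical model of when onAlertStart/onAlertEnd are called."""

    def __init__(self, alert_type: str,
                 on_start: Callable[[AlertInfo], None],
                 on_end: Callable[[AlertInfo], None]) -> None:
        self.alert_type = alert_type
        self.on_start = on_start
        self.on_end = on_end
        self.info: Optional[AlertInfo] = None  # None while the alert is inactive

    def update(self, threshold_exceeded: bool, now: datetime) -> None:
        if threshold_exceeded and self.info is None:
            # Inactive -> active: build the structure and call the start handler once.
            self.info = {"ALERTTYPE": self.alert_type, "ALERTACTIVATE": now,
                         "ISRECOVERED": False, "ALERTRECOVEREDAT": ""}
            self.on_start(dict(self.info))
        elif not threshold_exceeded and self.info is not None:
            # Active -> recovered: fill in the recovery keys and call the end handler.
            self.info.update(ISRECOVERED=True, ALERTRECOVEREDAT=now)
            self.on_end(dict(self.info))
            self.info = None


events = []
alert = AlertLifecycle("Slow Server",
                       on_start=lambda s: events.append(("start", s)),
                       on_end=lambda s: events.append(("end", s)))
t0 = datetime(2007, 12, 3, 9, 0)
alert.update(True, t0)                          # start handler runs
alert.update(True, t0 + timedelta(minutes=1))   # still active: nothing new
alert.update(False, t0 + timedelta(minutes=5))  # end handler runs
print([name for name, _ in events])             # ['start', 'end']
```

Each handler receives a copy of the structure, so the start handler's copy still shows ISRECOVERED as False after recovery.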
If you have any trouble trying to use the CFC, the success or failure of trying to invoke it will be tracked in the monitor.log.
The second major feature discussed in this part of the series on the Server Monitor will be the Snapshots feature. Have you ever wished you could gather a list of all the pertinent statistics about the processing of your ColdFusion environment, such as how many requests are running or queued, or how much memory is free or used? It's true that this information is available in the Server Monitor interface, but what if you wanted that information stored in a file as of a point in time?
Some of you may know that this sort of information can indeed be gathered in log files by enabling something called JRun metrics, which (if enabled) are typically written at regular intervals to the same -out log file described in previous sections.
But what if you also wanted to see such details as the number of cached queries, the query cache hit ratio, and, optionally, details on each cached query and the total size in bytes of the query cache? Or information on each data source and its database pooling statistics? Or a Java stack trace (or thread dump) of the ColdFusion environment? This is information that just isn't provided in any ColdFusion logs, but it is available with the new Snapshots feature, which produces a text file with all this information and more.
The previous discussion of available Alert actions mentioned that you could create a snapshot when an alert was triggered. However, snapshots are such a valuable feature that there's a separate interface in the Server Monitor, which can be used independently of alerts, to trigger and view snapshots. It has its own tab at the top of the Server Monitor interface (see Figure 3).
Figure 3. The Snapshots page
Note, however, that the Snapshots page lists only those snapshots created manually using the Trigger Snapshot button on this page. As mentioned in the Alerts section, to view snapshots created as a result of an alert, you use the Alerts page in the Alerts tab.
Whichever list of snapshots you're observing, when you're ready to view one of them, you'll notice an icon displayed to the left of the snapshot. The ones in Figure 3, for instance, are indeed the same ones shown for the alerts that had snapshot icons displayed in Figure 2. Clicking that icon (in either page) will open a display of the snapshot in a new browser window.
The snapshot is, in fact, just a plain text file that may have hundreds of lines of information, depending on the number of applications and users running in your environment at the time the snapshot is taken. A subset of that information is shown in Figure 4.
Figure 4. Display of snapshot details
Note: As mentioned before, regarding alert-generated snapshots, if you choose both Dump Snapshot and Send Email as actions when configuring an alert, the snapshot will be included as an attachment to the e-mail notification.
Most of the information in the snapshot is self-explanatory. The stack trace (or “thread dump”) is something perhaps new to some ColdFusion developers and administrators. This is a depiction of all the currently running requests (and other threads, including some running on behalf of the underlying JRun or other J2EE server, as well as threads doing work on ColdFusion’s behalf.) With respect to CFML pages, they also generally show a reference to the exact line of CFML code (and template path) currently executing at the time of the stack trace.
You can learn more about ColdFusion stack traces and thread dumps in articles such as http://www.adobe.com/go/tn_18339. Though that article discusses obtaining them manually, which the ColdFusion 8 Server Monitor makes much easier, its discussion of how to analyze and interpret stack traces will be very useful.
As in the case of Alerts notifications, this display of snapshots will only remain in the Server Monitor as long as ColdFusion is running. Upon restart, the information is cleared. But, as with the alert notifications, the snapshots are not lost either, as the snapshots are saved as text files.
The snapshot files are stored in a snapshots directory under the ColdFusion logs directory. In a stand-alone edition, that directory would be C:\ColdFusion8\logs\snapshots\. The file names used for these snapshots will begin with either snapshot_sysgen (system-generated) for those generated by alerts, or snapshot_usrgen (user-generated) for those generated manually in the Snapshots page. The remainder of the filename is a numeric representation of the current date and time of the dump. The file name for manually-generated snapshots is shown on the Snapshots page itself, while the filename for alert-generated snapshots is shown in the monitor.log file mentioned in the discussion on alerts.
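Since the prefixes distinguish alert-generated from manual snapshots, a script can sort a snapshots directory into the two groups. This Python sketch relies only on the prefixes described above; the function names are hypothetical, and the file names in the usage example are invented.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def classify_snapshot_names(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group snapshot file names by the snapshot_sysgen / snapshot_usrgen prefixes."""
    groups: Dict[str, List[str]] = {"alert": [], "manual": [], "other": []}
    for name in sorted(names):
        if name.startswith("snapshot_sysgen"):
            groups["alert"].append(name)    # generated by an alert
        elif name.startswith("snapshot_usrgen"):
            groups["manual"].append(name)   # generated from the Snapshots page
        else:
            groups["other"].append(name)
    return groups


def classify_snapshot_dir(directory: Path) -> Dict[str, List[str]]:
    """Apply classify_snapshot_names to the regular files in a directory."""
    return classify_snapshot_names(p.name for p in directory.iterdir() if p.is_file())


# Standalone default location cited in this article; adjust for your install:
# print(classify_snapshot_dir(Path(r"C:\ColdFusion8\logs\snapshots")))
print(classify_snapshot_names(["snapshot_usrgen_b", "snapshot_sysgen_a", "notes.txt"]))
# {'alert': ['snapshot_sysgen_a'], 'manual': ['snapshot_usrgen_b'], 'other': ['notes.txt']}
```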
As for managing user-generated snapshot files, the Snapshots page offers an icon next to each snapshot that, if selected, deletes the snapshot file. For alert-generated snapshots, deleting an alert notification (as described in the section on alerts) also deletes any system-generated snapshot associated with that alert.
Snapshots can be useful in understanding the state of your system when problems are starting to happen, offering additional diagnostic information that may help resolve those problems. They can of course also be useful for post-mortem analysis if a snapshot is taken before a server crash. It can be helpful, as well, to take a snapshot at a quiet time to provide baseline data for comparison.
Where to go from here
The Server Monitor provides online help and it's also documented in the ColdFusion 8 manual, Configuring and Administering ColdFusion.
In the fourth and final part, I'll conclude with discussions of the MultiServer monitor (which is key if you need to manage more than one server), the Admin API (enabling you to access all this monitoring data programmatically), and various Monitor configuration settings (including how better to monitor requests for frameworks or other front controllers where all requests go through a single index.cfm).