8 June 2009
A basic understanding of Java and JVM memory management will be helpful, as will familiarity with ColdFusion Administrator and CFML.
Performance is a key aspect of any application or server, and ColdFusion is no exception. Performance for Internet applications that many users can access is particularly important. Since ColdFusion is both a server and a language (CFML) there are multiple approaches to performance improvement. This article explores different ways to improve the performance of ColdFusion applications on ColdFusion 8.0.1. It comprises three sections on performance tuning with Java Virtual Machine (JVM) parameters, ColdFusion Administrator settings, and coding best practices, as well as a case study of the BlogCFC application to illustrate the performance gains that can be achieved by applying the concepts described.
Since ColdFusion is based on the J2EE platform, you can use JVM arguments to tune the performance of ColdFusion applications. ColdFusion 8.0.1 is shipped with Java Runtime Environment (JRE) 1.6.
The Sun JVM supports several arguments that can be used to change basic behavior, tune performance, or change debugging settings. In this part I will discuss the performance tuning parameters provided by Sun Java SE 6. As examples, I will cover two applications built on ColdFusion, BlogCFC and Canvas Wiki, and provide recommendations on how to tune them using JVM arguments.
In Sun Java, memory is managed in generations, which are memory pools holding objects of different ages. There are different kinds of generations:
You can use the following arguments to tune memory management in Java:
The following terms are used in the discussion:
For more details on generations, garbage collection, and arguments in Sun Java refer to the following resources:
GCViewer is the open source tool that I used to obtain garbage collection profiles and measure performance metrics such as frequency of collections, pauses, heap size, and so on. For more details on GCViewer visit GCViewer.
I also used the freeware Apache JMeter software. This tool is used to simulate a load on the server and to measure end-to-end performance, including average response time and throughput. For more information refer to Apache JMeter.
Figure 1 shows a garbage collection profile for BlogCFC with Xmx=1024m and 30 virtual users from JMeter running for 5 minutes. The purple area represents the tenured generation, yellow represents the young generation, and blue represents heap usage. The x-axis represents time. The y-axis has a dual scale: it shows JVM memory in KB and the time taken by the collector in seconds.
The profile in Figure 1 shows frequent major collections and frequent minor collections. Because major collections take more time, if you reduce the frequency of major collections then you can increase throughput. So if pause time is not an issue for your application, the following techniques will reduce the frequency of major collections:
As shown in Figure 2, doubling Xmx decreases the frequency of major collections.
Figure 3 shows the throughput improvement in the BlogCFC application when Xmx is increased from 1024MB to 2048MB.
From a performance perspective, young generation collection is comparatively less expensive than tenured generation collection. Frequent young generation collection, however, can hamper performance. Also, if collections are frequent, most objects are still alive at each collection, which makes collection inefficient. When you decrease the frequency of young generation collection, more objects will be dereferenced (or die) between two collections.
The following parameters can be used for young generation tuning:
On the other hand, if memory is limited then allotting more space to the young generation will reduce the space available for the tenured generation resulting in more frequent expensive major collections. As a first step, use memory profiling to analyze the usage of young and tenured generations. The profile in Figure 1 shows that the young generation is about 40% of the heap. In this case I would try allocating around 25-35% of heap to the young generation to see what percentage is optimal for the application.
Figure 4 shows a decrease in frequency of minor collection with higher Xmn value. Figure 5 shows a throughput improvement for a higher Xmn value.
Decreasing the survivor ratio will increase the size of survivor spaces. Larger survivor spaces allow short-lived objects a longer time period to die in the young generation (see Figure 6). If your application has more objects that die young then tuning this argument will be more effective. To learn more about generations and spaces, visit Tuning Garbage Collection with the 1.4.2 Java Virtual Machine.
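As a rough worked example of the sizing, recall that in the HotSpot JVM the SurvivorRatio value is the ratio of eden to one survivor space, and the young generation holds eden plus two survivor spaces. The 512 MB young generation and the ratio of 6 below are illustrative assumptions, not values measured for BlogCFC:

```latex
% Y = young generation size, S = size of one survivor space, E = eden size
\[
S = \frac{Y}{\mathrm{SurvivorRatio} + 2}, \qquad E = Y - 2S
\]
% Illustrative values: Y = 512 MB, SurvivorRatio = 6
\[
S = \frac{512}{6 + 2} = 64\ \text{MB}, \qquad E = 512 - 2 \times 64 = 384\ \text{MB}
\]
```

Lowering the ratio to 4 with the same young generation would give survivor spaces of about 85 MB each, which is the sense in which a smaller ratio means larger survivor spaces.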
Prior to Java SE 6, young generation collection was done in parallel while major collections were performed using a single thread. In Java SE 6 major collections can be done in parallel with the new UseParallelOldGC setting. This garbage collection will be enabled by default in JDK6.
For JDK5 update 6 and later you can enable this feature by adding the option -XX:+UseParallelOldGC to the command line.
The results in Figure 8 show that BlogCFC performs better when using the UseParallelOldGC garbage collector. Applications that have frequent major collections typically benefit more from using the UseParallelOldGC setting.
The permanent generation is used to hold class objects and method objects. So, if any application has lots of classes to be loaded and unloaded then increasing the size of permanent generation will reduce loading and unloading of classes and thus increase throughput. To increase the size of the permanent generation use the following command line option:
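A minimal sketch of such an option is shown here. The flag is the same MaxPermSize setting that ColdFusion sets by default; the value of 256m is only an illustrative assumption, and the right size depends on how many classes and templates your application loads:

```text
-XX:MaxPermSize=256m
```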
The NewRatio argument defines the ratio of the tenured generation to the young generation. The default value is 2, which means the tenured generation occupies 2/3 of the memory and the young generation occupies 1/3. By profiling your application you can tune this value to improve performance. For example, if the profile of the application shows the young generation occupancy as 20%, you can define -XX:NewRatio=4, so that the tenured generation occupies 4/5 of the memory and the young generation occupies 1/5.
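The fractions above follow directly from the definition of the ratio. As a short worked form, with H standing for the memory shared by the two generations and the 1024 MB figure chosen purely for illustration:

```latex
% N = NewRatio, H = memory shared by the young and tenured generations
\[
\text{young} = \frac{H}{N + 1}, \qquad \text{tenured} = \frac{N \cdot H}{N + 1}
\]
% Illustrative values: H = 1024 MB
\[
N = 2:\ \text{young} \approx 341\ \text{MB},\ \text{tenured} \approx 683\ \text{MB}
\qquad
N = 4:\ \text{young} \approx 205\ \text{MB},\ \text{tenured} \approx 819\ \text{MB}
\]
```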
There are more performance tuning options provided by the JVM, but I did not see a significant performance improvement using them in my environment. For more details on performance options, see Java HotSpot VM Options.
ColdFusion Administrator provides options for tuning application performance. The effectiveness of the settings discussed below depends on a number of factors including the application itself, load, and number of CPUs.
You can use settings on the Caching page to manage how ColdFusion handles caches.
ColdFusion compiles CFMs and CFCs to Java bytecode and stores them in memory known as the template cache. On subsequent requests for the CFM or CFC, ColdFusion refers to the compiled template, but also checks whether the actual file has been modified or not.
When the Trusted Cache option is enabled in ColdFusion Administrator under Server Settings > Caching, ColdFusion will not check whether the file has been modified. Because this file system overhead is eliminated, performance improves. In production, if your application does not require automatic detection of template changes then enable the Trusted Cache setting.
Note: While enabling Trusted Cache can greatly improve throughput, there is a drawback. If a file is updated, ColdFusion will not automatically reflect the changes. You can manually clear the cache by clicking Clear Template Cache Now on the Caching page. This will force ColdFusion to reload templates into memory the next time they are requested and recompile them if they have been modified.
By default the template cache can store a maximum of 1024 cached templates. This can be changed under Server Settings > Caching using the Maximum Number Of Cached Templates setting. If your application has more static pages and your server has enough memory, then increasing this value can improve performance. If your application has a very large number of templates and you have increased the maximum number of cached templates, it is a good idea to increase the MaxPermSize setting for the JVM, as these templates will be stored in the permanent generation of the JVM. By default this setting is: -XX:MaxPermSize=192m.
Figure 9 shows the performance improvement for BlogCFC when Trusted Cache is turned on.
If you enable the Save Class Files option on the Caching page, then ColdFusion will save the class files that the ColdFusion bytecode compiler generates. Instead of recompiling the templates when it restarts, ColdFusion will load templates from disk. There is, however, a tradeoff between I/O operation and compilation time. ColdFusion will take some time to search and read class files from disk, and this must be weighed against the cost in time to compile a template again. Enabling this option may not provide any significant gain in performance.
Because database queries are very time consuming activities, caching queries can increase performance significantly. There are advantages and disadvantages to increasing the maximum number of cached queries. For more details on this topic, see Caching Queries to Disk or to Memory with ColdFusion.
As a rule of thumb:
The Request Tuning page in ColdFusion Administrator (Server Settings > Request Tuning) provides several options that can be used to improve performance.
If the CPU usage of the server on which ColdFusion is installed is low (around 10-30%) then increasing the maximum number of simultaneous template requests can improve performance. If CPU usage is already high then this setting may not help performance.
By changing the server's thread concurrency settings, you can limit or increase the number of requests that a JRun server can process concurrently. This should result in performance improvements as the number of CPUs increases. The number of <cfthread> requests in your application should be less than the value specified by the Maximum Number Of Running JRun Threads setting, since threads created by the <cfthread> tag are mapped to JRun threads. If the number of ColdFusion threads created by <cfthread> exceeds the maximum number specified by this setting, then ColdFusion threads will be queued.
For more information on JRun threads, see Tuning the JRun Thread Pool.
On the Settings page (Server Settings > Settings) there are two settings to consider.
Enabling whitespace management will compress repeating sequences of spaces, tabs, and carriage returns, hence reducing the size of content received from server. Web pages with a significant amount of whitespace will benefit most from this setting.
The Disable CFC Type Check option turns off verification of the CFC type when calling methods with CFCs as arguments. It also disables verifying that an object implements the right interface.
I did not find significant gains in performance by turning this option on.
Client variables let you store user information and preferences between sessions. For better performance, select either Cookies or RDBMS as the client variable storage mechanism.
For more information on how to configure and use client variables see Configuring and using client variables.
On production servers, be sure to disable debugging to maximize performance. Debugging can be disabled on the Debugging & Logging > Debug Output Settings page of ColdFusion Administrator.
Memory tracking should be turned off whenever it is not required.
In addition to tuning JVM arguments and adjusting ColdFusion Administrator settings, you can apply coding best practices to achieve performance improvements.
You can improve performance by always qualifying your variables with the proper scope. Wherever possible use fully scoped variables. A variable that has a scope prefix will be evaluated quicker than an unscoped variable.
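A small sketch of the difference, using a hypothetical variable name (userName) that does not come from any particular application:

```cfml
<!--- Illustrative variable; set in the variables scope of the current page --->
<cfset variables.userName = "guest">

<!--- Unscoped: ColdFusion has to search several scopes in turn to resolve the name --->
<cfoutput>#userName#</cfoutput>

<!--- Scoped: the lookup goes straight to the named scope --->
<cfoutput>#variables.userName#</cfoutput>
```

Both outputs print the same value here. The scoped form also avoids accidentally picking up a variable of the same name from another scope, such as url or form.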
For example, <cfif isdefined("variables.foo")> is better from a performance perspective than the same test without the scope prefix, <cfif isdefined("foo")>.
Complex dynamically constructed expressions will negatively affect performance. For example:
<cfset #foo# = "#bar()#">
The code above should be replaced with the much simpler and more efficient:
<cfset foo = bar()>
IIf() and cfif-else
Use cfif and cfelse instead of IIf(). The cfif construct is significantly faster and more readable.
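To make the comparison concrete, here is a minimal sketch of the same decision written both ways. The variable names (itemCount, label) are hypothetical:

```cfml
<cfset itemCount = 3>

<!--- Avoid: IIf() takes string expressions (here wrapped in DE()) and evaluates them dynamically --->
<cfset label = IIf(itemCount gt 0, DE("has items"), DE("empty"))>

<!--- Prefer: a plain cfif/cfelse block with no dynamic evaluation --->
<cfif itemCount gt 0>
    <cfset label = "has items">
<cfelse>
    <cfset label = "empty">
</cfif>
```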
To evaluate a dynamic variable, where possible use
# instead of
For example, avoid using:
compare() and compareNoCase()
Use compare() or compareNoCase() instead of the is not operator to compare two items. They are a bit faster.
For example, avoid using:
<cfif x is not "a">
Instead, use:
<cfif compareNoCase(x, "a") neq 0>
listFind() and listFindNoCase()
Use listFind() or listFindNoCase() instead of the is and or operators to compare one item to multiple items. They are much faster.
For example, avoid using:
<cfif x is "a" or x is "b" or x is "c">
Instead, use:
<cfif listFindNoCase("a,b,c", x) is not 0>
Querying a database is one of the most time consuming parts of a ColdFusion page. ColdFusion provides query caching as a way to avoid repeatedly querying the database by caching the query recordset. Queries that do not change frequently are the best candidates for caching.
In ColdFusion Administrator you can specify the maximum number of queries that can be cached at a time. There is no limit on the size of a query, so if there are many queries with large recordsets in the cache then it may cause a memory overflow. As a good practice avoid caching queries for a long time.
The <cfquery> tag provides a cachedWithin attribute to specify the time period for which to cache a query. For example, the recordset for the following query will be cached for six minutes.
<cfquery name="GetParks" datasource="cfdocexamples" cachedwithin="#CreateTimeSpan(0, 0, 6, 0)#">
    SELECT * FROM Parks
</cfquery>
You can use cfqueryparam to optimize a query. Consider the following query:
SELECT * FROM TABLE_NAME WHERE COLUMN = #variable#
If this query is executed repeatedly with different values for variable, then using an SQL bind variable will be faster. The cfqueryparam tag creates these bind variables (set cfsqltype to the type that matches the column; cf_sql_varchar here is only an example):
SELECT * FROM TABLE_NAME WHERE COLUMN = <cfqueryparam cfsqltype="cf_sql_varchar" value="#variable#">
This allows the optimizer to compile the query once and reuse it every time the query is executed. It is also more secure since it prevents malicious SQL from being passed into a query via a variable.
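Put together inside a complete query, this might look like the following sketch. The datasource and table are the cfdocexamples ones used earlier in this article; the column names and the url.state parameter are assumptions made for illustration:

```cfml
<!--- Hypothetical request parameter with a default value --->
<cfparam name="url.state" default="CA">

<cfquery name="getParksByState" datasource="cfdocexamples">
    SELECT ParkName, City
    FROM Parks
    WHERE State = <cfqueryparam cfsqltype="cf_sql_varchar" value="#url.state#">
</cfquery>
```

Because the value is passed as a bind parameter rather than concatenated into the SQL text, the statement text stays the same from request to request.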
By default, the cfquery tag returns result sets from databases one record at a time. The blockFactor attribute tells ColdFusion server (which passes this information to the database driver) to retrieve between 1 and 100 records at a time. For queries that typically return result sets larger than a single row, requesting multiple rows in a block can result in a substantial performance gain. Setting this value too high can diminish performance as well, so it is recommended that you tune this number based on the expected average size of the result set.
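As a sketch, a query that is expected to return a few dozen rows could request them in blocks like this. The value of 50 is an illustrative starting point, not a recommendation, and the column name is an assumption:

```cfml
<!--- Ask the driver to fetch up to 50 rows per round trip instead of one --->
<cfquery name="getAllParks" datasource="cfdocexamples" blockfactor="50">
    SELECT ParkName
    FROM Parks
</cfquery>
```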
Note: If you know that fewer than 100 rows will be returned (for example if you're writing a query that either returns 0 or 1 rows), do not bother adding the blockFactor attribute.
Use the <cfcache> tag in pages with contents that are not updated frequently. This tag tells ColdFusion server to cache the HTML to a temporary file. When ColdFusion gets a request for a cached ColdFusion page, it retrieves the pregenerated HTML page without having to process the ColdFusion page, thus improving performance.
For more information on this tag and its use, see Caching ColdFusion pages that change infrequently.
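A minimal sketch of page-level caching, assuming a page whose output can safely be reused for an hour (the one-hour timespan is an arbitrary choice for illustration):

```cfml
<!--- Place near the top of the page: cache the generated HTML and reuse it for one hour --->
<cfcache action="cache" timespan="#CreateTimeSpan(0, 1, 0, 0)#">

<cfoutput>
    <h1>Park directory</h1>
    <p>This content is regenerated at most once per hour.</p>
</cfoutput>
```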
For ColdFusion pages that contain some dynamic information and some content that changes less frequently, the <cfcache> tag should not be used. Instead you can use <cfsavecontent> to cache infrequently changing output in a shared scope variable. There is a tradeoff, however, due to the overhead of locking a shared scope variable. For detailed information on this and examples see Caching parts of ColdFusion pages.
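One possible shape for this pattern is sketched below. The variable name application.sidebarHtml and the template sidebar.cfm are hypothetical; the lock is only taken when the cached fragment has not been built yet, which keeps the locking overhead off the common path:

```cfml
<!--- Build the fragment once and keep it in the application scope --->
<cfif not structKeyExists(application, "sidebarHtml")>
    <cflock scope="application" type="exclusive" timeout="10">
        <!--- Check again inside the lock in case another request built it first --->
        <cfif not structKeyExists(application, "sidebarHtml")>
            <cfsavecontent variable="application.sidebarHtml">
                <cfinclude template="sidebar.cfm">
            </cfsavecontent>
        </cfif>
    </cflock>
</cfif>

<!--- The dynamic part of the page runs as usual; the cached part is just output --->
<cfoutput>#application.sidebarHtml#</cfoutput>
```

To refresh the fragment, an administrative page could delete the key with structDelete(application, "sidebarHtml") so that the next request rebuilds it.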
The <cfsavecontent> tag can also be used for concatenation, for which it is much faster than <cfset>. Here is an example using <cfset>:
<cfset result = "">
<cfloop from="1" to="100" step="1" index="i">
    <cfset result = result & i>
</cfloop>
The code below using <cfsavecontent> is much better from a performance perspective:
<cfsavecontent variable="result">
    <cfloop from="1" to="100" step="1" index="i">
        <cfoutput>#i#</cfoutput>
    </cfloop>
</cfsavecontent>
Wherever possible use stored procedures (via <cfstoredproc>) instead of SQL queries. This will enhance both security and performance. Once a stored procedure is compiled, the database reuses the compiled code, while inline SQL statements are executed as a new query every time.
With <cfhttp>, use ResolveUrl=yes only when needed.
ColdFusion 8 provides the Verity search engine for searching in files and database queries. Search performance can be increased if files are indexed in categories and then searches are conducted within specific categories in a collection rather than searching in the whole collection.
<cfsearch name="qsearch1" collection="verity_cat_collection" category="cat_1" criteria="Coldfusion">
ColdFusion 8 introduces a new CFML tag that enables application developers to quickly and easily add multithreading capabilities to server applications. The <cfthread> tag enables asynchronous processing in CFML, which can considerably improve user response times where long-running tasks are made up of autonomous processing steps that would otherwise be processed synchronously. The maximum number of threads available for <cfthread> can be set from ColdFusion Administrator on the Server Settings > Request Tuning page. If your application has more concurrent <cfthread> requests than the number specified in the Maximum Number Of Threads Available For CFTHREAD setting, then increase this limit. The same guideline applies to the Maximum Number Of Running JRun Threads setting.
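A minimal sketch of the tag in use follows. The thread name (buildReport) and the thread-scope variable (message) are hypothetical, and the 5000 ms join timeout is an arbitrary choice:

```cfml
<!--- Start the long-running step in its own thread --->
<cfthread action="run" name="buildReport">
    <!--- Long-running work goes here --->
    <cfset thread.message = "report ready">
</cfthread>

<!--- The page continues with other work while the thread runs --->

<!--- Wait up to 5000 ms for the thread, then read its result only if it completed --->
<cfthread action="join" name="buildReport" timeout="5000"/>
<cfif cfthread.buildReport.status is "COMPLETED">
    <cfoutput>#cfthread.buildReport.message#</cfoutput>
</cfif>
```

Each thread started this way counts against the Maximum Number Of Threads Available For CFTHREAD setting described above.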
Use file functions (for example, FileCopy(), FileRead(), and FileWrite()) in place of the <cffile> tag for better performance. The performance gain increases as file size increases.
For example, avoid using:
<cffile action="write" file="c:\temp\myfile.txt" output="Some Text">
Instead, use:
<cfset FileWrite("c:\temp\myfile.txt","Some Text")>
To show how much performance can be improved by using the tips covered in this article, I used BlogCFC as a case study. The performance gain noted here is specific to the BlogCFC application for a particular system configuration.
I began by installing GCViewer on a machine with ColdFusion server and installing JMeter on a separate machine that will measure performance. For JMeter and GCViewer to work effectively they should be on different physical machines, preferably on different switches.
In JMeter I recorded a test case that navigates across different pages of BlogCFC. (For more information on how to record a test case in JMeter see JMeter proxy step-by-step .)
Before I tuned anything, I obtained an initial profile to determine the current performance level. I used GCViewer for the profiling and JMeter to measure performance. Initially, all the settings for ColdFusion Administrator and the JVM (as specified in jvm.config) are at their default values. To have GCViewer profile garbage collection, I added the following line to jvm.config under "Arguments to VM" and restarted the ColdFusion server:
-Xloggc:C:/log.txt -XX:+PrintGCDetails -verbose:gc
The JVM now logs garbage collection data to C:\log.txt, which GCViewer can read.
Figure 10 shows the profile for 30 virtual users (from JMeter) running for 15 minutes accessing pages of BlogCFC. The yellow area represents the young generation and the purple area represents the tenured generation. Blue lines represent used heap.
This profiling shows that there are frequent major collections and frequent minor collections.
The initial average response time (ART) and throughput data collected from JMeter is shown in the following table:
As a first tuning step, I turned Trusted Cache on and noted the following improvement in performance:
Next, I increased Xmx (the maximum Java heap size) from 512m to 1024m to reduce the frequency of major collections (and thereby improve performance).
In the profile shown in Figure 11, you can see that the frequency of major collections has decreased.
And the performance did improve:
In the initial profile (see Figure 10) I noticed that the young generation consumes around 35% of memory. This prompted me to increase the Xmn value to allocate more space for the young generation and reduce the frequency of minor collections.
To increase the Xmn setting from its default to 512m, I added the following to jvm.config:
For an application with lower young generation requirements, you might try using a smaller value for Xmn, for example:
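As a sketch, the two cases would correspond to JVM arguments like the ones below. The first line matches the 512m setting described above; the 128m value on the second line is only an illustrative smaller size, not a measured recommendation:

```text
-Xmn512m
-Xmn128m
```

Only one -Xmn value should be present in the JVM arguments at a time.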
Figure 12 shows that the frequency of minor collections decreased, and the performance metrics show that performance improved:
Because this application showed significant young generation usage, I decided to tune the survivor ratio to see if it would provide any performance gain. The results below (measured after this change) show that no improvement was made:
Next I tried another ColdFusion Administrator option, saving client variables in cookies instead of in the Registry (the default). This resulted in further improvement:
Figure 13 shows that there is a 48% increase in performance in total for BlogCFC using the tips covered in this article.
I would like to acknowledge the help of the following people and resources that contributed to this article: