Raymond Camden

coldfusionjedi.com

Created:
20 September 2004
User Level:
Intermediate
Products:
ColdFusion

The RSS Watch Sample App (Part 2): Improving and Enhancing the Application

Welcome to the second part of my article about the RSSWatcher ColdFusion component. Now that you’ve read Building the RSS Watch Sample Application (Part 1) and marveled at how wonderful it was, it’s time for me to come clean with something. I wrote the article and the code, and, well, like most folks, I didn’t give the code enough of a shakedown (in other words, bug testing).

After the article went live, and the code ran on a production server, I noticed a few errors and things I could do to improve the code. The point of this article then is to take a critical look at the first version and talk about what worked well, what didn’t, and what was just plain silly on my part. All code goes through revisions and I believe anyone can benefit from me pointing out exactly what revisions I made, and why I made them.

Requirements

To complete this tutorial you will need to install the following software and files:

ColdFusion MX

Tutorials and sample files:

Prerequisites: Read Building the RSS Watch Sample Application (Part 1) before proceeding with this tutorial.

The Original Code

The original RSSWatcher.cfc contained three methods. The first method, getSearches(), parsed an XML file to generate an array of search profiles. Each profile contained a set of search terms and a list of RSS feeds.

The next method, processSearches(), iterated over each of the search profiles, fetched the RSS data, and then searched for the terms from the profiles.

The last method, rssParse(), translated the RSS feed into a simple array of structures that the processSearches() method examined.

A simple file called runner.cfm used this CFC: it called the processSearches() method and then e-mailed the results to me. I scheduled this file to run once an hour.
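
For readers who don’t have Part 1 at hand, a runner along these lines might look like the following. This is a sketch, not the original runner.cfm: it assumes the component is named RSSWatcher and sits in the same directory, that processSearches() takes no arguments and returns the array of matches, and that the e-mail addresses are placeholders. The field names (terms, rss, matchedItem) follow the result structure that processSearches builds.

```cfml
<!--- runner.cfm (sketch): run the searches and mail any matches --->
<!--- Assumption: RSSWatcher.cfc is in the same directory as this file --->
<cfset watcher = createObject("component", "RSSWatcher")>
<!--- Assumption: processSearches() needs no arguments and returns an array --->
<cfset results = watcher.processSearches()>

<!--- only send mail when at least one match came back --->
<cfif arrayLen(results)>
	<!--- placeholder addresses; replace with real ones --->
	<cfmail to="me@example.com" from="rsswatcher@example.com" subject="RSS Watch results">
<cfloop index="i" from="1" to="#arrayLen(results)#">
Terms: #results[i].terms#
Feed: #results[i].rss#
Title: #results[i].matchedItem.title#
Link: #results[i].matchedItem.link#

</cfloop>
	</cfmail>
</cfif>
```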

So What Went Wrong?

I scheduled runner.cfm to run once an hour. I patiently waited for one of my search terms to turn up in the RSS feeds I was monitoring. I didn’t have to wait long. About a half day or so after I started the process, a result arrived. I was ecstatic. I was thrilled. Okay, so it wasn’t that exciting. I knew the code worked as I had performed static testing before, but still, it was cool to see it working “in the wild.”

An hour later I received the same result. The exact same match… again. And then it happened again and again; you get the picture. One obvious little thing had slipped my mind: RSS feeds typically show the last 15 items on a blog or news source. An item can appear in a feed and stay there indefinitely if the source publishes no new items. My code simply checked the feed. It didn’t check the date of the entry to see if it was a new result. Luckily, most RSS feeds contain the date of the entry. Let’s take a look at the old code and what items it grabbed from the RSS feeds.

Listing 1: Portion of the rssParse method
<cfset items = xmlSearch(xmlData,xPath)>

<cfloop index="x" from="1" to="#arrayLen(items)#">
	<cfset node = structNew()>
	<cfset node.title = items[x].title.xmlText>
	<cfset node.description = items[x].description.xmlText>
	<cfset node.link = items[x].link.xmlText>
	<cfset result[arrayLen(result)+1] = duplicate(node)>
</cfloop>

If you remember, items was an array of XML nodes returned by using an XPath search on the XML data of the RSS feed. You are working with RSS 1.0 and RSS 2.0 feeds, both of which contain (again, welcome to the world of trusting outside sources) dc:date entries. Here is an example:

<dc:date>2004-09-07T11:00:00-07:00</dc:date>

It was easy enough to get the date. All I needed to do was grab items[x]["dc:date"] in the loop above. However, the format wasn’t exactly clear. After a quick Google search, I came across the specification. As I thought, everything before the T represented the date in YYYY-MM-DD format, and everything after the T was the time. However, note the text after the last dash. This represents the UTC offset, or how many hours ahead of or behind UTC the time is. I needed to convert this date and time to local time. Luckily, ColdFusion provides a function that tells you the difference between your local machine and UTC. If you know the difference between the RSS node and UTC, and you know the difference between the local machine and UTC, it’s fairly simple to recalculate the time. It was pretty obvious that I needed to encapsulate this logic. I added the following piece of code to my loop:

<cfif structKeyExists(items[x],"dc:date")>
	<cfset node.date = parseDCDate(items[x]["dc:date"].xmlText)>
<cfelse>
	<cfset node.date = "1/1/1">
</cfif>

In this sample, I first check to see if the node actually exists. If it does, I pass the value to a function I describe later, parseDCDate. If the node does not exist, I simply set it to a default date of 1/1/1. Later on, I filter out the entries that are more than one hour old, so this default basically “gives up” on feeds that don’t supply a date. This is probably not the best technique, but it prevents the application from reporting duplicate entries. Take a look at parseDCDate:

Listing 2: parseDCDate method
<cffunction name="parseDCDate" access="private" returnType="string" output="false">
	<cfargument name="dtstring" type="string" required="true">

	<!--- find the date --->		
	<cfset var theDate = listFirst(arguments.dtstring,"T")>
	<!--- find the time and strip out tz --->
	<cfset var theTime = listFirst(listLast(arguments.dtstring,"T"),"+-")>
	<!--- find the offset for the time --->
	<cfset var tzOffset = listLast(arguments.dtstring,"+-")>
	<!--- find if it was pos/neg --->
	<cfset var tzOffsetSign = mid(arguments.dtstring,len(arguments.dtstring)-len(tzOffset),1)>
	<cfset var tz = getTimeZoneInfo()>
	<cfset var adjustedOffset = "">
	<cfset var convertedTime = "">

	<cfset tzOffset = int(listFirst(tzOffset,":"))>

	<cfif tzOffsetSign is "-">
		<cfset tzOffset = -1 * tzOffset>
	</cfif>
		
	<cfset adjustedOffset = (-1)*tz.utcHourOffset - tzOffset>
	<cfset convertedTime = dateAdd("h",adjustedOffset,dateFormat(theDate) & " " & theTime)>

	<cfreturn convertedTime>
						
</cffunction>

If you remember, the date/time string looks like 2004-09-07T11:00:00-07:00. The method simply uses a combination of list and string functions to parse it. First, retrieve the date portion using listFirst with the T character as the delimiter. The time value is everything after the T but before the plus (+) or minus (-) sign that introduces the time zone offset. Retrieve the time zone offset by using listLast. Remember that all list functions allow for multiple delimiters. With both + and - as delimiters, listLast returns whatever follows the final sign in the string, whichever sign it is. (The hyphens in the date portion are also treated as delimiters, but they come earlier, so they don’t affect the last element.) Next, use the mid function to read the single character immediately before the offset, which tells you exactly which sign (+ or -) was used.

After a few more simple var scope declarations, you begin the real work. First, you need to convert the time zone offset, which is in the form HH:MM (07:00 in the example), to just the number of hours (7). You do this by using listFirst to strip out the numbers after the colon and int() to drop any leading zero. Note that this discards the minutes, so an offset such as +05:30 is treated as a whole number of hours. Once you have this number, you check to see if the offset was negative, and if so, you multiply the time zone offset by -1.

Once that is done, you can figure out the adjusted offset value by subtracting the node’s time zone offset from your server’s offset. You can get the server’s offset from the struct returned by getTimeZoneInfo(). Its utcHourOffset key reflects your server’s offset. This value has a positive sign for time zones west of UTC, which means that to make it match the format used in the dc:date element, you need to multiply it by -1. Finally, the adjusted offset is your offset (multiplied by -1) minus the time zone offset of the date you parse.
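
To make the arithmetic concrete, here is a small worked sketch. It assumes a server six hours behind UTC, for which the utcHourOffset key would be 6. The values are hard-coded, and the names serverUtcHourOffset, entryOffset, and localTime are for illustration only; they do not appear in the CFC.

```cfml
<!--- Worked example with hard-coded, assumed values (not live data) --->
<!--- what utcHourOffset would hold on a server at UTC-6 --->
<cfset serverUtcHourOffset = 6>
<!--- taken from the -07:00 suffix of the sample dc:date value --->
<cfset entryOffset = -7>

<!--- same formula as parseDCDate: (-6) - (-7) = 1 --->
<cfset adjustedOffset = (-1) * serverUtcHourOffset - entryOffset>

<!--- 11:00 at -07:00 is 18:00 UTC, which is 12:00 on the UTC-6 server --->
<cfset localTime = dateAdd("h", adjustedOffset, createDateTime(2004, 9, 7, 11, 0, 0))>

<cfoutput>#dateFormat(localTime, "yyyy-mm-dd")# #timeFormat(localTime, "HH:mm")#</cfoutput>
```

Under these assumptions the adjusted offset is 1, so the 11:00 entry time becomes 12:00 local time on the same date.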

Once you have the adjusted offset, you simply add the value to the date/time from the blog entry. This should convert the remote blog entry’s time to local time. That was a lot of work, but since you separated the time conversion into its own method, it will be easy to update in the future. Last but not least, you must use this date. The previous code in the processSearches method simply checked to see if your search strings matched. You now modify this code to also check whether the entry was published in the past hour.

Listing 3: Portion of the processSearches method
<!--- check result to see if our term is matched --->
<cfloop index="z" from="1" to="#arrayLen(rssItems)#">
	<cfif dateDiff("h",rssItems[z].date,now()) lte 1 and (
		findNoCase(mySearches[x].terms, rssItems[z].title) or
		findNoCase(mySearches[x].terms, rssItems[z].description))>
		<cfset result[arrayLen(result)+1] = structNew()>
		<cfset result[arrayLen(result)].terms = mySearches[x].terms>
		<cfset result[arrayLen(result)].rss = mySearches[x].rss[y]>
		<cfset result[arrayLen(result)].matchedItem = rssItems[z]>
	</cfif>
</cfloop>

In the code sample above, the only modification was to use the dateDiff function to see if the entry was less than or equal to one hour old.
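
One caveat about this comparison, offered as a note rather than a correction of the CFC: dateDiff counts only complete units, so with "h" an entry that is 90 minutes old still reports a difference of 1 and passes the lte 1 test. On an hourly schedule that can let the same entry through twice. If the check should mean exactly “within the past 60 minutes,” comparing in minutes is one option. The sketch below uses a made-up entry date; rightNow, entryDate, ageInHours, and ageInMinutes are illustrative names only.

```cfml
<!--- Sample data (assumption): an entry published 90 minutes ago --->
<cfset rightNow = now()>
<cfset entryDate = dateAdd("n", -90, rightNow)>

<!--- whole hours only: 90 minutes counts as 1 --->
<cfset ageInHours = dateDiff("h", entryDate, rightNow)>
<!--- minute granularity: 90 --->
<cfset ageInMinutes = dateDiff("n", entryDate, rightNow)>

<cfoutput>
hours: #ageInHours# (passes an "lte 1" test)
minutes: #ageInMinutes# (fails an "lte 60" test)
</cfoutput>
```

In processSearches, the equivalent change would be to test dateDiff("n", rssItems[z].date, now()) lte 60 in place of the hour-based comparison.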

Other Improvements

Along with my big mistake of not checking the date of entry items, I found a few other places I could improve.

Improving the Caching

One of the things the processSearches method did to improve performance was to cache the result of the CFHTTP call. This is probably the slowest part of the entire process, so this cache is very important. It occurred to me, however, that I could cache the result of the rssParse method. Listing 4 shows both the current caching mechanism and how I used rssParse:

Listing 4: Use of caching with processSearches
<!--- See if we have this URL in cache already --->
<cfif not structKeyExists(variables.httpCache, mySearches[x].rss[y])>
	<cfhttp url="#mySearches[x].rss[y]#">
	<cfset variables.httpCache[mySearches[x].rss[y]] = cfhttp.fileContent>
</cfif>
				
<cfset rssItems = rssParse(variables.httpCache[mySearches[x].rss[y]])>

This code simply checks to see if variables.httpCache already contains the file contents of the URL in question. If not, it updates the cache. The last line in the portion above calls rssParse with the string contained in the httpCache structure. As I stated above, though, you can make this a bit cleaner and even a bit quicker by caching the result from rssParse. Take a look at this in Listing 5.

Listing 5: Modified caching strategy
<!--- See if we have this URL in cache already --->
<cfif not structKeyExists(variables.cache, mySearches[x].rss[y])>
	<cfhttp url="#mySearches[x].rss[y]#">
	<cfset rssItems = rssParse(cfhttp.fileContent)>
	<cfset variables.cache[mySearches[x].rss[y]] = rssItems>
<cfelse>
	<cfset rssItems = variables.cache[mySearches[x].rss[y]]>
</cfif>

First off, I renamed the variable httpCache to simply cache, to reflect the new nature of the data the CFC stores. As before, I check whether the URL exists in the cache. If it does not, I retrieve the contents, pass them to rssParse, and then store the results. If the URL does exist in the cache, I simply grab the parsed items from there. There is no redundant call to rssParse. This isn’t a big improvement, and there isn’t much I can do to speed up the CFHTTP calls, but every little bit helps.
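
If the fetch-and-cache logic is needed in more than one place, it could be pulled into a small private method of the CFC. The following is only a sketch: getFeedItems and its feedUrl argument are hypothetical names, and it assumes that variables.cache has already been initialized as a struct and that rssParse returns an array, as described above.

```cfml
<!--- Hypothetical helper: getFeedItems is an illustrative name, not part of the original CFC --->
<cffunction name="getFeedItems" access="private" returnType="array" output="false">
	<cfargument name="feedUrl" type="string" required="true">

	<!--- fetch and parse only on a cache miss --->
	<cfif not structKeyExists(variables.cache, arguments.feedUrl)>
		<cfhttp url="#arguments.feedUrl#">
		<cfset variables.cache[arguments.feedUrl] = rssParse(cfhttp.fileContent)>
	</cfif>

	<!--- on a hit (or after the fetch), return the parsed items --->
	<cfreturn variables.cache[arguments.feedUrl]>
</cffunction>
```

With a helper like this, the block in processSearches would shrink to a single line such as rssItems = getFeedItems(mySearches[x].rss[y]).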

Dealing with the Feeds

As described in the first article, I used the code behind rssWatcher at www.rsswatcher.com. One of the things I did was to have the site send me e-mail whenever a feed failed to parse. I discovered a feed that, for whatever reason, didn’t have a description node. This caused rssParse to log an error and ignore the entry. This was easy enough to fix. I modified these lines:

<cfset node.title = items[x].title.xmlText>
<cfset node.description = items[x].description.xmlText>
<cfset node.link = items[x].link.xmlText>

To the following:

<cfset node.title = items[x].title.xmlText>
<!--- some feeds have no description --->
<cfif structKeyExists(items[x],"description")>
	<cfset node.description = items[x].description.xmlText>
<cfelse>
	<cfset node.description = "">
</cfif>
<cfset node.link = items[x].link.xmlText>

Of course, an entry could still throw an error if the link or title elements do not exist. I figure these entries probably aren’t worth searching anyway. (As it stands, an entry without a link is not something that a user can view anyway.)
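
If the error e-mails for such entries become noisy, one alternative is to skip them quietly instead of letting them throw. The sketch below is a standalone illustration, not the CFC’s code: it parses a tiny made-up feed (the sample variable and its URL are invented) and keeps only the items that have both a title and a link.

```cfml
<!--- Made-up sample feed: the second item has no link element --->
<cfset sample = xmlParse("<rss><channel><item><title>Has link</title><link>http://example.com/1</link></item><item><title>No link</title></item></channel></rss>")>
<cfset items = xmlSearch(sample, "/rss/channel/item")>
<cfset result = arrayNew(1)>

<cfloop index="x" from="1" to="#arrayLen(items)#">
	<!--- keep the item only when both required elements are present --->
	<cfif structKeyExists(items[x], "title") and structKeyExists(items[x], "link")>
		<cfset node = structNew()>
		<cfset node.title = items[x].title.xmlText>
		<cfset node.link = items[x].link.xmlText>
		<cfset arrayAppend(result, node)>
	</cfif>
</cfloop>

<!--- only the first item survives the check --->
<cfoutput>#arrayLen(result)#</cfoutput>
```

The trade-off is that malformed entries disappear silently, so this only makes sense if nobody needs to be told about them.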

Conclusion

So, there is always the chance that maybe I’m the only developer who doesn’t write perfect code, but hopefully I’m not. Sometimes looking over existing code can be an exercise in shame (did I really write that?), and sometimes it can be vital for catching security or other errors that slipped past the quality assurance process. Hopefully this article gives you the impetus to take a look at your own code and see where you can improve it as well.

About the author

Raymond Camden is a software consultant focusing on ColdFusion and RIA development. A long time ColdFusion user, Raymond has worked on numerous ColdFusion books including the ColdFusion Web Application Construction Kit and has contributed to the Fusion Authority Quarterly Update and the ColdFusion Developers Journal. He also presents at conferences and contributes to online webzines. He founded many community web sites including CFLib.org, ColdFusionPortal.org, ColdFusionCookbook.org and is the author of open source applications, including the popular BlogCFC (www.blogcfc.com) blogging application. Raymond can be reached at his blog (www.coldfusionjedi.com) or via email at ray@camdenfamily.com. He is the happily married proud father of three kids and is somewhat of a Star Wars nut.