13 March 2009
A variety of technologies and platforms are used to create rich Internet applications (RIAs), including Ajax, Curl, JavaFX, Microsoft Silverlight, and the Adobe Flash Platform. It's no secret that RIAs present challenges for search engines. Understanding the importance of this issue, Adobe has partnered with Google and Yahoo! to get to the heart of the issue and to come up with solutions that can not only work with the Adobe Flash Platform but also provide insight and fundamentals that can influence future technological advances.
In fact, a similar issue arose a few years ago with the PDF format. At the time, search engines had trouble recognizing the content contained in these documents. Now, search engines can crawl and index PDF files rather easily. The point is that eventually this issue will be solved, and Adobe, Google, and Yahoo! are leading the charge.
Last year Adobe announced a major breakthrough with the release of optimized Adobe Flash Player technology (since dubbed "Flash Player for Search Engines"), which is essentially a "headless" version of Flash Player that can change states of SWF content and gain access to the text content residing within. For a quick overview of how Flash Player for Search Engines works, please watch Duane Nickull's video blog post about it.
While Adobe, Google, and Yahoo! continue to collaborate to solve the problem, there are things you can do today to improve the relevance of your SWF content in search results. This article explains what the issues and challenges are and how to overcome them, whether you are a developer, designer, content owner, website owner, project manager, or even an SEO expert.
For example, suppose you've got a site built using Adobe Flash technology. You've received rave reviews on the slick design, smooth animation sequences, and overall powerful user experience. It does a great job of establishing a unique brand experience, it was developed for easy updates, and it is just flat-out cool. Schedule, budget, and content challenges were overcome and now everyone is basking in the glory. This article will help you answer one of the most important questions, and one that few people ask at the onset of a project: "Oh, by the way, does the site work with search engines?" Sound familiar?
This question usually arises only when the work is completed and your client begins searching for terms related to their brand, only to find that the site ranks much lower than expected. Most of the time, creating a search-friendly site was not part of the scope, nor was it considered or discussed when establishing the information architecture, design, and functional specifications.
Given all the different criteria that need to be weighed and prioritized when developing RIAs (client expectations, design decisions, content requirements, functionality decisions), it is very important in the planning stages to outline exactly how your RIA will comply with search best practices. You will benefit in the long run if you spend the time up front to create a search-friendly RIA. The goal of this article is not to add another discipline to juggle, but to provide you with search best practices that can be integrated into the development cycle. Yes, it can be done.
To make best use of this article, you should be familiar with Adobe Flash technology and rich Internet applications, and have an advanced understanding of web development techniques.
It's important to be familiar with the challenges that search engines encounter when they come across an RIA. What it really boils down to is the difference between static and dynamic content. HTML has a beginning, a middle, and an end. Most RIAs have a beginning, lots of different middles, and are not required to have an end. The biggest challenge that search engines face with SWF content is the dynamic data that loads and is displayed based on the interactivity. This happens at runtime, not when the browser initially loads the HTML and SWF content—which is what the search engine crawls.
For example, suppose you visit a website that uses a SWF application as its primary interface. What you see when the browser loads the HTML page with the embedded SWF is essentially what the search engine sees—whatever is on the HTML source. The search engine crawls that content, then moves on to a different URL to crawl. It doesn't have the ability to interact with the SWF content, nor is it going to wait for a user to interact with the SWF content. It does its job and moves on.
Suppose further that within this SWF application, you then click on a button to see some additional information. At this point, the SWF application is requesting content behind the scenes from an external data source and displaying the content based on the button that you clicked. Unfortunately, the problem is that the search engine never sees this new, dynamic content because it is based on human interaction. Search engine spiders have not historically been able to change the states of RIAs on their own; they would just grab whatever text was contained within the SWF application, which did not contain any structure. The same issue occurs with Ajax and other dynamic web application platforms. Any dynamic data being passed back and forth is invisible to search engines.
This is exactly what Adobe Flash Player for Search Engines addresses: providing the search engine with visibility into the SWF application, how it changes with user interactivity, and what dynamic content is being displayed.
Having unique URLs for the important content areas of your SWF application is critical. If I wanted to tell someone to go to a SWF-based website and look at a specific piece of content, more often than not I would have to navigate them to the eventual destination instead of just sending them a URL to click and visit the page. For example:
The search engine has the same problem: it cannot get to this individual page. If the search engine can't get to the page on its own, it cannot fetch the information on it to score and publish it to the search results page.
To expand on the URL issue, imagine the following scenario. Let's say that your SWF application happens to rank in the search engine for a branded term (a term that no other website would rank for). The term on which the user searches matches the fifth interaction in the SWF application (the state). When the user clicks over to that result (URL), the browser will load the SWF application from its initial state and the content that matches the user's search won't be immediately seen. The user will have to interact with the application (click multiple times) until that content is presented. Users may instead feel frustrated and abandon the page. This is similar to having a very long HTML page, or having some DHTML that displays the content relevant to the search query. Instead of the user immediately seeing the content they explicitly asked for, they have to work for it. Even worse, users often won't know that they need to do this; they simply assume that the search result is in error.
In addition, from a social media standpoint, it is important that your site contains links that can be easily passed on to others. For example, suppose that a user sees a product on your site that she wants for Christmas and posts the link to her Facebook page. When her friend clicks the link, he should immediately see the web page and product without having to click again. This is a nice usability feature, but it also helps your site build up what is referred to as link equity: the measure of the number and quality of links pointing to your site. Search engines use links (in addition to content) to determine your site's search rank. Think of each link to your site as loose change in your pocket. The more you have, the more valuable your site. It's important to note that quality is more important than quantity. For example, I can write a blog post about my techniques for dealing with a cold. However, if WebMD writes an article on the same subject, it is much more authoritative. Getting links from the WebMD article is much more valuable, and will help boost your search rank.
By using the techniques outlined in this article, you will improve your chances of getting your SWF application indexed by search engines. But that is only the beginning. Building up link equity by having authoritative domains linking to your SWF application will help your site compete for the top search rank.
Now that I've outlined some of the issues, here are some techniques you can use with your SWF application so that search engines can recognize, index, and rank your content better.
Establish some search-related goals in the planning stages. A few examples of a search-related goal would be to improve ranking positions for certain terms, to establish a top-five rank for some business-critical terms, or to drive x% more traffic from search. Your traffic goals for your site should factor in search as a primary contributor. This will help you establish important goals that will drive technical and content decisions as you begin to build out your SWF application.
While the Flash Player for Search Engines initiative is working to change this, as of this article's publication, in order for search engines to crawl and index important content that is contained within a SWF file or loaded by a SWF application from an external source, the content must be placed in the HTML source. It's as simple as that. There's no way around it. So, the first step in planning your site is to make sure that the content will be included in the HTML source.
This does not mean you have to create HTML pages for every section of your application. Odds are good that no one will search for the content appearing in a "Contact Us" section. Searchers are looking for specific information related to products and services. Therefore, identify the important content sections of your application (which pages you want to rank well in the search engines) and begin the process of extracting the SWF content and getting it into HTML, which the search engine can then crawl and index.
There are several methods for doing this. I have seen <div> tags used, which will get the content to appear in the HTML source (SWFObject 2 does this automatically). This method can work, although some in the search industry would argue that hiding the content (through the <div>) is an SEO bad practice.
I've seen another method implemented with good success: the <noscript> tag.
<noscript>
  <!-- alternative SWF content -->
</noscript>
The benefit is that since this content is in the HTML source, the search engine can crawl this content and get what it needs to determine what the page is about. This is accomplished by including the appropriate keywords in the title, heading, body copy, and link text. Keep in mind that some users will see content that is not SWF-based, so it's a good idea to reuse the design elements you used in your SWF content in the HTML. You should be able to do this with some simple CSS.
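As a rough illustration of the idea, the sketch below builds a page whose <noscript> block carries the same title, heading, and body copy as the SWF. It is a minimal sketch, not a definitive implementation: the `render_page` helper, its parameters, and the sample text are all hypothetical, and it assumes the SWF itself is written into the page by a script (as SWFObject does).

```python
from html import escape


def render_page(title, heading, paragraphs, swf_script_markup):
    """Return an HTML page with SWF markup plus crawlable <noscript> content.

    Hypothetical helper: the caller supplies the markup that loads the SWF.
    """
    # Escape the copy so that characters such as & and < stay valid HTML.
    body_copy = "\n".join(f"      <p>{escape(text)}</p>" for text in paragraphs)
    return (
        "<html>\n"
        "  <head>\n"
        f"    <title>{escape(title)}</title>\n"
        "  </head>\n"
        "  <body>\n"
        f"    {swf_script_markup}\n"
        "    <noscript>\n"
        f"      <h1>{escape(heading)}</h1>\n"
        f"{body_copy}\n"
        "    </noscript>\n"
        "  </body>\n"
        "</html>"
    )


page = render_page(
    title="Basketball Shoes",
    heading="Basketball Shoes",
    paragraphs=["High-tops & low-tops for indoor courts."],
    swf_script_markup="<!-- script that embeds the SWF goes here -->",
)
```

The keyword appears in the title, the heading, and the body copy, which are the places the paragraph above says a search engine looks to decide what the page is about.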
Procter & Gamble's PUR Water Filtration System website is one really good example of this method being used with success (see Figure 1).
Within the <noscript> tag, they are using the same content that was in the SWF, just formatted in HTML. The search engine is able to crawl the content and, thus, determine what each of these pages is about.
As you can see, the HTML version has exactly the same content seen in the SWF version. This is the optimal method of presenting the rich content to search engines—and many top brand names like Disney and P&G are using it because it is the optimal blend of search optimization and user experience for those who can see the SWF version.
At first, this appears to be way too much work, requiring time and budget that is not available. Although you're right that it will require some additional design and QA time, there is a method you can use to save time and create some efficiencies for use in other projects. Consider using a single XML source to control the content in both the HTML and SWF file. By using Extensible Stylesheet Language (XSL), you can alter the format of XML data either into HTML or other formats that are suitable for a browser to display. This leads to easier maintenance and ensures accuracy of content. Now when you have text edits or other content updates, you can do it once (in the XML) and see the results in both formats.
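The single-source idea can be sketched as follows. The article suggests XSL for the transformation; this sketch uses Python's standard library as a stand-in for an XSL stylesheet, purely to show one XML document driving the HTML output. The element names (`page`, `heading`, `paragraph`) and the `xml_to_html` helper are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical content file that both the SWF and the HTML page would read.
CONTENT_XML = """<page>
  <title>Basketball Shoes</title>
  <heading>Basketball Shoes</heading>
  <paragraph>High-top and low-top styles for indoor courts.</paragraph>
  <paragraph>Free shipping on all orders.</paragraph>
</page>"""


def xml_to_html(xml_text):
    """Render the shared XML content as an HTML fragment."""
    page = ET.fromstring(xml_text)
    lines = [f"<h1>{escape(page.findtext('heading', default=''))}</h1>"]
    for paragraph in page.findall("paragraph"):
        lines.append(f"<p>{escape(paragraph.text or '')}</p>")
    return "\n".join(lines)


html_fragment = xml_to_html(CONTENT_XML)
```

A text edit made once in the XML then shows up in the HTML the next time the page is generated, and in the SWF the next time it loads the same file.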
One more important element in getting your SWF application to be recognized by search engines is to make sure that your main navigation is in the HTML code. Search engines use the main navigation on websites to follow the links within the site, which helps them gain a better understanding of the site's content. As I mentioned in the beginning of this article, Flash Player for Search Engines helps Google and Yahoo! crawl SWF content. Until that solution matures, it's best to build out an HTML version of your primary navigation.
Getting the important SWF content into the HTML source is only half the problem. It doesn't address the URL and directory structure issues. For example, with most SWF applications, there is only one URL for the entire site. This single URL contains the same HTML source (since the browser never refreshes), which contains the same title, H1, and body copy. In order to be compliant with search best practices, you need each URL to have a unique title, H1, and body copy—each of which would contain the keyword for which you want to rank.
As you know, a SWF file embedded in a web page does not require the browser to refresh in order for the user to interact with the content. For example, if you click three or four links within the "product" section of your SWF application to get to a product detail page, the browser's URL never changes for any of those clicks. The same HTML source is used (and thus seen by the search engine) for all of those different sections: product category, product overview, product detail, purchase, and so on. You might think that the solution would be to use a simple <div> technique to show the engine the same text and other content that appears in the SWF application for all of these different sections. The challenge with this method is that if you simply list all the product information for all the items within this section in the one HTML source (since there's only one URL), the engine will see the content but there will be no way for the engine to distinguish what the page is about because all of the information is competing with itself. Because there are too many categories of information, there's no way to establish any prominence.
Some websites do this on the home page. They have an HTML landing page with a "Launch Flash site" button that starts the SWF content. On that HTML landing page, they will put the entire site's contents in the alternative SWF content in the HTML source, thinking that they are now search-friendly. While the search engine can crawl that content, it's really just gibberish because there is no way to establish prominence on a single topic. There are too many competing terms.
Important sections of the site can be determined by your search goals. You may establish that there are 15 business-critical terms for which you must rank. In that case, you would need 15 unique URLs. This is because it is important to have a 1:1 correlation between a keyword and what is termed a Preferred Landing Page (PLP)—the URL that you want users to click once they've searched on the keyword. For example, if your site is about shoes, when someone searches on "basketball shoes," you would want them to land on the page dedicated to basketball shoes, not the home page for all shoes. This is important because it provides a nice user experience by eliminating the user's need to click around on your site manually to find the content in question. Think of it as your audience raising their hands and requesting some specific information, and your site provides that specific information. The likelihood for your user staying on the site instead of returning to the search engine and clicking another link has increased dramatically. Don't make your audience work to see the content they explicitly ask for.
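One way to keep the 1:1 correlation honest during planning is to check the keyword list mechanically. The following is a minimal sketch under assumed data: the keywords, the URLs, and the `find_plp_conflicts` helper are hypothetical.

```python
def find_plp_conflicts(keyword_to_url):
    """Return the URLs that more than one keyword claims as its landing page."""
    keywords_by_url = {}
    for keyword, url in keyword_to_url.items():
        keywords_by_url.setdefault(url, []).append(keyword)
    return {url: kws for url, kws in keywords_by_url.items() if len(kws) > 1}


# Hypothetical plan: two keywords accidentally share the home page.
plan = {
    "basketball shoes": "http://www.example.com/shoes/basketball/",
    "running shoes": "http://www.example.com/",
    "tennis shoes": "http://www.example.com/",
}
conflicts = find_plp_conflicts(plan)
```

Any URL that appears in `conflicts` is a page being asked to rank for several terms at once, which is the competing-terms problem described earlier.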
Recognizing this issue, Adobe has made some good progress with Adobe Flex 3, which allows you to enable deep-linking within your SWF application. This is a nice feature from a usability and social media standpoint. However, the same challenge exists with using deep-linking techniques with Flex 3 or SWFAddress: there is no source HTML content to crawl, since neither of these techniques requires a browser refresh.
When establishing your site map and wireframes, therefore, make sure to plan for this. Start with the directory/URL planning. Establish a 1:1 correlation between a PLP and a keyword and what the SWF content requirements will be. For example:
Now that you've got a solid directory structure planned out, decide how you want to build your SWF content. I don't recommend creating a SWF file for each unique URL because the maintenance would be a nightmare. Instead, embed the same SWF file each time, but pass a variable into the SWF to tell it what state to present. This is best accomplished by using FlashVars, which is also available with SWFObject.
Object tag example:
<PARAM NAME=FlashVars VALUE="state=1">
Embed tag example:
<EMBED src="display.swf" FlashVars="state=1"></EMBED>
This should be set up by writing the FlashVars with PHP, Adobe ColdFusion, or ASP if possible. You can keep your unique URLs intact without having to build a new SWF for each URL. The same SWF would be embedded in the following URLs:
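As a rough sketch of that server-side step, here is the idea in Python (standing in for PHP, ColdFusion, or ASP). The URL paths, the state numbers, and the `embed_markup` helper are hypothetical; only `display.swf` and the `state` variable come from the tag examples above.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical mapping from a page's URL path to the state the SWF should show.
STATE_BY_PATH = {
    "/products/": 1,
    "/products/basketball-shoes/": 2,
    "/products/running-shoes/": 3,
}


def embed_markup(path, swf_url="display.swf"):
    """Return <object> markup that embeds the same SWF with a per-URL state."""
    flashvars = urlencode({"state": STATE_BY_PATH[path]})
    return (
        f'<object type="application/x-shockwave-flash" data="{escape(swf_url)}">\n'
        f'  <param name="movie" value="{escape(swf_url)}">\n'
        f'  <param name="FlashVars" value="{escape(flashvars)}">\n'
        "</object>"
    )


markup = embed_markup("/products/basketball-shoes/")
```

Every page embeds the same file, so there is one SWF to maintain, while the FlashVars value differs per URL.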
Odds are good that you are using Adobe Flash Professional or Adobe Flex to author animation content for your site. This may be in the form of a banner-style header on the home page or some type of interactive feature that lets you click different options to display animated content. You may have some transition effects, such as text animating on the screen and images with descriptions fading in and out, or even changes based on some type of interaction. Whatever the scenario, there is obviously a SWF file embedded in the HTML source that contains this animated content. So make sure that the content that is contained in your animations is also included in the HTML source.
Many times, especially with banner-style headers or large home page animations, very important marketing copy resides in or is loaded by the SWF file. Make sure to use actual text when creating these animations. A common mistake is to use an image or a flattened symbol in your SWF application to display text, or even to create the text with ActionScript. Since Flash Player for Search Engines can read the contents within your SWF application, give it some real text to read and ensure that the same text is available in the HTML source.
Identify the important content, links, or image descriptions contained within your SWF animation. Once you've decided this, you can begin the process of extracting the SWF content and getting it into HTML via the <noscript> method mentioned earlier in this article.
Many design or animation decisions can have a positive or negative effect on how search engines index your content. For the important PLPs (URLs), make sure not to over-design the animation and transition effects between sections and pages. For example, on a product-based website, upon selecting a product by clicking a link, users will go to another page. Since the browser will have to reload a new page, don't design elaborate transition effects (animations, fade in and out, or other such effects) between pages unless you want to take the time to build those in a way that is seamless to the user. Make sure the animation or other functionality is critical to the user experience. Save animations for unique user features, such as zooming in or out on a product, or for those other pages for which you don't need to rank in the search engines.
Keep in mind that the intent here is not to eliminate all the cool features of an RIA. As I mentioned earlier, you may decide that you need 15 unique URLs to map to the 15 business-critical keywords that you have selected. If your site has 45 total sections, you can use the other 30 sections for displaying the animations, transition effects, and other unique, rich interactive elements of your site.
To comply with search best practices, you want each important URL to have a unique title, H1, and body copy—each of which would contain a keyword for which you want to rank in search results. This is especially important for SWF applications that enable a user to browse images or view videos.
With an image viewer or video player, you may have hundreds of thumbnail images that a user can click to see an expanded view of the image or video, along with supporting information such as links and descriptive text. The challenge is that if you have this one SWF file available on only one HTML page (and therefore one URL), the search engine sees only that single HTML source. Even if you did use the <noscript> method mentioned earlier, there would be too much competing information on the HTML source and the search engine would not be able to establish any prominence (one topic the page is about). Again, it is important to establish a 1:1 correlation between a keyword and Preferred Landing Page.
Without unique URLs for these assets, you are missing out on an opportunity to take advantage of "universal search." For example, have you ever noticed all the different results that are available today when you search? Websites, videos, images, and shopping results are all likely to show up when you search. This is what universal search refers to: all the different types of content that could match a search query. So be sure to structure your URLs in a manner that gives each piece of important content an opportunity to rank.
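One possible way to give each image or video its own address is to derive a URL from the asset's title. This is only a sketch under assumptions: the base URL, the titles, and the `slugify` and `asset_urls` helpers are hypothetical, and a real site may prefer IDs or hand-picked paths.

```python
import re


def slugify(title):
    """Lowercase the title and replace runs of other characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def asset_urls(base_url, titles):
    """Map one unique URL to each asset title, rejecting collisions."""
    urls = {}
    for title in titles:
        url = f"{base_url.rstrip('/')}/{slugify(title)}/"
        if url in urls:
            raise ValueError(f"two assets would share the URL {url}")
        urls[url] = title
    return urls


gallery = asset_urls(
    "http://www.example.com/gallery",
    ["Red High-Top Sneakers!", "Court Shoes 2009"],
)
```

Each URL can then serve its own title, H1, and body copy while embedding the same viewer SWF.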
Here are some additional steps you can take to ensure that search engines index your SWF application well.
A common technique with websites is to launch a pop-up window containing the SWF content. For example, I go to mywebsite.com, the browser runs its usual detection scripts, and then a pop-up window emerges containing the SWF content. Alternatively, many sites have an HTML landing page with a "Launch website" link, which launches a pop-up window with the SWF content when it's clicked.
You've all heard the issues with pop-up windows. They're annoying, they symbolize advertisements, there are many "pop-up blockers" out there, and so on. To add to this list, the search spiders cannot see them. Make sure to structure your SWF content in a way that doesn't rely on pop-up windows.
Since you now have some unique URLs with some HTML content that the search engine can crawl, create an XML sitemap, which lists all the website pages you want the search engine to crawl. It's a simple XML file that you post in the root directory of your site (see Figure 3). As you can see, the <loc> node is your unique URL, the <lastmod> node is the last time the content changed on this URL, the <changefreq> node tells the search engine how often the content is updated (which influences how often the engine returns to crawl the content), and the <priority> node is a numeric value from 0.0 to 1.0 establishing the importance of each URL relative to the other URLs on your site.
Below is a sample XML sitemap:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2009-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://www.example.com/myexample</loc>
    <lastmod>2009-02-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.3</priority>
  </url>
</urlset>
All you need to set this up is a Google Webmaster Tools account and the list of unique URLs. Read the About Sitemaps page on Google Webmasters Help for more information on XML sitemaps, guidelines, instructions, and submission.
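If the list of unique URLs already lives in code or a database, the sitemap can be generated rather than written by hand. The sketch below uses Python's standard library; the `build_sitemap` helper and the entry dictionaries are hypothetical, while the namespace and node names are the ones shown in the sample.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Serialize a list of URL entries as a sitemap <urlset> document."""
    # Register the sitemap namespace as the default so tags get no prefix.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}").text = entry[tag]
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    {"loc": "http://www.example.com/", "lastmod": "2009-01-01",
     "changefreq": "monthly", "priority": "0.8"},
    {"loc": "http://www.example.com/myexample", "lastmod": "2009-02-01",
     "changefreq": "weekly", "priority": "0.3"},
])
```

The returned string does not include the XML declaration line; add it when writing the file.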
An HTML sitemap is a page on your site that contains links to all the sections of your site. The goal here is to create additional links into the unique URLs that you've established, because links play an important role in determining your site's rank. You've probably seen hundreds of these before, but in case you haven't, Apple's sitemap is a good example of a format to follow.
A video sitemap is handy for websites that use a lot of video assets. Very similar to the XML sitemap, the video sitemap is a simple XML file that allows you to specify not only the URL but also the location of a thumbnail "preview" image of the video (see the <video:thumbnail_loc> node in Figure 4).
Below is a sample video sitemap:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:video="http://www.google.com/schemas/sitemap-video/1.0">
  <url>
    <loc>http://www.example.com/videos/some_video_page.html</loc>
    <video:video>
      <video:content_loc>http://www.example.com/video123.flv</video:content_loc>
      <video:player_loc allow_embed="yes">http://www.example.com/videoplayer.swf?video=123</video:player_loc>
      <video:title>My Video</video:title>
      <video:thumbnail_loc>http://www.example.com/thumbnails/123.jpg</video:thumbnail_loc>
    </video:video>
  </url>
  <url>
    <loc>http://www.example.com/videos/some_other_video_page.html</loc>
    <video:video>
      <video:content_loc>http://www.example.com/video1.mpg</video:content_loc>
      <video:description>My Amazing Video</video:description>
    </video:video>
  </url>
</urlset>
All you need to set this up is a Google Webmaster Tools account and the list of unique URLs. See the Creating Video Sitemaps page on Google Webmasters Help for more information on video sitemaps, guidelines, instructions, and submission.
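A video sitemap can be generated the same way as a regular one; the main difference is the second namespace, which must be bound to the `video` prefix. This is a minimal sketch that covers only a few of the nodes from the sample; the `build_video_sitemap` helper and the dictionary keys are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
VIDEO_NS = "http://www.google.com/schemas/sitemap-video/1.0"


def build_video_sitemap(videos):
    """Serialize video entries as a <urlset> with <video:video> children."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("video", VIDEO_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for item in videos:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = item["page"]
        video = ET.SubElement(url, f"{{{VIDEO_NS}}}video")
        # Only write the optional nodes that the entry actually provides.
        for tag in ("content_loc", "title", "thumbnail_loc"):
            if tag in item:
                ET.SubElement(video, f"{{{VIDEO_NS}}}{tag}").text = item[tag]
    return ET.tostring(urlset, encoding="unicode")


video_sitemap_xml = build_video_sitemap([
    {"page": "http://www.example.com/videos/some_video_page.html",
     "content_loc": "http://www.example.com/video123.flv",
     "title": "My Video",
     "thumbnail_loc": "http://www.example.com/thumbnails/123.jpg"},
])
```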
A robots.txt file is a simple way to control what content you want the search engine to crawl. It's a text file that is placed in the root directory of a web server. Your robots.txt file tells the spider what files it is allowed to look at on that server. There are only two operative statements in this file:
The user-agent statement defines the search spiders to which the next disallow statement applies. If you code an asterisk for the user agent, you are referring to all spiders.
The disallow statement specifies which files the spider is not permitted to crawl. You can specify a precise filename or the leading part of a file or directory name. For example, specifying "/e" eliminates from the crawl all files in the root directory whose names start with "e", as well as all files in any top-level directory whose name begins with "e". Specifying "/" disallows all files.
For example, the following robots.txt code blocks all spiders from examining the cgi-bin and java directories; it also stops "roguespider" from crawling any file:
User-agent: *
Disallow: /cgi-bin
Disallow: /java

User-agent: roguespider
Disallow: /
Visit the Creating a robots.txt file page on Google Webmasters Help for additional documentation.
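Before publishing a robots.txt file, it can help to check how a parser interprets it. The sketch below feeds the rules from the example above to Python's standard `urllib.robotparser` module; the `is_allowed` helper, the "Googlebot" agent name, and the example URLs are chosen for illustration, and individual search engines may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /java

User-agent: roguespider
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(agent, url):
    """Report whether the rules above let the given spider fetch the URL."""
    return parser.can_fetch(agent, url)


products_ok = is_allowed("Googlebot", "http://www.example.com/products/")
cgi_ok = is_allowed("Googlebot", "http://www.example.com/cgi-bin/search")
# Disallow rules match by prefix, so /java also covers /javascript/.
javascript_ok = is_allowed("Googlebot", "http://www.example.com/javascript/app.js")
rogue_ok = is_allowed("roguespider", "http://www.example.com/products/")
```

The `/javascript/` case is worth noting: because rules match the start of the path, a short entry such as `/java` can block more than intended.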
To summarize, here are the steps you need to take to optimize your SWF applications for search engines:
Check out these resources to dig deeper into the topics and issues discussed in this article: