Introduction to Regular Expressions in Dreamweaver

Rob Christensen

Adobe

Regular expressions are patterns that describe character combinations in text. Dreamweaver's support for regular expressions lets web developers find and replace content quickly and with surgical precision.

In some cases, a web developer may want to create a regular expression that updates content such as changing copyright information throughout a site. Another example might be the case where a web developer wants to search quickly for declarations of a variable.

In this article, I will provide a basic overview of regular expressions as well as an example of how they can be used to help maintain Cascading Style Sheets (CSS) code on your site.

To further understand the intricacies of this pattern-matching language, I encourage you to refer to additional resources on regular expressions that explore the subject in greater detail.

Requirements

To complete this tutorial you will need to install the following software and files:

Dreamweaver MX 2004

Background on Regular Expressions

The origins of regular expressions date back to the 1950s when mathematicians began exploring theoretical computer science. This field of research included topics such as automata theory and formal language theory. Stephen Cole Kleene, one of the fathers of theoretical computer science, is credited with inventing regular expressions. Ken Thompson, a major contributor to the development of the UNIX operating system, incorporated regular expressions into the UNIX text editor known as ed.

Today, support for regular expressions can be found in scripting languages, programming languages, operating systems, and tools. Examples of tools that support regular expressions include Microsoft Internet Explorer, Firefox, Macromedia Flash, Eclipse, and Microsoft Visual Studio .NET. Nearly every modern programming or scripting language offers built-in support for regular expressions or provides a dedicated library as an add-on. Some examples of computer languages that support regular expressions include Macromedia ColdFusion, PHP, ActionScript, JavaScript, Java, C++, C#, Visual Basic .NET, Perl, Ruby, and Python.

Depending on the tool or language, the implementation of regular expressions may vary. Because support varies, I recommend that you always check the documentation for what is and is not supported by the tool or language that you're using.

Regular expressions have grown in popularity for several reasons. For web developers and programmers, the quantity of code in the world has grown exponentially, and it has become increasingly difficult to navigate that volume of text. In addition, although regular expressions originated in mathematics and were later adopted in programming, content developers can also use them to update large amounts of text easily.

Common Usages

Regular expressions are sometimes referred to as "RegEx" or "RegExp." There are a number of common usages for regular expressions including:

Regular Expressions 101

Before I explore some real-world examples, I'd like to review some of the basic regular expressions principles and syntax to help form a strong foundation for later discussions. Please note that it's not important that you remember the entire contents of this section, but rather that you read through it and return to it as a reference.

Literal Expressions

In the land of regular expressions, all digits and alphabetic characters match themselves. These are referred to as literal expressions, simple expressions, or simple sequences. In other words, searching with a regular expression that contains only alphanumeric characters produces the same results as a normal search that does not use regular expressions.

Regular Expression    Matches
San    San Serif, Santa Rosa, Artisan

Matching using literal expressions has severe limitations. For example, imagine that you want to find only cases of the term "San" in which "San" occurs in either the first three characters of a line or the last three characters of a line. With literal expressions alone, this would not be possible. Enter regular expression special characters.

Special Characters

Special characters, sometimes referred to as metacharacters, are reserved, non-alphanumeric characters that provide special types of functionality. There are approximately 11 of these special characters; the most common ones are summarized in the following table along with examples.

Character    Matches    Example
^    Beginning of input or line    ^T matches "T" in "This good earth" but not in "Uncle Tom's Cabin"
$    End of input or line    h$ matches "h" in "teach" but not in "teacher"
*    The preceding character 0 or more times    um* matches "um" in "rum", "umm" in "yummy", and "u" in "huge"
+    The preceding character 1 or more times    um+ matches "um" in "rum" and "umm" in "yummy" but nothing in "huge"
?    The preceding character at most once (that is, indicates that the preceding character is optional)    st?on matches "son" in "Johnson" and "ston" in "Johnston" but nothing in "Appleton" or "tension"
.    Any single character except newline    .an matches "ran" and "can" in the phrase "bran muffins can be tasty"
x|y    Either x or y    FF0000|0000FF matches "FF0000" in bgcolor="#FF0000" and "0000FF" in font color="#0000FF"
{n}    Exactly n occurrences of the preceding character    o{2} matches "oo" in "loom" and the first two o's in "mooooo" but nothing in "money"
{n,m}    At least n, and at most m, occurrences of the preceding character    F{2,4} matches "FF" in "#FF0000" and the first four F's in "#FFFFFF"

Note: The above table was originally published in the Dreamweaver product help (Help > Using Dreamweaver).
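The examples in the table can also be checked outside Dreamweaver. The following is a minimal sketch in Python, on the assumption that Python's re module treats these particular special characters the same way Dreamweaver's engine does; the helper name first_match is hypothetical and is not part of Dreamweaver.

```python
import re

def first_match(pattern, text):
    """Return the first substring of text that pattern matches, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# (pattern, text) pairs taken from the table above.
checks = [
    (r"^T", "This good earth"),
    (r"^T", "Uncle Tom's Cabin"),
    (r"h$", "teach"),
    (r"h$", "teacher"),
    (r"um*", "yummy"),
    (r"um*", "huge"),
    (r"um+", "huge"),
    (r"st?on", "Johnston"),
    (r"st?on", "tension"),
    (r"o{2}", "mooooo"),
    (r"F{2,4}", "#FFFFFF"),
]

# Map each (pattern, text) pair to what was found, then print it.
results = {(p, t): first_match(p, t) for p, t in checks}
for (p, t), found in results.items():
    print(f"{p!r} in {t!r}: {found!r}")
```

A result of None corresponds to the "nothing" or "but not in" cases in the table.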

Or Statement

A vertical bar (also known as a pipe character) is used to indicate that either the pattern before or after matches. An example would be:

style|class

This expression would match either the word "style" or "class."
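Alternation also combines with the ^ and $ characters to solve the earlier problem of finding "San" only at the start or the end of a line. A minimal sketch in Python follows; the sample lines are made up for illustration, and re.IGNORECASE is an assumption standing in for leaving Dreamweaver's Match case check box unselected.

```python
import re

# "San" at the beginning of a line, or "San" at the end of a line.
pattern = re.compile(r"^San|San$", re.IGNORECASE)

# Hypothetical sample lines; the last one contains "San" only in the middle.
lines = ["San Serif", "Santa Rosa", "Artisan", "A Santa hat"]

# Keep only the lines in which the pattern is found.
matching = [line for line in lines if pattern.search(line)]
print(matching)
```

A plain literal search for "San" would also have selected the last line.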

Repetition

Some of the special characters in the table above specify how often a character is allowed to repeat: the *, +, and ? characters. The * character indicates that the preceding character occurs zero or more times. The + character, similarly, will match one or more instances of the preceding character. The ? character will match the preceding character zero or one time; that is, the preceding character is optional.

Character Classes

Character classes provide you with a way to restrict the characters you are searching for to a certain set by wrapping those characters in brackets.

Character    Matches    Example
[abc]    Any one of the characters enclosed in the brackets. Specify a range of characters with a hyphen (for example, [a-f] is equivalent to [abcdef]).    [e-g] matches "e" in "bed", "f" in "folly", and "g" in "guard"
[^abc]    Any character not enclosed in the brackets. Specify a range of characters with a hyphen (for example, [^a-f] is equivalent to [^abcdef]).    [^aeiou] initially matches "r" in "orange", "b" in "book", and "k" in "eek!"
\b    A word boundary (the position between a word character and a non-word character, such as a space or carriage return).    \bb matches "b" in "book" but nothing in "goober" or "snob"
\B    Anything other than a word boundary.    \Bb matches "b" in "goober" but nothing in "book"
\d    Any digit character. Equivalent to [0-9].    \d matches "3" in "C3PO" and "2" in "apartment 2G"
\D    Any nondigit character. Equivalent to [^0-9].    \D matches "S" in "900S" and "Q" in "Q45"
\w    Any alphanumeric character, including underscore. Equivalent to [A-Za-z0-9_].    b\w* matches "barking" in "the barking dog" and both "big" and "black" in "the big black dog"
\W    Any non-alphanumeric character. Equivalent to [^A-Za-z0-9_].    \W matches "&" in "Jake&Mattie" and "%" in "100%"
\s    Any single white-space character, including space, tab, form feed, or line feed.    \sbook matches " book" (including the leading space) in "blue book" but nothing in "notebook"
\S    Any single non-white-space character.    \Sbook matches "ebook" in "notebook" but nothing in "blue book"
\f    A form feed character.    --
\n    A line feed character.    --
\r    A carriage return character.    --
\t    A tab character.    --

Note: Dreamweaver MX 2004 contains a bug in its regular expression engine where carriage return characters (\r) are not recognized when you click the Find Next button. However, clicking the Find All button does reveal these characters.
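Several of the character class examples can be verified with a short script. This is a sketch in Python rather than Dreamweaver, assuming that Python's re module agrees with Dreamweaver's engine for these classes on plain ASCII text; first_match is a hypothetical helper name.

```python
import re

def first_match(pattern, text):
    """Return the first substring of text that pattern matches, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# (pattern, text) pairs based on the examples in the table above.
checks = [
    (r"[e-g]", "folly"),
    (r"[^aeiou]", "orange"),
    (r"\bb", "book"),
    (r"\bb", "goober"),
    (r"\Bb", "goober"),
    (r"\Bb", "book"),
    (r"\d", "C3PO"),
    (r"\D", "900S"),
    (r"\W", "100%"),
    (r"\sbook", "blue book"),
    (r"\sbook", "notebook"),
    (r"\Sbook", "notebook"),
]

results = {(p, t): first_match(p, t) for p, t in checks}
for (p, t), found in results.items():
    print(f"{p!r} in {t!r}: {found!r}")

# findall collects every match, which shows b\w* matching two separate words.
b_words = re.findall(r"b\w*", "the big black dog")
print(b_words)
```

Note that \s and \S each consume one character, so the matched text for \sbook and \Sbook is five characters long, not four.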

By combining repetition, special characters, and existing character classes, and by defining new custom classes, web developers can create complex expressions that can be shared with friends and colleagues.

For example, imagine that in an HTML page you wanted to strip out any extra space that trails the "br" in a <br/> tag. You notice that in some cases in your code, the line break tag appears with a space character after the "br" like so: <br />. In other cases, you notice that a co-worker has typed two or three extra space characters, or maybe even a tab character.

Without regular expressions, for each possible combination, you would need to specify a string of text. By leveraging the power of regular expressions, you can create a single pattern to match all of these cases. An example would be:

Expression    Matches
<br[\s/]*    Three literal characters (a less-than sign followed by "br"), followed by zero or more white-space characters (spaces, tabs, form feeds, or line feeds) or forward slashes

Special characters when combined with literal expressions and other special characters provide infinite options for web developers to construct regular expressions.
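To make the line break example concrete, here is a minimal sketch in Python. It assumes a closing ">" is appended to the pattern so that the whole tag can be replaced, and the sample string is hypothetical. Note that a bare <br> is normalized too, because the character class may match zero times.

```python
import re

# Assumed completion of the pattern above: a closing ">" is added so the
# entire tag is matched and can be replaced.
BR_TAG = re.compile(r"<br[\s/]*>")

# Hypothetical input mixing several spellings of the line break tag.
html = "one<br/>two<br />three<br   />four<br\t/>five<br>"

# Replace every variant with one consistent form.
cleaned = BR_TAG.sub("<br/>", html)
print(cleaned)
```

One pattern covers all five variants, where a literal search would need one search string per variant.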

Enabling Regular Expression Searches

Enabling regular expression searches in Dreamweaver is very straightforward.

  1. Open the Find and Replace dialog box by selecting Edit > Find and Replace.
  2. Select the Use regular expression check box. When selected, Dreamweaver will perform searches using regular expression syntax (see Figure 1).

    Figure 1: The Use regular expression check box in the Find and Replace dialog box

Exercise: Your First Regular Expression

Now you will create your very first regular expression. This expression will search for every instance of a tag in a new HTML document and demonstrate some of the concepts mentioned earlier such as character classes, special characters, and repetition.

Initial Setup

Before you proceed with creating your first regular expression, make sure that Dreamweaver is in the correct state to do so.

  1. Open the New Document dialog box by selecting File > New (see Figure 2).

    Figure 2: The New Document dialog box

  2. In the New Document dialog box, select Basic Page for Category and then select HTML as the Basic page type.
  3. Click the Create button.
  4. Switch to Code view by selecting View > Code.

    In Code view, you should now see the following HTML code:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">
    <html>
    <head>
    <title>Untitled Document</title>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
    </head>
    
    <body>
    </body>
    </html>
  5. Set the cursor to be at the first line of the document.
  6. Open the Find and Replace dialog box by selecting Edit > Find and Replace.
  7. Select the Use regular expression check box to enable regular expression searches.
  8. Select Current Document in the Find In pop-up menu.
  9. Select Source Code in the Search pop-up menu.

    Your Find and Replace dialog box should look like Figure 3.

    Figure 3: The Find and Replace dialog box with the Current Document, Source Code, and Use regular expression settings applied

Step 1: Creating a Basic Alpha Search

In my experience, regular expressions are often easiest to create using an iterative approach that begins with a very basic pattern that is progressively refined by testing. I tend to use this approach initially, because it is mentally easier to debug and test a simplified regular expression than a complicated one.

As a reminder, the task at hand is to create a regular expression that matches all tags in a new HTML document created in Dreamweaver.

Taking into consideration my previous advice, you'll begin by handling the most basic case—successfully matching a case of a single tag. In this case, you'll search for the <html> tag.

  1. In the Find and Replace dialog box, type <html> in the Find text box (see Figure 4).

    Figure 4: The Find and Replace dialog box with <html> typed in the Find text box

  2. Click the Find Next button.
  3. Verify that in the Find and Replace dialog box only one instance of the pattern was found and that <html> is selected in Code view. If you click Find Next again, the <html> remains selected, confirming that only one instance was found (see Figure 5).

    Figure 5: The Find and Replace dialog box showing that only one instance of <html> was found in the current document

Step 2: Using Character Classes to Find All Tags

In Step 1, you created a very strict regular expression that finds only one instance of a tag. In Step 2, you'll define a character class that demonstrates the true power of regular expressions.

Rather than searching only for the <html> tag, you will modify your regular expression to match any instance of any HTML tag in the document. In order to do this, you'll define a flexible custom character class and use repetition to allow for one or more instances of the class to be found.

Before you define your custom character class, you'll consider first the rules around how HTML tags are named. Here are several rules for how HTML tags are defined:

Given these rules, you will begin refining your original regular expression <html>. You'll first replace the "html" fragment with a more flexible, custom character class that adheres to the aforementioned rules:

<[A-Za-z]

This expression reads as "match a less-than sign followed by any one of the range of characters listed inside of the brackets."

Custom classes enable you to specify a range of acceptable characters using a hyphen. It's important to note also that custom classes are case sensitive. In Dreamweaver, however, this holds only when the Match case check box in the Find and Replace dialog box is selected; while it is not selected, the search is not case sensitive.

The range of acceptable characters reads as any character from A through Z (uppercase) or a through z (lowercase). I have intentionally stripped off the trailing greater-than sign, because with it no matches would be found in this document. The reason is that you have not yet defined a repetition, so the expression would match only tags that have a single alphabetic character, such as <b>.

Now, test your regular expression in Dreamweaver.

  1. Set the cursor to be at the first line of the document in Code view. Doing so will make sure that Dreamweaver begins searching at the beginning of the document.
  2. In the Find and Replace dialog box, type <[A-Za-z] in the Find text box.
  3. Select the Match case check box (see Figure 6).

    Figure 6: The Find and Replace dialog box with the <[A-Za-z] class in the Find text box and the Match case check box selected

  4. Click the Find Next button.
  5. Verify in Code view that <h in <html> is selected (see Figure 7).

    Figure 7: Code view showing <h in the <html> tag selected

  6. Click the Find Next button again and you'll see <h in the first part of <head> is selected (see Figure 8).

    Figure 8: Code view showing <h in the <head> tag selected

  7. Click the Find Next button again. Notice that this time <t is selected in <title>, indicating that your pattern is matching a variety of different alphabetic characters after the less-than sign (see Figure 9).

    Figure 9: Code view showing <t of the <title> tag selected
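The sequence of selections in the steps above can be reproduced in a script. This is a sketch in Python, not Dreamweaver's own engine; it assumes re.finditer visits matches in the same order as repeated clicks on Find Next, and the name DOCUMENT is hypothetical.

```python
import re

# The default new HTML document from the Initial Setup steps.
DOCUMENT = """<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Untitled Document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>

<body>
</body>
</html>
"""

# finditer yields matches in document order, one per "Find Next".
hits = [m.group(0) for m in re.finditer(r"<[A-Za-z]", DOCUMENT)]
print(hits)
```

The DOCTYPE line and the closing tags are skipped because "<" is followed there by "!" or "/", neither of which is in the character class.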

Step 3: Using Repetition

Although you have defined a character class consisting of all uppercase and lowercase alphabetic characters, your regular expression is only matching up to the first alphabetic character following the less-than sign.

Since HTML tag names can contain one or more alphabetic characters, you'll introduce the + repetition character by inserting it after your custom character class. You will also add a closing greater-than sign to match the end of the HTML tag.

<[A-Za-z]+> 

The regular expression now reads as "match a less-than sign followed by any one or more of the range of characters listed inside of brackets until a greater-than character is found."

  1. Set the cursor to be at the first line of the document in Code view.
  2. In the Find and Replace dialog box, type <[A-Za-z]+> in the Find text box.
  3. Select the Match case check box (see Figure 10).

    Figure 10: The Find and Replace dialog box with <[A-Za-z]+> in the Find text box and the Match case check box selected

  4. Click the Find Next button.
  5. Verify in Code view that <html> is selected (see Figure 11). The regular expression now matches one or more alphabetic characters followed by a greater-than sign, so the entire tag is selected.

    Figure 11: Code view showing <html> selected

  6. Click the Find Next button repeatedly and notice that three more opening HTML tags are selected in turn: <head>, <title>, and <body>. The only opening tag not selected is the <meta> tag, because it's the only tag that also includes a series of attributes and values. You'll need to make some changes in order to allow for attributes and values.

Matching Closing Tags Using an Optional Character

So far, you've succeeded in creating a regular expression that matches all beginning HTML tags (except for the <meta> tag), but not the closing tags. In HTML, a tag is closed by creating a second, matching instance of the tag with one slight difference: the beginning less-than sign is followed by a forward slash. For example, the beginning tag is <body> and the closing tag is </body>.

As I mentioned earlier in this article, an optional character is represented by the question mark. You will now update your regular expression to allow for an optional forward slash so that it matches either an opening HTML tag or a closing HTML tag.

</?[A-Za-z]+>

The regular expression now reads as "match a less-than sign followed by an optional forward slash and then by any one or more of the range of characters listed inside of the brackets until a greater-than character is found."

As before, try searching on this expression and you'll see that both beginning and ending tags are found. You're making progress, but you're still not matching the <meta> tag since the regular expression is not flexible enough to match attributes and values.
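The effect of the optional forward slash can be compared side by side. The following sketch uses Python's re module on the same new HTML document, on the assumption that it handles these two expressions the way Dreamweaver does; DOCUMENT is a hypothetical name.

```python
import re

# The default new HTML document from the Initial Setup steps.
DOCUMENT = """<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Untitled Document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>

<body>
</body>
</html>
"""

# Step 3 expression: opening tags without attributes only.
opening_only = re.findall(r"<[A-Za-z]+>", DOCUMENT)

# With the optional slash: opening and closing tags.
with_closing = re.findall(r"</?[A-Za-z]+>", DOCUMENT)

print(opening_only)
print(with_closing)
```

Neither list contains the <meta> tag, which is the gap the next section addresses.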

Using the Not Operator to Match Tag Attributes and Values

Recall that an open bracket followed by a ^ means to match any character not enclosed in the brackets.

When you think about what characters might follow a tag name to form attributes and values, you realize that there are quite a few. For example, the <meta> tag in the example looks like this:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

This line of code contains numbers, alphabetic characters, quotation marks, space characters, equal signs, hyphens, forward slashes, and semicolons. You have two options when trying to solve the problem of matching all of these possibilities.

Option 1: A Strict Regular Expression

You could create a strict regular expression that lists every character you expect to find in a tag within a custom character class, as shown in the following example:

</?[A-Za-z]+[A-Za-z0-9\s-=";/0]*>

This regular expression reads as before, but now I've inserted a new optional custom character class into the pattern: [A-Za-z0-9\s-=";/0]*>. This means that following the less-than sign and a tag name (<meta), you'll allow for an optional set of characters to appear. The * following this custom class indicates that zero or more instances of the custom class can occur, allowing the pattern to match both a tag with attributes and a tag without.

Option 2: A Lazy Regular Expression Using the Not Operator

A simpler approach would be to take advantage of the not operator in order to allow for any and all characters except a greater-than sign, as shown in this example:

</?[A-Za-z]+[^>]*>

Again, you're defining an optional custom character class. In this case, however, the ^ character means "match any character except for the ones listed in this class," which in this example is only a closing greater-than sign.

Each option has advantages and disadvantages. Option 1 is useful when it's paramount that the expression be as precise as possible. However, in this case, you might not want to specify every acceptable character, because attribute values can contain other characters such as $ or #. Option 2, in this case, is more flexible.
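The trade-off between the two options shows up as soon as a tag contains a character that Option 1 does not list. Below is a sketch in Python under two stated assumptions. First, the Option 1 class is adapted by moving the hyphen to the end and dropping the repeated 0, because Python's re module rejects the "\s-=" sequence as a bad character range. Second, the <a> tag is a hypothetical example that does not appear in the sample document.

```python
import re

# Option 1, adapted for Python: hyphen moved to the end of the class.
STRICT = re.compile(r'</?[A-Za-z]+[A-Za-z0-9\s=";/-]*>')
# Option 2: after the tag name, anything except a greater-than sign.
FLEXIBLE = re.compile(r"</?[A-Za-z]+[^>]*>")

tags = [
    "<body>",
    "</body>",
    '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">',
    '<a href="#top">',  # hypothetical tag whose value contains "#"
]

# For each tag, record whether each option matches the entire tag.
outcome = {
    tag: (bool(STRICT.fullmatch(tag)), bool(FLEXIBLE.fullmatch(tag)))
    for tag in tags
}
for tag, (strict_ok, flexible_ok) in outcome.items():
    print(f"{tag}  strict={strict_ok}  flexible={flexible_ok}")
```

Only the last tag separates the two: the strict class has no "#", so Option 1 fails where Option 2 still matches.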

Exercise: Using Subexpressions to Replace Content

Regular expressions are often used for performing document-wide find-and-replace actions. In some cases, skilled regular expression authors bravely use regular expressions to make changes across dozens or even hundreds of documents.

In this example, I will show you how to use regular expressions to replace a piece of content.

Imagine a scenario where you've been asked to update an older web page and replace a specific string of text that uses <font> tags and the message "Hello world." You've been asked to:

  1. Replace the <font> tags with the more modern, compliance-friendly <span> tag.
  2. Substitute the word "world" and the closing period with your own name followed by an exclamation mark.

Imagine that the document you are presented looks something like this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Untitled Document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>

<body>
<font face="Verdana, Arial, Helvetica, sans-serif">Hello world.</font>
<font face="Verdana, Arial, Helvetica, sans-serif">Hello world.</font>
<font face="Verdana, Arial, Helvetica, sans-serif">Hello world.</font>
</body>
</html>

There are not one but three instances of the font tag and text in question.

In order to accomplish this task, I'll need to introduce a new concept: subexpressions.

Introducing Subexpressions

Regular expressions enable you to define subexpressions in order to refer later to certain fragments of your pattern. Subexpressions are defined using parentheses. A basic example of a subexpression would be a regular expression that looks like this:

(a)(b)(c)

This regular expression defines three subexpressions. The first expression pattern (a) can be referred to as the variable $1, the second (b) as $2, and the third (c) as $3. References to subexpressions are created sequentially beginning with 1.
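Numbered references are easiest to see in a tiny experiment. This sketch uses Python, where references in a replacement string are written \1, \2, and \3 instead of Dreamweaver's $1, $2, and $3; the input string "xabcx" is made up for illustration.

```python
import re

PATTERN = r"(a)(b)(c)"

# groups() returns what each subexpression captured, in order.
match = re.search(PATTERN, "xabcx")
groups = match.groups()
print(groups)

# Reuse the captured fragments in reverse order in the replacement.
reordered = re.sub(PATTERN, r"\3\2\1", "xabcx")
print(reordered)
```

The surrounding "x" characters are untouched because they lie outside the match.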

Applying Subexpressions to the Problem

Now that I've explained what subexpressions are and how they can be used to reference pieces of an expression, you will use them to help solve the problem.

I've defined the following regular expression to match the text in the document.

(<font[^>]*>)(Hello )(world.)(</font>)

First, notice that there are four subexpressions defined in this regular expression. The first subexpression matches the beginning <font> tag ($1), the second matches "Hello " ($2), the third matches "world." ($3), and the fourth matches the closing </font> tag ($4). Although the subexpressions do not affect the criteria of the search, they do enable you to define logical groupings. Note that the unescaped period in (world.) matches any single character, not only a period; that is harmless in this document, but you can write world\. to match a literal period exclusively.

Complete the example now by performing the following steps:

  1. Create a new HTML document in Dreamweaver.
  2. Switch to Code view by selecting View > Code.
  3. Copy and paste the following three lines inside the <body> tag.
    <font face="Verdana, Arial, Helvetica, sans-serif">Hello world.</font>
    <font face="Verdana, Arial, Helvetica, sans-serif">Hello world.</font>
    <font face="Verdana, Arial, Helvetica, sans-serif">Hello world.</font>
  4. Set the cursor to be at the first line of the document.
  5. Open the Find and Replace dialog box by selecting Edit > Find and Replace.
  6. Select the Use regular expression check box to enable regular expression searches.
  7. Select the Match case check box.
  8. Select Current Document in the Find In pop-up menu.
  9. Select Source Code in the Search pop-up menu.
  10. Type the following regular expression in the Find text box:

    (<font[^>]*>)(Hello )(world.)(</font>)

  11. Click Find Next and verify that all three instances of the <font> tag are selected, one at a time. This verifies that the regular expression is in fact matching.
  12. Type the following regular expression in the Replace text box:

    <span>$2Rob!</span>

    In the Replace text box, you are indicating that you want to replace the entire match, from the opening <font> tag through the closing </font> tag, with a string that begins with the <span> tag, followed by the second subexpression of the regular expression you're searching for, followed by my name and a closing </span> tag. In other words, you want to preserve the subexpression "Hello " and make changes before and after it (see Figure 12).

    Figure 12: The Find and Replace dialog box with regular expressions in the Find and Replace text boxes

  13. Click the Replace All button.
  14. In Code view, you should see that all three <font> tags have been removed and replaced with a newly constructed string that contains one fragment of the original string ("Hello").

    Figure 13: Code view showing "Hello Rob!" where the <font> tags and Hello World used to be
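The same find-and-replace can be scripted for comparison. The following is a minimal sketch in Python, not Dreamweaver's own mechanism: Python spells the $2 reference as \g<2>, and the period is escaped here as world\. so that it matches only a literal period, a small departure from the expression typed in the steps above. The names FIND, REPLACE, and body are hypothetical.

```python
import re

FIND = r"(<font[^>]*>)(Hello )(world\.)(</font>)"
# Dreamweaver's $2 is written \g<2> in Python's replacement syntax.
REPLACE = r"<span>\g<2>Rob!</span>"

# The three lines pasted inside the <body> tag in the exercise.
body = "\n".join(
    ['<font face="Verdana, Arial, Helvetica, sans-serif">Hello world.</font>'] * 3
)

# subn returns the new text plus the number of replacements made.
updated, count = re.subn(FIND, REPLACE, body)
print(count)
print(updated)
```

The replacement count gives a quick sanity check, similar to confirming each match with Find Next before clicking Replace All.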

Searching for CSS Inline Styles

Most web developers know that if a CSS rule is used more than once in a website, best practices recommend replacing instances of that style with a single rule. This approach makes for cleaner code that is easier to manage.

Imagine a scenario where you just started your first day at a new job and you have been asked to clean up all instances of the style attribute within HTML tags throughout a site. For example, look at the following snippet of code:

<div id="footer" style="border-color:red"> 
All styles in the site are protected. 
</div> 

Now imagine that you were to search for the word "style" in the example above. The first result would be the style attribute in the <div> tag. If you were to click the Find Next button again, however, a false positive would appear, because the "style" in the word "styles" inside the <div> element's text would be selected.

In order to select only the style in the context of an attribute that defines a CSS style, you must refine your search by defining a regular expression (see Figure 14):

style=\"[A-Za-z0-9:;.-\s(/_)]*"

Figure 14: The Find and Replace dialog box with style=\"[A-Za-z0-9:;.-\s(/_)]*" in the Find text box
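The difference between the literal search and the refined expression can be counted directly. This sketch uses Python with one adaptation that is an assumption of mine rather than part of the article: the hyphen is moved to the end of the character class, because Python's re module does not accept the ".-\s" sequence as written. The names snippet and STYLE_ATTR are hypothetical.

```python
import re

snippet = '''<div id="footer" style="border-color:red">
All styles in the site are protected.
</div>'''

# A literal search finds the attribute and the word "styles" in the text.
literal_hits = snippet.count("style")

# The refined expression, adapted for Python (hyphen placed last in the class).
STYLE_ATTR = re.compile(r'style="[A-Za-z0-9:;.\s(/_)-]*"')
attribute_hits = STYLE_ATTR.findall(snippet)

print(literal_hits)
print(attribute_hits)
```

Requiring the =" that follows the attribute name is what excludes the word "styles" in the visible text.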

Tips and Tricks

Regular expressions have a reputation for being exceedingly difficult to author and read.

Recommendations

Here are some recommendations that will help you make the most of regular expressions:

Where to Go from Here

Mastering regular expressions requires the patience of a scientist and the skill of an artist. If you are new to regular expressions, it's important to realize that it takes practice before you can fully unravel the powers of regular expressions. As you begin to develop a basic understanding of the syntax, you will begin to uncover new techniques and build a greater level of confidence. I encourage you to practice by coming up with some regular expressions on your own that solve useful problems. The additional resources listed below will also prove helpful.

Additional Resources

References

About the author

Rob Christensen is a senior product manager on Adobe AIR. Prior to his current role, he managed a new product development framework used by many teams at Adobe and Macromedia including Dreamweaver and Flash. He also spent six years as an engineer on Dreamweaver starting with version 2. Rob's interests include web technologies, astronomy, film, travel, and exploring a variety of topics on his personal blog.