20 June 2010
You need to have read Part 1, unless you are already familiar with regular expressions.
Regular expressions (or regexes for short) are indispensable in find and replace operations. No matter how deeply the text you're looking for is buried, a well-defined regex will locate it in no time. Unfortunately, well-defined regexes are not always easy to create. Part 1 of this tutorial series covered the basic building blocks. In this second part, you'll put some of those principles to practical use through the Dreamweaver Find and Replace dialog box. You'll also learn how to use regexes in ColdFusion, JavaScript, and PHP.
The eye often deceives. Instead of seeing a repeated word, you read what you think should be there. You've likely seen something similar to Figure 1 many times. The first time you saw it, though, you probably read it as "Paris in the spring" and missed the second "the". Once you're aware of the repeated word, you can't miss it, but what about repeated words buried in dozens of lines of text?
The sample files for this tutorial contain a long extract from David Copperfield by Charles Dickens, in which four words have been deliberately duplicated. Searching for them visually would take a long time, but you can use a regex to find and delete them instantly.
Before you can delete the repeated words, you need to find them.
\b(\w+)\b \1
This searches for a word boundary, followed by one or more word characters in a capturing group, another word boundary, a literal space, and a backreference to the capturing group. In other words, it looks for a standalone word, captures its value, and looks for it to be repeated immediately after a space.
Note: The final duplicate is found because Dreamweaver treats regexes as case-insensitive. When using this regex with a programming language, you need to turn on the option for a case-insensitive match.
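If you want to try the same search in JavaScript, the following minimal sketch shows how. The sample string is invented for illustration and is not the Dickens extract. Adding the i flag reproduces Dreamweaver's case-insensitive behavior, and the g flag returns every match instead of just the first.

```javascript
// Invented sample text: one duplicate in matching case, one that differs in case.
const sample = "It was the the best of times. The the worst was yet to come.";

// \b(\w+)\b \1 : a standalone word, a space, then the same characters again.
const caseSensitive = /\b(\w+)\b \1/g;
const caseInsensitive = /\b(\w+)\b \1/gi;

// match() with the g flag returns all full matches, or null if there are none.
const exactMatches = sample.match(caseSensitive) || [];
const anyCaseMatches = sample.match(caseInsensitive) || [];

console.log(exactMatches);   // → ["the the"]
console.log(anyCaseMatches); // → ["the the", "The the"]
```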
To eliminate the duplicate, you need to replace both words with just the first one. Within the regex, you use \1 as a backreference to the first capturing group. In a replace operation, the same value is represented by $1.
The Search panel opens and reports the results. Each replacement text is underlined by a red wavy line (see Figure 3). All the duplicates are gone.
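JavaScript's String.prototype.replace() uses the same $1 syntax in the replacement string. This sketch applies the regex from this section to an invented sentence:

```javascript
// Invented text containing two repeated words.
const text = "Paris in the the spring. The the end.";

// Replace each "word word" pair with the first capturing group ($1).
const repeatedWord = /\b(\w+)\b \1/gi;
const fixed = text.replace(repeatedWord, "$1");

console.log(fixed); // → "Paris in the spring. The end."
```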
It's too early to celebrate. Regexes have a nasty habit of doing things you don't expect.
To make it easier to see what's happening, this file contains a single sentence with just one pair of repeated words (see Figure 4).
The Search panel reports that three replacements were made, not just one (see Figure 5).
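The regex checked for a word boundary before the repeated text but not after it. As a result, it also matched a word followed by a longer word that merely begins with the same letters, so "The theme" was shortened to "Theme" and "his history" to "history". To match only complete words, add another word boundary after the backreference: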
\b(\w+)\b \1\b
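The difference between the two versions is easy to see in JavaScript. The sentence in this sketch is invented, but it reproduces the problem described above. The first regex mangles "The theme" and "his history", while the version with the closing \b removes only the genuine duplicate:

```javascript
const sentence = "The theme of his history is is not new.";

// Original regex: no word boundary after the backreference.
const flawed = sentence.replace(/\b(\w+)\b \1/gi, "$1");
// Corrected regex: \b after \1 ensures the second word ends there, too.
const corrected = sentence.replace(/\b(\w+)\b \1\b/gi, "$1");

console.log(flawed);    // → "Theme of history is not new."
console.log(corrected); // → "The theme of his history is not new."
```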
Dreamweaver CS5 comes with 16 predefined CSS layouts, which you can access from the New Document dialog box (see Figure 6).
To help you understand what the style rules are for, the designer, Stephanie Sullivan, has sprinkled the style sheet liberally with comments. The style sheet used in the next exercise contains no fewer than 33. The comments are great for learning purposes, but you should delete them before the site goes live. It's time for another regex.
The comments begin with /* and end with */. As explained in Part 1, * is a quantifier that means match 0 or more times. So, to match a literal asterisk, you need to escape it with a backslash.
/\*.*\*/
The regex uses three asterisks. The two outer ones are escaped with backslashes, so they are treated as literal characters. The middle one acts as a quantifier for the dot metacharacter, which matches anything except a newline. So, this regex looks for /* and */ with anything (or nothing) in between.
Note: In many programming languages, the beginning and end of a regex are marked by characters known as delimiters. If your regex flavor uses forward slashes as delimiters, you need to escape any literal forward slashes with a backslash. I cover this issue in more detail in the section "Using regular expressions in ColdFusion, JavaScript, and PHP" later in this tutorial.
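Outside Dreamweaver, the same search can be sketched in JavaScript. This example uses an invented style sheet fragment and escapes the forward slashes as described in the note above. It also adds the g flag so that every comment is replaced, not just the first:

```javascript
// Invented CSS: two single-line comments and one that spans two lines.
const css =
  "body { color: #000; } /* base text */\n" +
  "/* header\n   styles */\n" +
  ".header { margin: 0; } /* no gap */";

// /\*.*\*/ written as a JavaScript literal, so the slashes are escaped.
const singleLineComment = /\/\*.*\*\//g;

console.log(css.match(singleLineComment).length); // → 2
const stripped = css.replace(singleLineComment, "");
// The two-line comment survives because . does not match a newline.
console.log(stripped.includes("/* header")); // → true
```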
The Search panel should report that 32 replacements have been made.
Look at the style rules for the header class. As you can see, a long, multiline comment has been left behind (see Figure 8). The dot metacharacter matches anything except a newline, so the regex couldn't match this comment.
Note: By default, Dreamweaver soft wraps code when it reaches the right edge of Code view, but doesn't insert a newline character. This is the only comment in the file that contains newline characters, as indicated by the line numbers on the left.
Most regex flavors have an option to make the dot metacharacter match a newline. However, JavaScript, which Dreamweaver uses in the background for its Find and Replace operations, didn't support this option when this tutorial was written. (The s flag was added to JavaScript in ECMAScript 2018, long after Dreamweaver CS5.) Fortunately, it's easy to create a character class that does the same.
As I explained in Part 1, there are three pairs of metacharacters that match the opposite of each other: \d and \D, \s and \S, and \w and \W. Put both members of a pair together in a character class, and it matches anything. For example, \w matches any word character, and \W matches anything that isn't a word character. So, [\w\W] matches anything, including a newline.
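A quick JavaScript check shows the difference, using an invented two-line comment. [\s\S] and [\d\D] behave the same way as [\w\W] here:

```javascript
const twoLineComment = "/* header\n   styles */";

// . stops at the newline, so this pattern cannot reach the closing */.
const withDot = /\/\*.*\*\//;
// [\w\W] matches any character, including the newline.
const withClass = /\/\*[\w\W]*\*\//;

console.log(withDot.test(twoLineComment));   // → false
console.log(withClass.test(twoLineComment)); // → true
```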
/\*[\w\W]*\*/
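This certainly copes with newlines, but it grabs everything from the beginning of the first CSS comment to the end of the last one. That's because the * quantifier is greedy: it matches as much as it can. Adding a question mark after it makes the quantifier lazy, so the match stops at the first */: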
/\*[\w\W]*?\*/
The Search panel should report there have been 33 replacements. This time, the multiline comment has been deleted, too.
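You can reproduce the effect of the lazy quantifier in JavaScript. This sketch uses an invented fragment with a rule between two comments:

```javascript
const css = "/* one */\np { margin: 0; }\n/* two\n   lines */";

// Greedy: runs from the first /* to the LAST */, swallowing the rule in between.
const greedy = /\/\*[\w\W]*\*\//g;
// Lazy: stops at the first */ after each /*.
const lazy = /\/\*[\w\W]*?\*\//g;

console.log(JSON.stringify(css.replace(greedy, ""))); // → ""
console.log(JSON.stringify(css.replace(lazy, "")));   // → "\np { margin: 0; }\n"
```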
The predefined CSS layouts in Dreamweaver CS5 use an image placeholder with an inline style for the logo. The instructions tell you how to remove the inline styles through the CSS Styles panel. You can use a regex instead.
HTML tags begin with < followed by the tag name, and end with >. The simple way to find a whole tag with a regex is to use <[^>]*>, which matches a pair of angle brackets with anything in between, as long as it's not a closing angle bracket.
Depending on where your cursor is at the time, the next HTML tag will be highlighted (see Figure 9).
Note: This regex matches any HTML, XML, or ColdFusion tag. It works in the vast majority of cases, but it fails if the value of an attribute contains a closing angle bracket (for example, a greater-than sign in a JavaScript event handler). Creating a regex that covers all possible combinations is complex and time-consuming. Sometimes, it's better to understand the limitations of a regex than to strive for 100% perfection.
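Here's a small JavaScript sketch of the tag regex, using invented markup. The second string reproduces the limitation described in the note: a > inside an attribute value ends the match early.

```javascript
const tag = /<[^>]*>/g;

const markup = '<p class="intro">Hello <b>there</b></p>';
console.log(markup.match(tag)); // → ['<p class="intro">', '<b>', '</b>', '</p>']

// A greater-than sign inside an event handler cuts the first match short.
const tricky = '<a href="#" onclick="if (x > 1) go()">Go</a>';
console.log(tricky.match(tag)[0]); // → '<a href="#" onclick="if (x >'
```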
Add style= to the end of the regex like this:
<[^>]*style=
This selects the <img> tag for the logo placeholder up to and including the style attribute and the equal sign (see Figure 10).
You want to delete the style attribute, so you need to create a capturing group to preserve everything up to that point by adding parentheses like this:
(<[^>]*)style=
Dreamweaver produces clean code, so you know that the style attribute is followed by an equal sign, and the value is enclosed in double quotes. But what if the code has come from elsewhere? There might be spaces around the equal sign, and the value might be in single quotes.
To match zero or more spaces you need \s*.
To match single or double quotes, you need the alternation metacharacter. You also need to use a capturing group so you have a backreference to the closing quote.
(<[^>]*)style\s*=\s*('|")[^\2]*?\2
This surrounds the equal sign on both sides with zero or more spaces. Then a second capturing group alternates between single and double quotes.
Next comes [^\2] followed by a lazy quantifier. It looks like a negative character class that excludes whatever was matched in the second capturing group, but backreferences don't work inside a character class. In JavaScript and PHP, \2 inside square brackets is treated as an octal escape for an invisible control character, so [^\2] matches virtually anything, including a quote. That's why the quantifier needs to be lazy (*?): it stops at the first quote that matches the second backreference. If you use a greedy quantifier, such as * or +, the regex will grab everything in the rest of the code up to the final matching quote.
Finally, the last \2 looks for the value stored in the second backreference (again either a single or double quote).
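You can confirm in JavaScript that the lazy quantifier, not the character class, stops the match at the closing quote. In this sketch (the tag is invented), [^\2] matches a double quote, and switching to a greedy quantifier makes the match run on to the last quote in the tag:

```javascript
// Inside square brackets, \2 is not a backreference; in non-Unicode mode JavaScript
// reads it as an octal escape for the control character U+0002, so [^\2] matches quotes.
console.log(/[^\2]/.test('"')); // → true

const img = '<img src="logo.gif" style="width: 180px" alt="Logo" />';
const lazy = /(<[^>]*)style\s*=\s*('|")[^\2]*?\2/;
const greedy = /(<[^>]*)style\s*=\s*('|")[^\2]*\2/;

console.log(img.match(lazy)[0]);   // → '<img src="logo.gif" style="width: 180px"'
console.log(img.match(greedy)[0]); // → '<img src="logo.gif" style="width: 180px" alt="Logo"'
```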
You should now have everything up to and including the closing quote of the style attribute (see Figure 11).
Finally, you need to match everything after the style attribute up to and including the closing angle bracket. In the current example, all that's left is the closing slash and the angle bracket, but in another case, you might have other attributes. Either way, a negative character class followed by the closing bracket does the job. This final section also needs to be in a capturing group so you can use it in the replace operation:
(<[^>]*)style\s*=\s*('|")[^\2]*?\2([^>]*>)
After all that, the value you need for the Replace text box is remarkably simple. All that's necessary are the first and third backreferences: $1$3.
Admittedly, this is a lot of work to get rid of a single inline style. But what if you need to get rid of inline styles in dozens of pages? The same regex does the job, and it never gets tired.
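In JavaScript, the complete regex and the $1$3 replacement can be sketched as follows. The two tags are invented. The second uses single quotes and spaces around the equal sign to show that the extra flexibility works. Removing the attribute leaves an extra space behind, which is harmless in HTML.

```javascript
const inlineStyle = /(<[^>]*)style\s*=\s*('|")[^\2]*?\2([^>]*>)/g;

const tags = [
  '<img src="logo.gif" style="display: block;" alt="Logo" />',
  "<p style = 'color: red' class=\"note\">Hi</p>",
];

// Keep the first and third capturing groups, dropping the style attribute.
const cleaned = tags.map((t) => t.replace(inlineStyle, "$1$3"));

console.log(cleaned[0]); // → '<img src="logo.gif"  alt="Logo" />'
console.log(cleaned[1]); // → '<p  class="note">Hi</p>'
```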
Part 1 of this tutorial showed how to match North American phone numbers with the following regex:
\(?\d{3}\)? \d{3}[- ]\d{4}
This makes the parentheses around the area code optional, and it accepts either a hyphen or a space between the last two sets of digits. A consistent format looks much smarter, and capturing groups around each set of digits make it easy to apply one.
\(?(\d{3})\)? (\d{3})[- ](\d{4})
This is the regex you used in Part 1 with capturing group parentheses added around each group of digits.
The phone numbers will be formatted with parentheses around the area code and a hyphen between the last two sets of digits (see Figure 13).
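In JavaScript, you can sketch the same formatting with replace(). The replacement string ($1) $2-$3 is an assumption based on the format described above: parentheses around the area code, a space, and a hyphen before the last four digits. The phone numbers are invented.

```javascript
const phone = /\(?(\d{3})\)? (\d{3})[- ](\d{4})/g;
const text = "Call 555 123-4567 or (555) 987 6543.";

// $1, $2, and $3 are the three groups of digits captured by the regex.
const formatted = text.replace(phone, "($1) $2-$3");

console.log(formatted); // → "Call (555) 123-4567 or (555) 987-6543."
```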
This tutorial series has concentrated on building and testing regexes in Dreamweaver, but regular expressions are also widely used in programming languages. This section highlights a few things you need to be aware of when using a regex in ColdFusion, JavaScript, or PHP.
The most important differences in the regex flavor supported by ColdFusion are as follows:
- Backreferences in the replacement text use \1 instead of $1.
- To make ^ and $ match the beginning or end of a line, you must precede the regex with (?m). However, the text must use only line feed characters for line endings: remove carriage returns from text created on Windows, and replace Mac carriage returns with line feeds. The parentheses of (?m) are not counted as a capturing group.
For more information, see Using Regular Expressions in Functions in the ColdFusion help files.
There are two ways of creating a regex in JavaScript: using literal syntax or the RegExp() constructor.
To use literal syntax, wrap the regex in forward slashes, and assign it to a variable like this:
var phone = /\(?(\d{3})\)? (\d{3})[- ](\d{4})/;
The forward slashes are delimiters indicating that everything between them should be treated as a regex. If the regex needs to match a literal forward slash, escape it with a backslash. For example, the regex for a CSS comment needs to be rewritten like this:
var css_comment = /\/\*.*\*\//;
The RegExp() constructor takes the regex, without delimiters, as a string in single or double quotes. This avoids the need to escape forward slashes, but it introduces a different complication that is arguably worse: every backslash must itself be escaped. For example, instead of using \d and \w, you need to use \\d and \\w. The CSS comment regex looks like this:
var css_comment = new RegExp("/\\*.*\\*/");
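The following sketch shows that the literal and constructor forms describe the same pattern. The test strings are invented. Logging the string passed to RegExp() shows the single backslashes that remain once JavaScript has processed the doubled ones:

```javascript
const literal = /\/\*.*\*\//;
const pattern = "/\\*.*\\*/";
const constructed = new RegExp(pattern);

// The string literal "/\\*.*\\*/" contains the characters /\*.*\*/
console.log(pattern); // → /\*.*\*/

for (const s of ["/* comment */", "no comment here", "a /* b */ c"]) {
  // Both regexes give the same result for every sample.
  console.log(s, literal.test(s), constructed.test(s));
}
```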
You can modify a regex by specifying one or more flags. JavaScript supports the three flags listed in Table 1.
Table 1. Flags supported by JavaScript to modify a regex's behavior
| Flag | Meaning |
|---|---|
| i | Perform case-insensitive matching. |
| g | Match all instances, rather than just the first one. |
| m | Make ^ and $ match the beginning and end of lines, as well as the beginning and end of the subject text. |
When using literal syntax, add the flag(s) after the closing delimiter like this:
var css_comment = /\/\*.*\*\//gm;
When using the RegExp() constructor, pass the flags in a string as the second argument like this:
var css_comment = new RegExp("/\\*.*\\*/", "gm");
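Here is a short sketch of the flags in action, using invented sample strings. The flags property used on the first line is available in current JavaScript engines. It was added in ECMAScript 2015, after this tutorial was written.

```javascript
const viaLiteral = /\/\*.*\*\//gm;
const viaConstructor = new RegExp("/\\*.*\\*/", "gm");
console.log(viaLiteral.flags, viaConstructor.flags); // → gm gm

// m: ^ matches at the start of every line, not only at the start of the text.
const lines = "alpha\nbeta\ngamma";
console.log(lines.match(/^\w+/g));  // → ["alpha"]
console.log(lines.match(/^\w+/gm)); // → ["alpha", "beta", "gamma"]

// i: letters match regardless of case.
console.log(/dreamweaver/i.test("Dreamweaver CS5")); // → true
```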
PHP has traditionally supported both Perl-style and POSIX regular expressions. However, the POSIX functions were deprecated in PHP 5.3 and removed in PHP 7. Use only Perl-style regexes with the PHP functions that begin with preg_. Scripts that use split() or functions that begin with ereg should be updated as soon as possible.
Regexes in PHP must be enclosed in delimiters and then wrapped in quotes. The normal convention is to use forward slashes. However, if the regex contains literal forward slashes, you need to escape them with backslashes like this:
$css_comment = '/\/\*.*\*\//';
Because this can make a regex difficult to read, PHP lets you use any nonalphanumeric, non-backslash, non-whitespace character as the delimiter, as long as it isn't also used inside the regex. This means you can rewrite the previous example like this:
$css_comment = '#/\*.*\*/#';
You can also use curly braces ({}) or angle brackets (<>) as delimiters, again as long as they're not used as literal characters in the regex.
PHP regexes can be modified by adding one or more flags between the closing delimiter and quote. The most important flags are listed in Table 2.
Table 2. Flags used in PHP to modify a regex's behavior
| Flag | Meaning |
|---|---|
| i | Perform case-insensitive matching. |
| s | Make the dot metacharacter match newlines. |
| m | Make ^ and $ match the beginning and end of lines, as well as the beginning and end of the subject text. |
| u | Treat the regex as UTF-8 (Unicode). |
Creating regular expressions is fascinating, but it can also be frustrating. Part of the complexity lies in the different meanings some symbols (such as ^ and $) have depending on the context in which they're used. You also need to test your regexes thoroughly to make sure they don't match more than you expected, or fail to match at all. Then there's the problem of the different flavors: PHP and ColdFusion support more features than JavaScript, and each flavor has its own quirks.
Fortunately, you're not alone. There are online resources and books that can help you with regular expressions. Although you might enjoy the challenge of building your own regular expressions, it's not always necessary to reinvent the wheel. For many tasks, tested and proven solutions already exist. Just copy the relevant regex into the Dreamweaver Find and Replace dialog box or your programming language, and away you go. The following list is by no means exhaustive, but it points you to some of the best resources:

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License