
Using regular expressions – Part 2: Finding and replacing text

by David Powers

  • http://foundationphp.com/

Content

  • Using capturing groups to find and replace text
  • Removing CSS comments
  • Removing inline styles
  • Formatting North American phone numbers
  • Using regular expressions in ColdFusion, JavaScript, and PHP

Created

20 June 2010


Requirements

Prerequisite knowledge

You need to have read Part 1, unless you are already familiar with regular expressions.

User level

Beginning

Required products

  • Dreamweaver CS5 (Download trial)

Sample files

  • regex_pt2.zip (10 KB)

Regular expressions (or regexes for short) are indispensable in find and replace operations. No matter how deeply buried the text you're looking for is, a well-defined regex will locate it in no time. Unfortunately, well-defined regexes are not always easy to create. Part 1 of this tutorial series covered the basic building blocks. In this second part, you'll put some of those principles to practical use through the Dreamweaver Find And Replace dialog box. You'll also learn how to use regexes in ColdFusion, JavaScript, and PHP.

Using capturing groups to find and replace text

The eye often deceives. Instead of seeing a repeated word, you read what you think should be there. You've likely seen something similar to Figure 1 many times, but the first time you saw it, you probably read it as Paris in the spring, and missed the second the. Once you're aware of the repeated word you can't miss it, but what about repeated words buried in dozens of lines of text?

Figure 1. We often read what we think we see, rather than what is actually there.

The sample files for this tutorial contain a long extract from David Copperfield by Charles Dickens, in which four words have been deliberately duplicated. Searching for them visually would take a long time, but you can use a regex to find and delete them instantly.

Finding repeated words

Before you can delete the repeated words, you need to find them.

  1. Open repeated_01.html from the sample files for this tutorial in Dreamweaver. It doesn't matter whether you're in Design view or Code view.
  2. Choose Edit > Find And Replace to open the Find And Replace dialog box (you can also press Ctrl+F on Windows or Cmd+F on Mac OS X).
  3. Select Use Regular Expression and deselect Match Case.
  4. Select Current Document for the Find In option and select Source Code for the Search option.
  5. Type the following regex in the Find text box:
\b(\w+)\b \1

This searches for a word boundary, followed by one or more word characters in a capturing group, another word boundary, a literal space, and a backreference to the capturing group. In other words, it looks for a standalone word, captures its value, and looks for it to be repeated immediately after a space.

  6. Make sure the cursor is located at the start of the document, and click Find Next. Dreamweaver should highlight the the at the beginning of the third paragraph (see Figure 2).
Figure 2. Dreamweaver immediately detects the first duplicate word.
  7. Click Find Next again. It should detect and and at the end of the fourth paragraph.
  8. Click Find Next twice again to find the remaining duplicate words (a a at the end of the seventh paragraph, and The the in the eleventh paragraph).

Note: The final duplicate is found because you deselected Match Case, so Dreamweaver treats the regex as case-insensitive. When using this regex with a programming language, you need to turn on the option for a case-insensitive match.
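
Outside Dreamweaver, the same search can be sketched in JavaScript. This is only an illustration: the function name findRepeatedWords and the sample sentence are made up, not taken from the sample files. The i flag supplies the case-insensitive matching described in the note, and the g flag finds every match rather than just the first.

```javascript
// Hypothetical helper: returns every "word word" pair found in the text.
function findRepeatedWords(text) {
  // \b(\w+)\b captures a standalone word; " \1" requires the same
  // characters again immediately after a single space.
  // g = find all matches, i = ignore case (so "The the" counts).
  var regex = /\b(\w+)\b \1/gi;
  return text.match(regex) || [];
}

// Made-up sample text with two deliberate duplicates.
var sample = "The the cat sat on on a mat.";
var repeats = findRepeatedWords(sample);

console.log(repeats); // the two pairs: "The the" and "on on"
```

Without the i flag, only on on would be reported, because The and the differ in case.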

Deleting duplicate words

To eliminate the duplicate, you need to replace both words with just the first one. Within the regex, you use \1 as a backreference to the first capturing group. In a replace operation, the same value is represented by $1.

  1. Keep the regex from the preceding exercise in the Find text box.
  2. Type $1 in the Replace text box.
  3. Click Replace All.

The Search panel opens and reports the results. Each replacement text is underlined by a red wavy line (see Figure 3). All the duplicates are gone.

Figure 3. The Search panel confirms that the duplicate words have been eliminated.

Making sure only complete words are deleted

It's too early to celebrate. Regexes have a nasty habit of doing things you don't expect.

  1. Open repeated_02.html from the sample files.

    To make it easier to see what's happening, this file contains a single sentence with just one pair of repeated words (see Figure 4).

Figure 4. The second file contains just one pair of repeated words.
  2. Run the same Find and Replace operation as in the preceding exercise.

    The Search panel reports that three replacements were made, not just one (see Figure 5).

Figure 5. Three replacements have been made.

The regex checked only that the captured characters were repeated, not that the repetition was a complete word. As a result, The theme was shortened to Theme and his history to history.

  3. Choose File > Revert, and click Yes to confirm that you want to undo the changes.
  4. To ensure that the value matched by the backreference is a complete word, add a word boundary at the end of the regex in the Find text box. It should look like this:
\b(\w+)\b \1\b
  5. Click Replace All again. This time only the second was is deleted. Both The theme and his history remain intact.
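
The difference between the two versions of the regex is easy to reproduce in JavaScript. The following is a minimal sketch; the function names and the sample sentence are invented for illustration and are not the contents of repeated_02.html.

```javascript
// Version without the final word boundary: the backreference may match
// the beginning of a longer word.
function removeRepeatsLoose(text) {
  return text.replace(/\b(\w+)\b \1/gi, "$1");
}

// Version with \b after the backreference: the repeat must be a whole word.
function removeRepeatsStrict(text) {
  return text.replace(/\b(\w+)\b \1\b/gi, "$1");
}

// Made-up sentence containing one genuine duplicate ("was was") and two
// traps ("The theme", "his history").
var sentence = "The theme of his history was was clear.";

var loose = removeRepeatsLoose(sentence);
var strict = removeRepeatsStrict(sentence);

console.log(loose);  // "Theme of history was clear."
console.log(strict); // "The theme of his history was clear."
```

In the replacement string, $1 stands for whatever the first capturing group matched, just as it does in the Replace text box.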

Removing CSS comments

Dreamweaver CS5 comes with 16 predefined CSS layouts, which you can access from the New Document dialog box (see Figure 6).

Figure 6. You can select one of 16 predefined CSS layouts in Dreamweaver CS5.

To help you understand what the style rules are for, the designer, Stephanie Sullivan, has sprinkled the style sheet liberally with comments—the one used in the next exercise has no fewer than 33. The comments are great for learning purposes, but you should delete them before the site goes live. It's time for another regex.

  1. Open css_comments.html in the sample files for this tutorial. Alternatively, select the 2 Column Liquid, Right Sidebar, Header and Footer HTML layout in the New Document dialog box in Dreamweaver CS5.
  2. Switch to Code view, and examine the CSS comments.

    The comments begin with /* and end with */. As explained in Part 1, * is a quantifier that means match 0 or more times. So, to match a literal asterisk, you need to escape it with a backslash.

  3. Type the following regex in the Find text box of the Find And Replace dialog box:
/\*.*\*/

The regex uses three asterisks. The two outer ones are escaped with backslashes, so they are treated as literal characters. The middle one acts as a quantifier for the dot metacharacter, which matches anything except a newline. So, this regex looks for /* and */ with anything (or nothing) in between.

Note: In many programming languages, the beginning and end of a regex are marked by characters known as delimiters. If your regex flavor uses forward slashes as delimiters, you need to escape any literal forward slashes with a backslash. I cover this issue in more detail in Using regular expressions in ColdFusion, JavaScript, and PHP.

  4. Make sure your cursor is at the top of the page in Code view, and click Find Next. Dreamweaver will highlight the first CSS comment (see Figure 7).
Figure 7. The regex matches the first CSS comment.
  5. Click Find Next again. Dreamweaver will select the much longer CSS comment that begins on the following line.
  6. Since the regex seems to be working, make sure the Replace text box is empty, and click Replace All.

    The Search panel should report that 32 replacements have been made.

  7. Scroll down the page until just after the style rule for the header class.

    As you can see, a long, multiline comment has been left behind (see Figure 8). The dot metacharacter matches anything except a newline, so the regex failed when it encountered this comment.

Figure 8. The multiline comment is untouched because the dot metacharacter doesn't match newlines.

Note: By default, Dreamweaver soft wraps code when it reaches the right edge of Code view, but doesn't insert a newline character. This is the only comment in the file that contains newline characters, as indicated by the line numbers on the left.

  8. You need all the comments back before you can refine the regex, so select File > Revert, and click Yes when prompted to confirm you want to undo the changes.

    Most regex flavors have an option to make the dot metacharacter match a newline, but the JavaScript engine that Dreamweaver uses in the background for its Find and Replace operations doesn't support one. However, it's easy to create a character class that does the same.

    As I explained in Part 1, there are three pairs of metacharacters that match the opposite of each other. Put any of those pairs together in a character class, and it matches anything. For example, \w matches any word character, and \W matches anything that isn't a word character. So, [\w\W] matches anything, including a newline.

  9. Replace the dot metacharacter with the character class like this:
/\*[\w\W]*\*/
  10. Position your cursor at the top of the page, and click Find Next.

    This certainly copes with newlines, but it grabs everything from the beginning of the first CSS comment to the end of the last one. The * quantifier is greedy.

  11. Change * into a lazy quantifier by adding a question mark after it like this:
/\*[\w\W]*?\*/
  12. Select File > Revert, and click Yes when prompted to confirm you want to undo the changes.
  13. Move your cursor back to the top of the page, and click Find Next a couple of times to verify that a single comment is selected each time.
  14. Click Replace All.

The Search panel should report there have been 33 replacements. This time, the multiline comment has been deleted, too.
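
The three stages of this regex can be compared side by side in JavaScript. This is a sketch under stated assumptions: the style sheet fragment is invented (it is not the content of css_comments.html), and because JavaScript literal syntax uses forward slashes as delimiters, the literal slashes in the regex are escaped with backslashes.

```javascript
// Made-up style sheet fragment: two single-line comments and one comment
// that spans two lines.
var css = "/* one */\n" +
          "body { margin: 0; }\n" +
          "/* two\n" +
          "   lines */\n" +
          "p { color: red; } /* three */";

// Stage 1: the dot does not match newlines, so the two-line comment survives.
var dotVersion = css.replace(/\/\*.*\*\//g, "");

// Stage 2: [\w\W] matches newlines too, but the greedy * runs from the
// first /* to the very last */, swallowing the style rules in between.
var greedyVersion = css.replace(/\/\*[\w\W]*\*\//g, "");

// Stage 3: the lazy *? stops at the first */ after each /*.
var lazyVersion = css.replace(/\/\*[\w\W]*?\*\//g, "");

console.log(dotVersion.indexOf("/* two") !== -1); // true: comment left behind
console.log(greedyVersion === "");                // true: everything removed
console.log(lazyVersion);                         // style rules only
```

Only the lazy version removes all three comments while leaving both style rules in place.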

Removing inline styles

The predefined CSS layouts in Dreamweaver CS5 use an image placeholder with an inline style for the logo. The instructions tell you how to remove the inline styles through the CSS Styles panel. You can use a regex instead.

  1. Continue working with css_comments.html from the sample files. Make sure you are in Code view.

    HTML tags begin with < followed by the tag name, and end with >. The simple way to find a whole tag with a regex is to use <[^>]*>, which matches a pair of angle brackets with anything in between, as long as it's not a closing angle bracket.

  2. Type <[^>]*> in the Find text box, and click Find Next.

    Depending on where your cursor is at the time, the next HTML tag will be highlighted (see Figure 9).

Figure 9. The regex matches any HTML tag.

Note: This regex matches any HTML, XML, or ColdFusion tag. It works in the vast majority of cases, but fails if the value of an attribute contains a closing angle bracket (for example, as a greater than sign in a JavaScript event handler). Creating a regex that covers all possible combinations is complex and time-consuming. Sometimes, it is better to understand the limitations of a regex than to strive for 100% perfection.

  3. Refine the regex by replacing the closing angle bracket with style= like this:
<[^>]*style=
  4. Click Find Next.

    This selects the <img> tag for the logo placeholder up to and including the style attribute and the equal sign (see Figure 10).

Figure 10. The revised regex identifies the only tag with a style attribute.

You want to delete the style attribute, so you need to create a capturing group to preserve everything up to that point by adding parentheses like this:

(<[^>]*)style=

Dreamweaver produces clean code, so you know that the style attribute is followed by an equal sign, and the value is enclosed in double quotes. But what if the code has come from elsewhere? There might be spaces around the equal sign, and the value might be in single quotes.

To match zero or more spaces you need \s*.

To match single or double quotes, you need the alternation metacharacter. You also need to use a capturing group so you have a backreference to the closing quote.

  5. Refine the regex like this:
(<[^>]*)style\s*=\s*('|")[^\2]*?\2

This surrounds the equal sign on both sides with zero or more spaces. Then a second capturing group alternates between single and double quotes.

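Next comes [^\2]. It looks like a negative character class that excludes whatever was matched in the second capturing group, but a backreference has no special meaning inside a character class. In the JavaScript flavor that Dreamweaver uses, \2 inside square brackets is read as an escape for a single control character, so the class matches practically anything, including both kinds of quote. That is why the quantifier applied to it needs to be lazy (*?): the match stops at the first point where the \2 that follows can match the closing quote. If you use a greedy quantifier, such as * or +, the regex will grab everything in the rest of the code up to the last quote of the same kind.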

Finally, the last \2 looks for the value stored in the second backreference (again either a single or double quote).

  6. Test the regex again.

    You should now have everything up to and including the closing quote of the style attribute (see Figure 11).

Figure 11. The regex now captures the style attribute and its value.
  7. To complete the regex, you need to get everything that follows the style attribute up to and including the closing angle bracket. In the current example, all that's left are the closing slash and the angle bracket, but in another case, you might have other attributes. Still, it's easy to accomplish with a negative character class and the closing bracket. The final section also needs to be in a capturing group so you can use it in the replace operation.
  8. Refine the regex further like this:
(<[^>]*)style\s*=\s*('|")[^\2]*?\2([^>]*>)

After all that, the values you need for the Replace text box are remarkably simple. All that's necessary are the values in the first and third backreferences.

  9. Type $1$3 in the Replace text box.
  10. Click Find Next, and verify that the whole <img> tag has been selected.
  11. Click Replace. The inline style should have been removed, leaving the other attributes intact.

Admittedly, this is a lot of work to get rid of a single inline style. But what if you need to get rid of inline styles in dozens of pages? The same regex does the job, and it never gets tired.
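
The finished regex can also be tried in JavaScript. The following is a minimal sketch: the function name removeInlineStyles and the sample tags are invented for illustration, and the regex is the one from the exercise with the g flag added. One caveat, as far as JavaScript's rules go: a backreference is not recognized inside a character class, so it is the lazy quantifier followed by \2 that actually finds the closing quote.

```javascript
// Hypothetical helper: strips a style attribute from each tag that has one,
// keeping everything before it ($1) and after it ($3).
function removeInlineStyles(html) {
  var regex = /(<[^>]*)style\s*=\s*('|")[^\2]*?\2([^>]*>)/g;
  return html.replace(regex, "$1$3");
}

// Made-up tags: double quotes without spaces, and single quotes with
// spaces around the equal sign.
var imgTag = '<img src="logo.png" alt="Logo" style="background: #C6D580; display:block;" />';
var pTag = "<p class=\"intro\" style = 'color: red'>Hello</p>";

console.log(removeInlineStyles(imgTag)); // <img src="logo.png" alt="Logo"  />
console.log(removeInlineStyles(pTag));   // <p class="intro" >Hello</p>

// Inside square brackets, \2 is not a backreference, so this class still
// matches a double quote even though the group captured one.
var classMatchesQuote = /('|")[^\2]/.test('""');
console.log(classMatchesQuote); // true
```

The replacement leaves behind the whitespace that surrounded the attribute, which is harmless in HTML.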

Formatting North American phone numbers

Part 1 of this tutorial showed how to match North American phone numbers with the following regex:

\(?\d{3}\)? \d{3}[- ]\d{4}

This makes the parentheses around the area code optional, and accepts either a hyphen or a space between the last two sets of digits. A consistent format looks much smarter. Applying one is easy to do with capturing groups around each set of digits.

  1. Open phone_numbers.html in the sample files.
  2. Open the Find And Replace dialog box, and make sure Use Regular Expression is selected. It doesn't matter whether you're in Code view or Design view.
  3. Type the following regex in the Find text box:
\(?(\d{3})\)? (\d{3})[- ](\d{4})

This is the regex you used in Part 1 with capturing group parentheses added around each group of digits.

  4. Type ($1) $2-$3 in the Replace text box (see Figure 12).
Figure 12. Use a regex to format phone numbers consistently.
  5. Click Replace All.

    The phone numbers will be formatted with parentheses around the area code and a hyphen between the last two sets of digits (see Figure 13).

Figure 13. The phone numbers are reformatted.
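
The same find and replace operation can be sketched in JavaScript. The function name formatPhoneNumbers and the sample numbers are made up for illustration; they are not taken from phone_numbers.html.

```javascript
// Hypothetical helper: rewrites every matching phone number as (XXX) XXX-XXXX.
function formatPhoneNumbers(text) {
  // Three capturing groups, one around each set of digits.
  // The g flag replaces every number, not just the first.
  var regex = /\(?(\d{3})\)? (\d{3})[- ](\d{4})/g;
  return text.replace(regex, "($1) $2-$3");
}

// Made-up numbers in three inconsistent formats.
var messy = "Call 212 555 1234, (415) 555-9876, or 305 555-0000.";
var tidy = formatPhoneNumbers(messy);

console.log(tidy);
// Call (212) 555-1234, (415) 555-9876, or (305) 555-0000.
```

The parentheses in the replacement string are literal characters. Only $1, $2, and $3 have a special meaning there.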

Using regular expressions in ColdFusion, JavaScript, and PHP

This tutorial series has concentrated on building and testing regexes in Dreamweaver, but regular expressions are also widely used in programming languages. This section highlights a few things you need to be aware of when using a regex in ColdFusion, JavaScript, or PHP.

ColdFusion

The most important differences in the regex flavor supported by ColdFusion are as follows:

  • The dot metacharacter always matches a newline.
  • In a replacement string, backreferences use a backslash instead of a dollar sign; for example, use \1 instead of $1.
  • To make ^ and $ match the beginning or end of a line, you must precede the regex with (?m). However, the text must use only a line feed character. You need to remove carriage returns from text created on Windows, and replace Mac carriage returns with line feeds. The parentheses of (?m) are not counted as a capturing group.

For more information see Using Regular Expressions in Functions in the ColdFusion help files.

JavaScript

There are two ways of creating a regex in JavaScript: using literal syntax or the RegExp() constructor.

Using literal syntax

To use literal syntax, wrap the regex in forward slashes, and assign it to a variable like this:

var phone = /\(?(\d{3})\)? (\d{3})[- ](\d{4})/;

The forward slashes are delimiters indicating that everything between them should be treated as a regex. If the regex needs to match a literal forward slash, escape it with a backslash. For example, the regex for a CSS comment needs to be rewritten like this:

var css_comment = /\/\*.*\*\//;

Using the RegExp() constructor

The RegExp() constructor requires the regex without delimiters as a string in single or double quotes. This avoids the need to escape forward slashes, but introduces a different complication, which is arguably worse. All backslashes need to be escaped. For example, instead of using \d and \w, you need to use \\d and \\w. The CSS comment regex looks like this:

var css_comment = new RegExp("/\\*.*\\*/");
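
To see that the two forms behave the same, and what happens when the backslashes are not doubled, consider this small sketch. The variable names and the sample string are illustrative only.

```javascript
// The same CSS comment regex written both ways.
var literal = /\/\*.*\*\//;
var constructed = new RegExp("/\\*.*\\*/");

// Made-up test string containing one CSS comment.
var sample = "p { margin: 0; } /* reset */";

console.log(literal.test(sample));     // true
console.log(constructed.test(sample)); // true

// With single backslashes, the string literal silently drops them, so the
// constructor receives /*.**/ which is not a valid regex.
var threw = false;
try {
  new RegExp("/\*.*\*/");
} catch (e) {
  threw = e instanceof SyntaxError;
}
console.log(threw); // true
```

The error is raised when the regex is created, which at least makes a forgotten backslash easy to spot in this case. With escapes such as \d, a missing backslash produces a valid regex that simply matches the wrong thing.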

Modifying a regex with flags

You can modify a regex by specifying one or more flags. The three flags most widely supported in JavaScript are listed in Table 1.

Table 1. Flags supported by JavaScript to modify a regex's behavior

Flag    Meaning
i       Perform case-insensitive matching.
g       Match all instances, rather than just the first one.
m       Make ^ and $ match the beginning and end of lines, as well as the beginning and end of the subject text.

When using literal syntax, add the flag(s) after the closing delimiter like this:

var css_comment = /\/\*.*\*\//gm;

When using the RegExp() constructor, pass the flags in a string as the second argument like this:

var css_comment = new RegExp("/\\*.*\\*/", "gm");
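
The effect of each flag in Table 1 is easiest to see on a short multiline string. This is a minimal sketch; the sample text and variable names are made up for illustration.

```javascript
// Three lines containing the same word in different cases.
var text = "Red\nred\nRED";

// No flags: case-sensitive, first match only.
var first = text.match(/red/);
console.log(first[0], first.index); // red 4

// g and i together: every instance, regardless of case.
var all = text.match(/red/gi);
console.log(all.length); // 3

// Without m, ^ and $ match only at the start and end of the whole text,
// so no single line can satisfy ^red$.
var withoutM = text.match(/^red$/gi);
console.log(withoutM); // null

// With m, ^ and $ also match at the start and end of each line.
var withM = text.match(/^red$/gim);
console.log(withM.length); // 3
```

The order of the flags doesn't matter, so gim and mig are treated the same.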

PHP

PHP originally supported both Perl-style and POSIX regular expressions. However, the POSIX functions were deprecated in PHP 5.3 and removed in PHP 7.0. Use only Perl-style regexes with the PHP functions that begin with preg_. Scripts that use split() or functions that begin with ereg need to be updated.

Regexes in PHP should be enclosed in delimiters, and then wrapped in quotes. The normal convention is to use forward slashes. However, if the regex contains literal forward slashes, you need to escape them with backslashes like this:

$css_comment = '/\/\*.*\*\//';

Because this can make a regex difficult to read, you can use any pair of nonalphanumeric characters as delimiters, as long as they're not also used inside the regex. This means you can rewrite the previous example like this:

$css_comment = '#/\*.*\*/#';

You can also use curly braces ({}) or angle brackets (<>) as delimiters, again as long as they're not used as literal characters in the regex.

PHP regexes can be modified by adding one or more flags between the closing delimiter and quote. The most important flags are listed in Table 2.

Table 2. Flags used in PHP to modify a regex's behavior

Flag    Meaning
i       Perform case-insensitive matching.
s       Make the dot metacharacter match newlines.
m       Make ^ and $ match the beginning and end of lines, as well as the beginning and end of the subject text.
u       Treat the regex as UTF-8 (Unicode).

Where to go from here

Creating regular expressions is fascinating, but it can also be frustrating. Part of the complexity lies in the different meanings some symbols (such as ^ and $) have depending on the context in which they're used. You also need to test your regexes thoroughly to make sure they don't match more than you expected, or fail to match at all. Then there's the problem of the different flavors; PHP and ColdFusion support more features than JavaScript, but each flavor differs from the others.

Fortunately, you're not alone. There are online resources and books that help you with regular expressions. Although you might enjoy the challenge of building your own regular expressions, it's not always necessary to reinvent the wheel. For many tasks, tested and proven solutions already exist. Just copy the relevant regex into the Dreamweaver Find And Replace dialog box or your programming language, and away you go. The following list is by no means exhaustive, but points you to some of the best resources:

  • Regular-Expressions.info
  • RegExLib.com
  • Jan Goyvaerts & Steven Levithan, Regular Expressions Cookbook
  • Jeffrey Friedl, Mastering Regular Expressions, Second Edition

Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License
