Regular expressions are often used for performing document-wide find-and-replace actions. In some cases, skilled regular expression authors bravely use regular expressions to make changes across dozens or even hundreds of documents.
In this example, I will show you how to use regular expressions to replace a piece of content.
Imagine a scenario where you've been asked to update an older web page and replace a specific string of text that uses <font> tags and the message "Hello world." You've been asked to replace the <font> tags with the more modern, compliance-friendly <span> tag and to change the message to "Hello Rob!"
Imagine that the document you are presented looks something like this:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Untitled Document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body>
<font face="Verdana, Arial, Helvetica, sans-serif">Hello world.</font>
<font face="Verdana, Arial, Helvetica, sans-serif">Hello world.</font>
<font face="Verdana, Arial, Helvetica, sans-serif">Hello world.</font>
</body>
</html>
There are not one but three instances of the font tag and text in question.
In order to accomplish this task, I'll need to introduce a new concept: subexpressions.
Regular expressions enable you to define subexpressions in order to refer later to certain fragments of your pattern. Subexpressions are defined using parentheses. A basic example of a subexpression would be a regular expression that looks like this:
(a)(b)(c)
This regular expression defines three subexpressions. The first subexpression, (a), can be referred to as the variable $1, the second, (b), as $2, and the third, (c), as $3. References to subexpressions are numbered sequentially, from left to right, beginning with 1.
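To see the numbering in action outside Dreamweaver, here is a minimal sketch using Python's standard re module. The variable names are only illustrative, and note one notational difference: Python writes the references as \1, \2, and \3 in a replacement string, where the Find and Replace dialog box uses $1, $2, and $3.

```python
import re

# Three parenthesized subexpressions, numbered 1, 2, 3 from left to right.
pattern = re.compile(r"(a)(b)(c)")

match = pattern.search("abc")
groups = match.groups()  # every subexpression, in order

# Each subexpression can also be fetched by its number.
first = match.group(1)
second = match.group(2)
third = match.group(3)

# Referring back to the subexpressions in a replacement string.
# Python spells the references \1, \2, \3 rather than $1, $2, $3.
reordered = pattern.sub(r"\3\2\1", "abc")

print(groups)
print(reordered)
```

The replacement string reuses all three fragments in reverse order, which shows that a reference can appear anywhere in the replacement, not only in its original position.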
Now that I've explained what subexpressions are and how they can be used to reference pieces of an expression, you will use them to help solve the problem.
I've defined the following regular expression to match the text in the document.
(<font[^>]*>)(Hello )(world.)(</font>)
First, notice that there are four subexpressions defined in this regular expression. The first subexpression matches the beginning <font> tag ($1), the second matches "Hello " ($2), the third matches "world." ($3), and the fourth matches the closing </font> tag ($4). Although the subexpressions do not affect the criteria of the search, they do enable you to define logical groupings.
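The four groupings can be checked with a short sketch in Python's re module, run against one of the lines from the sample document. The names FONT_PATTERN, line, and the unpacked variables are assumptions made for illustration. The sketch also points out a detail of the pattern as written: the period in (world.) is unescaped, so it matches any single character, not only a literal period.

```python
import re

# The pattern from the article, unchanged.
FONT_PATTERN = re.compile(r"(<font[^>]*>)(Hello )(world.)(</font>)")

line = '<font face="Verdana, Arial, Helvetica, sans-serif">Hello world.</font>'

match = FONT_PATTERN.search(line)

# One value per subexpression: $1, $2, $3, and $4 in the article's notation.
opening_tag, hello, world, closing_tag = match.groups()

print(repr(opening_tag))
print(repr(hello))
print(repr(world))
print(repr(closing_tag))

# Because "." is unescaped, the third subexpression also accepts other
# characters in that position, such as an exclamation point.
matches_exclamation = FONT_PATTERN.search("<font>Hello world!</font>") is not None
print(matches_exclamation)
```

For the sample document this looseness is harmless. If only a literal period should match, writing the third subexpression as (world\.) would be the stricter choice.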
Complete the example now by performing the following steps:
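Add the following code after the opening <body> tag:
<font face="Verdana, Arial, Helvetica, sans-serif">Hello world.</font>
<font face="Verdana, Arial, Helvetica, sans-serif">Hello world.</font>
<font face="Verdana, Arial, Helvetica, sans-serif">Hello world.</font>
Enter the following regular expression in the Find text box:
(<font[^>]*>)(Hello )(world.)(</font>)
Step through the matches. Each of the three <font> tags is selected, one at a time. This verifies that the regular expression is in fact matching.
Enter the following in the Replace text box, then replace all matches:
<span>$2Rob!</span>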
In the Replace text box, you are indicating that you want to replace the entire match, from the opening <font> tag through the closing </font> tag, with a string that begins with the <span> tag, followed by the second subexpression of the regular expression you're searching for, followed by my name and a closing </span> tag. In other words, you want to preserve the subexpression "Hello " and make changes before and after it (see Figure 12).
Figure 12: The Find and Replace dialog box with regular expressions in the Find and Replace text boxes
In Code view, you should see that all three <font> tags have been removed and replaced with a newly constructed string that contains one fragment of the original string ("Hello").
Figure 13: Code view showing "Hello Rob!" where the <font> tags and Hello World used to be
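The same find-and-replace can be sketched outside Dreamweaver with Python's re module. This is only an illustration under a few assumptions: the names FONT_PATTERN, body, updated, and count are hypothetical, and Python writes the reference to the second subexpression as \g<2> where the Replace text box uses $2.

```python
import re

# The pattern from the article, unchanged.
FONT_PATTERN = re.compile(r"(<font[^>]*>)(Hello )(world.)(</font>)")

# The three lines from the sample document's body.
body = "\n".join(
    ['<font face="Verdana, Arial, Helvetica, sans-serif">Hello world.</font>'] * 3
)

# Keep only the second subexpression ("Hello ") and wrap it in a <span>.
# \g<2> is Python's unambiguous spelling of the article's $2.
updated, count = FONT_PATTERN.subn(r"<span>\g<2>Rob!</span>", body)

print(count)
print(updated)
```

Here subn is used instead of sub because it also returns the number of replacements made, which is a quick way to confirm that all three instances were changed.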