Introduction to Regular Expressions in Dreamweaver


Exercise: Your First Regular Expression

Now you will create your very first regular expression. This expression will search for every instance of a tag in a new HTML document and demonstrate some of the concepts mentioned earlier such as character classes, special characters, and repetition.

Initial Setup

Before you proceed with creating your first regular expression, make sure that Dreamweaver is in the correct state to do so.

  1. Open the New Document dialog box by selecting File > New (see Figure 2).

    Figure 2: The New Document dialog box

  2. In the New Document dialog box, select Basic Page for Category and then select HTML as the Basic page type.
  3. Click the Create button.
  4. Switch to Code view by selecting View > Code.

    In Code view, you should now see the following HTML code:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">
    <html>
    <head>
    <title>Untitled Document</title>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
    </head>
    
    <body>
    </body>
    </html>
  5. Set the cursor to be at the first line of the document.
  6. Open the Find and Replace dialog box by selecting Edit > Find and Replace.
  7. Select the Use regular expressions check box to enable regular expression searches.
  8. Select Current Document in the Find In pop-up menu.
  9. Select Source Code in the Search pop-up menu.

    Your Find and Replace dialog box should look like Figure 3.

    Figure 3: The Find and Replace dialog box with the Current Document, Source Code, and Use regular expression settings applied

Step 1: Creating a Basic Alpha Search

In my experience, regular expressions are often easiest to create using an iterative approach that begins with a very basic pattern that is progressively refined by testing. I tend to use this approach initially, because it is mentally easier to debug and test a simplified regular expression than a complicated one.

As a reminder, the task at hand is to create a regular expression that matches all tags in a new HTML document created in Dreamweaver.

Taking into consideration my previous advice, you'll begin by handling the most basic case: successfully matching a single tag. Here, you'll search for the <html> tag.

  1. In the Find and Replace dialog box, type <html> in the Find text box (see Figure 4).

    Figure 4: The Find and Replace dialog box with <html> typed in the Find text box

  2. Click the Find Next button.
  3. Verify that in the Find and Replace dialog box only one instance of the pattern was found and that <html> is selected in Code view. If you click Find Next again, <html> remains selected, confirming that only one instance was found (see Figure 5).

    Figure 5: The Find and Replace text box showing that only one instance of <html> was found in the current document
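
The same check can be reproduced outside Dreamweaver. The following is a minimal sketch that assumes Python's re module as a stand-in for the Find and Replace dialog box; the variable names are illustrative and are not part of Dreamweaver.

```python
import re

# Markup of the new HTML document, copied from the Initial Setup steps.
NEW_DOCUMENT = """\
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Untitled Document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>

<body>
</body>
</html>
"""

# A pattern without special characters matches only that exact text.
literal_matches = re.findall(r"<html>", NEW_DOCUMENT)
print(literal_matches)       # ['<html>']
print(len(literal_matches))  # 1
```

The closing </html> tag is not counted, because the slash after the less-than sign does not match the literal pattern.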

Step 2: Using Character Classes to Find All Tags

In Step 1, you created a very strict regular expression that finds only one instance of a tag. In Step 2, you'll define a character class that demonstrates the true power of regular expressions.

Rather than searching only for the <html> tag, you will modify your regular expression to match any instance of any HTML tag in the document. In order to do this, you'll define a flexible custom character class and use repetition to allow for one or more instances of the class to be found.

Before you define your custom character class, first consider the rules for how HTML tags are named:

  • Tag names such as <b> and <img> include only alphabetic characters. (Heading tags such as <h1> also end in a digit, but none appear in this document, so this exercise ignores them.)
  • Tag names in HTML can have uppercase or lowercase alphabetic characters. For XHTML documents, only lowercase characters are permitted, but you'll presume that you're working with an HTML 4.01 document.
  • Tag names cannot contain any special characters.
  • Tag names must have at least one alphabetic character.
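
To make these rules concrete before turning them into a pattern, here is a small sketch in Python that checks a candidate tag name without using a regular expression. The function name follows_tag_name_rules and the sample names are hypothetical and serve only as illustration of the simplified rules listed above.

```python
import string


def follows_tag_name_rules(name: str) -> bool:
    """Check a candidate tag name against the simplified rules above."""
    # At least one character, and every character is an uppercase or
    # lowercase ASCII letter (no digits, no special characters).
    return len(name) > 0 and all(ch in string.ascii_letters for ch in name)


for candidate in ["b", "IMG", "html", "", "my-tag", "img2"]:
    print(repr(candidate), follows_tag_name_rules(candidate))
# 'b' True
# 'IMG' True
# 'html' True
# '' False
# 'my-tag' False
# 'img2' False
```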

Given these rules, you will begin refining your original regular expression <html>. You'll substitute the "html" fragment first in favor of a more flexible, custom character class that adheres to the aforementioned rules:

<[A-Za-z]

This expression reads as "match a less-than sign followed by any one character from the ranges listed inside of the brackets."

Custom classes enable you to specify a range of acceptable characters using a hyphen. It's important to note also that custom classes are case sensitive only when the search itself is. So far, the Match case check box in the Dreamweaver Find and Replace dialog box has not been selected, so the search has not been case sensitive. You'll select Match case when you test the expression.
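
The effect of case sensitivity on a range can be sketched in Python. This is an illustration under an assumption: the re.IGNORECASE flag is used here as a rough equivalent of leaving the Match case check box unselected, and the sample string is made up for the example.

```python
import re

sample = "<B><i>"

# Case-sensitive search: the range a-z excludes the uppercase "B".
lowercase_only = re.findall(r"<[a-z]", sample)
print(lowercase_only)   # ['<i']

# Case-insensitive search: the same range now also accepts "B".
ignoring_case = re.findall(r"<[a-z]", sample, re.IGNORECASE)
print(ignoring_case)    # ['<B', '<i']

# Listing both ranges makes the class accept either case by itself.
both_ranges = re.findall(r"<[A-Za-z]", sample)
print(both_ranges)      # ['<B', '<i']
```

Writing both ranges in the class, as this exercise does, keeps the result the same whether or not the search is case sensitive.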

The range of acceptable characters reads as any character from A through Z (uppercase) or a through z (lowercase). I have intentionally left off the trailing greater-than sign, because with it the expression would find no matches in this document. You have not yet defined a repetition, so <[A-Za-z]> would match only tags that have a single alphabetic character, such as <b>.

Now, test your regular expression in Dreamweaver.

  1. Set the cursor to be at the first line of the document in Code view. Doing so will make sure that Dreamweaver begins searching at the beginning of the document.
  2. In the Find and Replace dialog box, type <[A-Za-z] in the Find text box.
  3. Select the Match case check box (see Figure 6).

    Figure 6: The Find and Replace text box with the <[A-Za-z] class in the Find text box and the Match case check box selected

  4. Click the Find Next button.
  5. Verify in Code view that <h in <html> is selected (see Figure 7).

    Figure 7: Code view showing <h in the <html> tag selected

  6. Click the Find Next button again and you'll see <h in the first part of <head> is selected (see Figure 8).

    Figure 8: Code view showing <h in the <head> tag selected

  7. Click the Find Next button again. Notice that this time <t is selected in <title>, indicating that the pattern matches a variety of alphabetic characters after the less-than sign (see Figure 9).

    Figure 9: Code view showing <t of the <title> tag selected
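
The first three Find Next results can be approximated with a short Python sketch. It assumes Python's re module behaves like Dreamweaver's search for this simple pattern, and it uses only an excerpt of the new document; the name HEAD_EXCERPT is illustrative.

```python
import re

# Excerpt of the new document: the first lines that contain tags.
HEAD_EXCERPT = """\
<html>
<head>
<title>Untitled Document</title>
"""

# One letter after the less-than sign: each match stops after that letter.
partial_matches = [m.group() for m in re.finditer(r"<[A-Za-z]", HEAD_EXCERPT)]
print(partial_matches)   # ['<h', '<h', '<t']

# Keeping the greater-than sign without a repetition finds nothing here,
# because no tag in the excerpt has a one-letter name.
with_closing_sign = re.findall(r"<[A-Za-z]>", HEAD_EXCERPT)
print(with_closing_sign)  # []
```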

Step 3: Using Repetition

Although you have defined a character class consisting of all uppercase and lowercase alphabetic characters, your regular expression matches only up to the first alphabetic character following the less-than sign.

Since HTML tag names can contain one or more alphabetic characters, you'll introduce the + repetition character by inserting it after your custom character class. You will also add a closing greater-than sign to match the end of the HTML tag.

<[A-Za-z]+> 

The regular expression now reads as "match a less-than sign followed by any one or more of the range of characters listed inside of brackets until a greater-than character is found."
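
To see what the + adds, the following sketch compares the expression with and without the repetition on a few sample strings. It assumes Python's re module as a stand-in for Dreamweaver's search; the sample strings and variable names are chosen only for illustration.

```python
import re

one_letter = re.compile(r"<[A-Za-z]>")    # exactly one letter between < and >
one_or_more = re.compile(r"<[A-Za-z]+>")  # one or more letters between < and >

# For each sample, record whether each pattern matches the whole string.
results = {
    tag: (bool(one_letter.fullmatch(tag)), bool(one_or_more.fullmatch(tag)))
    for tag in ["<b>", "<html>", "<>"]
}

for tag, (without_plus, with_plus) in results.items():
    print(tag, without_plus, with_plus)
# <b> True True
# <html> False True
# <> False False
```

The last sample shows that + still requires at least one letter, which agrees with the rule that a tag name cannot be empty.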

  1. Set the cursor to be at the first line of the document in Code view.
  2. In the Find and Replace dialog box, type <[A-Za-z]+> in the Find text box.
  3. Select the Match case check box (see Figure 10).

    Figure 10: The Find and Replace dialog box with <[A-Za-z]+> in the Find text box and the Match case check box selected

  4. Click the Find Next button.
  5. Verify in Code view that <html> is selected (see Figure 11). The regular expression now matches one or more alphabetic characters followed by a greater-than sign, so it matches the entire tag.

    Figure 11: Code view showing <html> selected

  6. Click the Find Next button repeatedly and notice that three more opening HTML tags are selected in turn: <head>, <title>, and <body>. The only opening tag not selected is <meta>, because it is the only tag that also includes a series of attributes and values. You'll need to make some changes in order to allow for attributes and values.
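
The full set of matches, and the reason <meta> is skipped, can be sketched in Python. As before, this assumes Python's re module as an approximation of Dreamweaver's search, and the variable names are illustrative.

```python
import re

# Markup of the new HTML document, copied from the Initial Setup steps.
NEW_DOCUMENT = """\
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Untitled Document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>

<body>
</body>
</html>
"""

TAG_PATTERN = re.compile(r"<[A-Za-z]+>")

# Every match in document order, as repeated Find Next clicks would show.
found_tags = TAG_PATTERN.findall(NEW_DOCUMENT)
print(found_tags)  # ['<html>', '<head>', '<title>', '<body>']

# In the <meta> tag, the letters are followed by a space rather than
# by another letter or a greater-than sign, so the match fails.
meta_tag = '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'
meta_match = TAG_PATTERN.search(meta_tag)
print(meta_match)  # None
```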