Regular Expressions Search & Replace

A regular expression (shortened as regex or regexp; also referred to as rational expression) is a sequence of characters that defines a search pattern. Usually such patterns are used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation. It is a technique developed in theoretical computer science and formal language theory. (From Wikipedia, the free encyclopedia)

Basic Concepts

(From Wikipedia)

A regular expression, often called a pattern, specifies a set of strings required for a particular purpose. A simple way to specify a finite set of strings is to list its elements or members. However, there are often more concise ways: for example, the set containing the three strings "Handel", "Händel", and "Haendel" can be specified by the pattern H(ä|ae?)ndel; we say that this pattern matches each of the three strings. In most formalisms, if there exists at least one regular expression that matches a particular set then there exists an infinite number of other regular expressions that also match it—the specification is not unique. Most formalisms provide the following operations to construct regular expressions.

Boolean "or"
A vertical bar separates alternatives. For example, gray|grey can match "gray" or "grey".
Grouping
Parentheses are used to define the scope and precedence of the operators (among other uses). For example, gray|grey and gr(a|e)y are equivalent patterns which both describe the set of "gray" or "grey".
Quantification
A quantifier after a token (such as a character) or group specifies how often the preceding element is allowed to occur. The most common quantifiers are the question mark ?, the asterisk * (derived from the Kleene star), and the plus sign + (Kleene plus).
?
The question mark indicates zero or one occurrences of the preceding element. For example, colou?r matches both "color" and "colour".
*
The asterisk indicates zero or more occurrences of the preceding element. For example, ab*c matches "ac", "abc", "abbc", "abbbc", and so on.
+
The plus sign indicates one or more occurrences of the preceding element. For example, ab+c matches "abc", "abbc", "abbbc", and so on, but not "ac".
{n}
The preceding item is matched exactly n times.
{min,}
The preceding item is matched min or more times.
{min,max}
The preceding item is matched at least min times, but not more than max times.
Wildcard
The wildcard . matches any single character (in most engines, any character except a line break). For example, a.b matches any string that contains an "a", then any one character, and then a "b"; a.*b matches any string that contains an "a" and a "b" at some later point.
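The quantifier and wildcard examples above can be checked outside Dreamweaver. The following is a minimal sketch in Python, chosen here only for illustration; the helper name matches and the test strings are assumptions, not part of the original text. It uses re.fullmatch so that a pattern has to account for the whole string.

```python
import re

def matches(pattern, text):
    """Return True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# pattern -> (strings expected to match, strings expected not to match)
cases = {
    r"colou?r":  (["color", "colour"], ["colouur"]),
    r"ab*c":     (["ac", "abc", "abbbc"], ["adc"]),
    r"ab+c":     (["abc", "abbc"], ["ac"]),
    r"ab{2}c":   (["abbc"], ["abc", "abbbc"]),
    r"ab{2,}c":  (["abbc", "abbbbc"], ["abc"]),
    r"ab{1,2}c": (["abc", "abbc"], ["ac", "abbbc"]),
    r"a.b":      (["axb", "a-b"], ["ab", "axxb"]),
}

for pattern, (good, bad) in cases.items():
    assert all(matches(pattern, s) for s in good), pattern
    assert not any(matches(pattern, s) for s in bad), pattern

# a.*b only needs an "a" with a "b" somewhere later, so search() is used here.
contains_a_then_b = re.search(r"a.*b", "xx a 123 b xx") is not None

# In Python, "." does not match a line break unless re.DOTALL is passed.
dot_matches_newline = matches(r"a.b", "a\nb")

print(contains_a_then_b, dot_matches_newline)  # True False
```

Other regular expression engines may differ in detail, so treat the results as a guide rather than a guarantee of how Dreamweaver behaves.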

These constructions can be combined to form arbitrarily complex expressions, much like one can construct arithmetical expressions from numbers and the operations +, −, ×, and ÷. For example, H(ae?|ä)ndel and H(a|ae|ä)ndel are both valid patterns which match the same strings as the earlier example, H(ä|ae?)ndel.
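As a quick check that the different spellings of a pattern really describe the same set of strings, the sketch below runs each pattern over a small list of candidates. It is written in Python for illustration; the candidate strings and the helper name matched_set are assumptions added for this example.

```python
import re

handel_patterns = [r"H(ä|ae?)ndel", r"H(ae?|ä)ndel", r"H(a|ae|ä)ndel"]
gray_patterns = [r"gray|grey", r"gr(a|e)y"]

# Candidates include the intended matches plus a few near misses.
handel_candidates = ["Handel", "Händel", "Haendel", "Hndel", "Hendel", "Haaendel"]
gray_candidates = ["gray", "grey", "gruy", "graey"]

def matched_set(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return {s for s in candidates if re.fullmatch(pattern, s)}

handel_results = [matched_set(p, handel_candidates) for p in handel_patterns]
gray_results = [matched_set(p, gray_candidates) for p in gray_patterns]

for pattern, result in zip(handel_patterns + gray_patterns,
                           handel_results + gray_results):
    print(pattern, sorted(result))
```

Every Handel pattern selects the same three names, and both gray patterns select only "gray" and "grey".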

Using Regular Expressions in Dreamweaver

Open the search dialog by pressing the Ctrl and H keys together. In the filter options, choose Regular Expressions.

Example one: replacing italics with bolding

On iService, italics are reserved for text that refers to an act or law, such as the Privacy Act.

Regular expressions can be used to replace content that is in italics <em> with bolding <strong>. Using this example, search for <em>(.*?)</em> and replace it with <strong>$1</strong>.

Copy and paste this into Dreamweaver in the code view:

<em>Lorem ipsum dolor sit amet</em>, consectetur adipisicing elit. Velit perspiciatis aperiam quas, eaque nulla modi expedita ab, iure ex, vel ipsum minus praesentium molestiae saepe. Itaque praesentium in, illum temporibus. Lorem ipsum dolor sit amet, consectetur adipisicing elit. At voluptatibus, quas autem officiis, laboriosam non! Explicabo, delectus iusto blanditiis nulla quidem similique, deleniti molestias vero ut <em>assumenda officia</em> fuga quos. Lorem ipsum dolor sit amet, consectetur adipisicing elit. Quam nihil dignissimos, <em>natus veniam consequuntur</em> ad eius libero culpa nesciunt nulla, modi vitae debitis eveniet <em>tempora iste numquam</em> odio nam dolor.

Explanation of the code

<em>(.*?)</em>

  1. The opening <em> tag
  2. Whatever text follows the opening tag, captured by (.*?)
  3. The closing </em> tag
Parentheses ()
A group whose matched text is saved so that it can be recalled in the replacement, in this case as $1.
The Period "."
A wildcard that matches any character.
The Asterisk "*"
A quantifier that indicates zero or more occurrences of the preceding element, in this case the wildcard.
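The question mark "?"
Placed directly after another quantifier, in our code the asterisk, the question mark makes that quantifier lazy: it matches as few characters as possible, so the match stops at the first closing </em> tag instead of running on to the last one. (After an ordinary character or group, the question mark means zero or one occurrences.)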
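When a question mark follows another quantifier such as the asterisk, it makes that quantifier lazy, which matters as soon as a line holds more than one <em> element. The sketch below compares .* with .*? in Python, used here for illustration only; the sample string is an assumption made for this example.

```python
import re

html = "<em>first</em> plain text <em>second</em>"

# Greedy: .* grabs as much as it can, so the match runs to the LAST </em>.
greedy = re.findall(r"<em>(.*)</em>", html)

# Lazy: .*? grabs as little as it can, so each match stops at the NEXT </em>.
lazy = re.findall(r"<em>(.*?)</em>", html)

print(greedy)  # ['first</em> plain text <em>second']
print(lazy)    # ['first', 'second']
```

With the greedy form, a replace would merge both elements and the text between them into a single match, which is why the lazy form is used in the search pattern.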

In the replace field of the search dialog, we put the opening bold tag <strong>, followed by $1 (the first group saved by the regular expression), followed by the closing tag </strong>. The complete replacement is <strong>$1</strong>.
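The same search and replace can be tried outside Dreamweaver. This is a sketch in Python under one stated assumption: Python's re module writes the saved group as \1 rather than the $1 used in Dreamweaver. The sample text is a shortened version of the paragraph above.

```python
import re

sample = (
    "<em>Lorem ipsum dolor sit amet</em>, consectetur adipisicing elit. "
    "Deleniti molestias vero ut <em>assumenda officia</em> fuga quos."
)

# Search pattern from the example; \1 plays the role of $1 in Dreamweaver.
result = re.sub(r"<em>(.*?)</em>", r"<strong>\1</strong>", sample)

print(result)
```

Each <em> element becomes its own <strong> element, and the text outside the tags is left unchanged.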

Example two: a numbered list

Since the HTML will provide the numbers for this list, we need to remove the number, the period, and the space from the start of each list item.

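Search for: <li>[\d]+\.[\s]+

The [\d]+ matches one or more digits, the \. matches a literal period (the backslash stops the period from acting as a wildcard), and the [\s]+ matches one or more whitespace characters.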

Replace with: <li>

<ol>
<li>1. List item</li>
<li>2. List item</li>
<li>3. List item</li>
<li>4. List item</li>
<li>5. Squirrel</li>
<li>6. List item</li>
<li>7. List item</li>
<li>8. List item</li>
</ol>
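The numbered list clean-up can be sketched the same way in Python, used here only to illustrate the pattern. The list below is a shortened copy of the sample, and the item numbered 10 is an assumption added to show that [\d]+ also covers numbers with more than one digit.

```python
import re

items = """<ol>
<li>1. List item</li>
<li>2. List item</li>
<li>5. Squirrel</li>
<li>10. List item</li>
</ol>"""

# Remove the number, the literal period, and the following whitespace
# that appear directly after each opening <li> tag.
cleaned = re.sub(r"<li>[\d]+\.[\s]+", "<li>", items)

print(cleaned)
```

Because the pattern is tied to the opening <li> tag, a number followed by a period elsewhere in an item's text is not affected.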