
Regular Expressions

by Alex Peck on August 3rd, 2009

Regular expressions are used to match patterns in strings. These patterns may be particular character sequences, character classes (e.g. white space, digits, upper case letters, etc.) or combinations thereof. Patterns may be combined to form more complex patterns using repetition and alternation operators.
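As a small illustration of character classes, repetition and alternation in .NET, here is a sketch written with C# 9 top-level statements. The helper names (IsAllDigits, IsPetList, CountWhiteSpace) are hypothetical, chosen only for this example:

```csharp
using System;
using System.Text.RegularExpressions;

// \d is the digit character class; + repeats it one or more times
static bool IsAllDigits(string s) => Regex.IsMatch(s, @"^\d+$");

// | is alternation; the brackets let + repeat the whole alternation
static bool IsPetList(string s) => Regex.IsMatch(s, @"^(cat|dog)+$");

// \s is the white space class; Regex.Matches returns every occurrence
static int CountWhiteSpace(string s) => Regex.Matches(s, @"\s").Count;

Console.WriteLine(IsAllDigits("2009"));       // True
Console.WriteLine(IsAllDigits("20O9"));       // False: contains the letter O
Console.WriteLine(IsPetList("catdogcat"));    // True
Console.WriteLine(IsPetList("catcow"));       // False
Console.WriteLine(CountWhiteSpace("a b\tc")); // 2 (a space and a tab)
```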

For a more detailed treatment, please refer to this site, which is worth visiting just to see the photograph of the author. If that's too long-winded, try the cheat sheet instead.

Regex: A Simple Example

Whilst the concept is simple, RegEx syntax is not, and regular expressions can be hard to read. This expression matches a UK postcode:

^[A-Za-z]{1,2}[\d]{1,2}([A-Za-z])?\s?[\d][A-Za-z]{2}$

This example includes several key bits of RegEx syntax. The ^ and $ characters are anchors, and match the beginning and end of the string. Square brackets denote a character set to match, so [A-Za-z] matches upper and lower case letters in the range A-Z, and [\d] matches a digit. Curly braces specify the number of repetitions, so [A]{1,2} will match the character A once or twice. The ? matches the preceding expression zero times or once, so \s? matches an optional white space character.
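To see the anchors, character sets, {n,m} counts and ? in action, the sketch below runs the postcode pattern above against a few inputs (C# 9 top-level statements; the helper names are hypothetical). Note that the pattern only checks the shape of a postcode, not whether it actually exists:

```csharp
using System;
using System.Text.RegularExpressions;

// The UK postcode pattern from the article, unchanged
static bool LooksLikePostcode(string s) =>
    Regex.IsMatch(s, @"^[A-Za-z]{1,2}[\d]{1,2}([A-Za-z])?\s?[\d][A-Za-z]{2}$");

// {1,2} allows one or two A's and nothing more, because of the ^ and $ anchors
static bool OneOrTwoAs(string s) => Regex.IsMatch(s, "^[A]{1,2}$");

Console.WriteLine(LooksLikePostcode("SW1A 1AA")); // True: letters, digit, optional letter, space
Console.WriteLine(LooksLikePostcode("M1 1AE"));   // True: single letter, no optional letter
Console.WriteLine(LooksLikePostcode("EC1A1BB"));  // True: \s? makes the space optional
Console.WriteLine(LooksLikePostcode("SW1A 1A"));  // False: [A-Za-z]{2} needs two final letters
Console.WriteLine(OneOrTwoAs("AA"));              // True
Console.WriteLine(OneOrTwoAs("AAA"));             // False
```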

In the context of .NET, the Regex class provides static methods to test a string for a match. Here is a simple but useless example:

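using System.Text.RegularExpressions;

// "pattern" is 7 characters long and uses only the letters a, e, n, p, r and t, so this matches
if (Regex.IsMatch("pattern", @"^[aenprt]{1,7}$"))
{
   // do something...
}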

Options

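Regex methods and constructors, where appropriate, provide an overload where RegexOptions may be specified (e.g. Compiled, IgnoreCase etc.). Options may be combined using a bitwise OR. You can also specify options inline within the regex pattern, by enclosing the characters which map to regex options (i for IgnoreCase, m for Multiline, n for ExplicitCapture, s for Singleline and x for IgnorePatternWhitespace) within brackets. There are two forms:

  • (?imnsx-imnsx) applies the options from that point to the end of the enclosing group
  • (?imnsx-imnsx:subexpression) applies the options to the subexpression only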

Prefixing a set of options with the minus sign turns them off. All the options are turned off by default. Compiled and RightToLeft may only be applied to an expression as a whole.
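A minimal sketch of both ways of setting options, using C# 9 top-level statements (the helper names are hypothetical): RegexOptions values combined with a bitwise OR, the inline (?i) form, the scoped (?i:...) form, and (?-i) turning an option back off:

```csharp
using System;
using System.Text.RegularExpressions;

// Options passed to the method, combined with a bitwise OR
static bool MatchesWithFlags(string s) =>
    Regex.IsMatch(s, "^hello world$", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

// The same case-insensitivity expressed inline with (?i)
static bool MatchesInline(string s) => Regex.IsMatch(s, "(?i)^hello world$");

// (?i:...) scopes the option to "hello"; " world" stays case-sensitive
static bool MatchesScoped(string s) => Regex.IsMatch(s, "^(?i:hello) world$");

// (?-i) turns the option off again for the rest of the pattern
static bool MatchesToggled(string s) => Regex.IsMatch(s, "^(?i)hello(?-i) world$");

Console.WriteLine(MatchesWithFlags("HELLO World")); // True
Console.WriteLine(MatchesInline("HELLO World"));    // True
Console.WriteLine(MatchesScoped("HELLO world"));    // True
Console.WriteLine(MatchesScoped("HELLO World"));    // False
Console.WriteLine(MatchesToggled("HELLO World"));   // False
```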

Grouping & Backreferences

Round brackets specify a group. Repetition and alternation operators can be applied to a group as a whole, and the text a group matched can be referred back to later in the pattern. A simple example is to search for a white space character followed by a digit, immediately repeated (e.g. " 1 1"):

(?<groupname>\s\d)\k<groupname>

The (?<groupname>pattern) construct is used to denote a named group, then \k<groupname> is used to refer back to it. If you don't specify a name within angle brackets, you can refer to groups by number using \n, where n is the group number (e.g. \1 for the first group).
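Both kinds of backreference can be tried out with a short sketch (C# 9 top-level statements; the helper names and the doubled-word pattern are illustrative assumptions, not from the original). The first helper uses the named pattern above; the second uses the numbered form \1 to find a word that appears twice in a row:

```csharp
using System;
using System.Text.RegularExpressions;

// Named group plus \k<groupname>: white space + digit, immediately repeated
static bool HasRepeatedPair(string s) =>
    Regex.IsMatch(s, @"(?<groupname>\s\d)\k<groupname>");

// Unnamed groups are numbered from 1, so \1 refers back to the first group
static string FirstDoubledWord(string s)
{
    Match m = Regex.Match(s, @"\b(\w+)\s+\1\b");
    return m.Success ? m.Groups[1].Value : string.Empty;
}

Console.WriteLine(HasRepeatedPair("a 1 1 b"));             // True
Console.WriteLine(HasRepeatedPair("a 1 2 b"));             // False
Console.WriteLine(FirstDoubledWord("this is is a test")); // is
```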

Extracting matches

Using the static Regex.Match method you can extract matches from an input string. Enclose the subpattern you wish to extract in parentheses.

using System;
using System.Text.RegularExpressions;

string input = "Exception: error message";
Match m = Regex.Match(input, "Exception: (.*$)");
string match1 = m.Groups[1].Value; // text captured by the first (...) group

// each group holds a collection of captures, each with its matched text, index and length
CaptureCollection captures = m.Groups[1].Captures;
Capture c = captures[0];
Console.WriteLine(string.Format("group '{0}', capture '{1}', index {2}", match1, c.Value, c.Index));
// prints: group 'error message', capture 'error message', index 11

If there is a match, the Match.Success property is true. The Match.Groups collection holds the entire match at index 0, followed by the parenthesized groups indexed starting at 1. If you name the groups (using angle brackets as described above), you can also index by name. Each group contains a collection of captures, and each capture holds the matched text along with Index and Length properties.
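The sketch below (C# 9 top-level statements) illustrates these points: Groups[0] versus a named group, and a group inside a repetition that records one capture per iteration while Group.Value keeps only the last one. The named group msg and the "1-2-3" input are illustrative assumptions:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Groups[0] is always the whole match; a named group can also be read by name
Match m = Regex.Match("Exception: error message", @"Exception: (?<msg>.*$)");
if (m.Success)
{
    Console.WriteLine(m.Groups[0].Value);     // Exception: error message
    Console.WriteLine(m.Groups["msg"].Value); // error message
}

// A repeated group records one Capture per iteration; Group.Value is the last capture
Match d = Regex.Match("1-2-3", @"^(?:(\d)-?)+$");
string values = string.Join(",", d.Groups[1].Captures.Cast<Capture>().Select(c => c.Value));
string indexes = string.Join(",", d.Groups[1].Captures.Cast<Capture>().Select(c => c.Index));
Console.WriteLine(d.Groups[1].Value); // 3
Console.WriteLine(values);            // 1,2,3
Console.WriteLine(indexes);           // 0,2,4
```

Cast<Capture>() is used so that the LINQ calls also work on older .NET Framework versions, where CaptureCollection is not a generic collection.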

Replacing matches

Using the static Regex.Replace method you may replace matches within the input string. This simple example shows how to use named backreferences within the replacement pattern.

using System.Text.RegularExpressions;

string output = Regex.Replace("alex peck", @"(?<forename>\S+) (?<surname>\S+)", "${surname}, ${forename}");
// output = "peck, alex"

The replacement expression ${surname} inserts the text captured by the group (?<surname>...). We could also use the expression $n, where n is an integer, to insert groups by number.
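For comparison, here is a sketch (C# 9 top-level statements) of the same swap using numbered groups, plus the Regex.Replace overload that takes a MatchEvaluator delegate, used here to capitalise each word. The capitalisation example is an illustrative assumption, not from the original:

```csharp
using System;
using System.Text.RegularExpressions;

// $2 and $1 insert the groups by number instead of by name
string byNumber = Regex.Replace("alex peck", @"(\S+) (\S+)", "$2, $1");

// A MatchEvaluator computes each replacement in code: \b\w is the first letter of each word
string capitalised = Regex.Replace("alex peck", @"\b\w",
    match => match.Value.ToUpperInvariant());

Console.WriteLine(byNumber);    // peck, alex
Console.WriteLine(capitalised); // Alex Peck
```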

Online resources

  • regexlib – Regular Expression Library
  • regexpal – Test regular expressions in a web page (based on JavaScript)
  • Regulator and Regulazy are available here
  • reanimator – animate your regex as an automaton
  • .NET MSDN reference