Regular Expressions
Regular expressions are used to match patterns in strings. These patterns may be particular character sequences, character classes (e.g. white space, digits, upper case) or combinations thereof. Patterns may be combined to form more complex patterns using repetition and alternation operators.
For a more detailed treatment, please refer to this site, which is worth visiting just to see the photograph of the author. If that’s too long-winded, try the cheat sheet instead.
Regex: A Simple Example
Whilst the concept is simple, regex syntax is not, and regexes can be hard to read. This expression matches a UK postcode:
^[A-Za-z]{1,2}[\d]{1,2}([A-Za-z])?\s?[\d][A-Za-z]{2}$
This example includes several key pieces of regex syntax. The ^ and $ characters are anchors, and specify the beginning and end of the input string. Square brackets denote a character set to match, so [A-Za-z] matches any upper or lower case letter in the range A-Z. Curly braces denote the number of repetitions, so [A]{1,2} will match the character A once or twice. \d matches a digit and \s matches a white space character. The ? matches the preceding expression zero or one times, so \s? matches an optional white space character.
In the context of .NET, the Regex class provides static methods to test a string for a match. Here is a simple but useless example:
if (Regex.IsMatch("pattern", @"^[aenprt]{1,7}$"))
{
    // do something...
}
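A slightly more useful sketch applies the postcode expression from above with Regex.IsMatch. The class name PostcodeExample and the helper LooksLikePostcode are made up for illustration, and the pattern only checks the general shape of a postcode, not whether it actually exists.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PostcodeExample
{
    // The postcode pattern from the text. A verbatim string (@"...") means
    // the backslashes do not need to be escaped for the C# compiler.
    private const string PostcodePattern =
        @"^[A-Za-z]{1,2}[\d]{1,2}([A-Za-z])?\s?[\d][A-Za-z]{2}$";

    // Hypothetical helper: true when the whole input has the postcode shape.
    public static bool LooksLikePostcode(string input)
    {
        return Regex.IsMatch(input, PostcodePattern);
    }

    public static void Main()
    {
        string[] candidates =
        {
            "SW1A 1AA",   // matches: the optional letter and the optional space are both present
            "m1 1ae",     // matches: lower case is allowed, the optional letter is absent
            "EC1A1BB",    // matches: the space is optional
            "12345",      // no match: must start with a letter
            "SW1A  1AA"   // no match: \s? allows at most one white space character
        };

        foreach (string candidate in candidates)
        {
            Console.WriteLine("'{0}': {1}", candidate, LooksLikePostcode(candidate));
        }
    }
}
```

Because of the ^ and $ anchors, the expression must account for the whole input, so surrounding text such as "Postcode: SW1A 1AA" does not match.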
Options
Regex methods and constructors, where appropriate, provide an overload where RegexOptions may be specified (e.g. Compiled, IgnoreCase etc.). Options may be combined using a bitwise OR. You can also specify options inline within the regex pattern, by enclosing the characters which map to regex options within parentheses. There are two forms:
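- Grouping construct (?imnsx-imnsx:subexpression), which applies the options to the enclosed subexpression only
- Miscellaneous construct (?imnsx-imnsx), which applies the options from that point in the pattern onwards
The letters stand for IgnoreCase (i), Multiline (m), ExplicitCapture (n), Singleline (s) and IgnorePatternWhitespace (x). Prefixing a set of options with the minus sign turns them off. All the options are turned off by default. Compiled and RightToLeft may only be applied to an expression as a whole.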
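The two ways of setting options can be compared side by side. This is a minimal sketch: the class and method names are invented for illustration, and the input strings are arbitrary.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OptionsExample
{
    // Options passed through the overload, combined with a bitwise OR.
    public static bool MatchWithOverload(string input)
    {
        return Regex.IsMatch(input, @"^hello\s+world$",
            RegexOptions.IgnoreCase | RegexOptions.Compiled);
    }

    // Options written inline; no RegexOptions argument is passed at all.
    public static bool MatchInline(string input, string pattern)
    {
        return Regex.IsMatch(input, pattern);
    }

    public static void Main()
    {
        Console.WriteLine(MatchWithOverload("HELLO World"));                     // True

        // (?i) switches IgnoreCase on for the rest of the pattern.
        Console.WriteLine(MatchInline("HELLO WORLD", @"(?i)hello world"));       // True

        // (?i:...) limits IgnoreCase to the enclosed subexpression.
        Console.WriteLine(MatchInline("HELLO world", @"(?i:hello) world"));      // True
        Console.WriteLine(MatchInline("HELLO WORLD", @"(?i:hello) world"));      // False

        // (?-i) switches IgnoreCase off again part-way through the pattern.
        Console.WriteLine(MatchInline("HELLO WORLD", @"(?i)hello (?-i)world"));  // False
    }
}
```

Inline options are handy when the pattern comes from configuration and the calling code cannot be changed to pass RegexOptions.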
Grouping & Backreferences
Round brackets specify a group. Alternation and repetition operators can be applied to a group as a whole, and a backreference later in the pattern matches the same text that the group captured. A simple example might be to search for a digit preceded by a space that is immediately repeated:
(?<groupname>\s\d)\k<groupname>
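The (?<groupname>...) construct defines a named group, and \k<groupname> is a backreference that matches the same text the named group captured.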
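To see the named backreference in action, here is a small sketch that runs the expression above against a couple of inputs. The class BackreferenceExample and the helper FirstRepeat are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackreferenceExample
{
    // Pattern from the text: a white space character and a digit,
    // immediately followed by exactly the same two characters again.
    private const string RepeatPattern = @"(?<groupname>\s\d)\k<groupname>";

    // Hypothetical helper: returns the text captured by the group for the
    // first repeat found, or null when the input contains no repeat.
    public static string FirstRepeat(string input)
    {
        Match m = Regex.Match(input, RepeatPattern);
        return m.Success ? m.Groups["groupname"].Value : null;
    }

    public static void Main()
    {
        // " 3" is followed by another " 3", so the group captures " 3".
        Console.WriteLine("'{0}'", FirstRepeat("1 2 3 3 4") ?? "(no match)");

        // No digit is repeated here, so there is no match.
        Console.WriteLine("'{0}'", FirstRepeat("1 2 3 4") ?? "(no match)");
    }
}
```

Note that the backreference matches the captured text, not the subpattern: " 3 4" does not count as a repeat even though " 4" also fits \s\d.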
Extracting matches
Using the static Regex.Match method you may extract matches from an input string. The sub pattern you wish to extract should be enclosed in parentheses.
string input = "Exception: error message";
Match m = Regex.Match(input, "Exception: (.*$)");
string match1 = m.Groups[1].Value;
CaptureCollection captures = m.Groups[1].Captures;
Capture c = captures[0];
Console.WriteLine(string.Format("group '{0}', capture '{1}', index {2}", match1, c.Value, c.Index));
// prints: group 'error message', capture 'error message', index 11
If there is a match, the Match.Success property is true. The Match.Groups collection contains the captured groups: index 0 holds the whole match, and the groups you define are indexed starting at 1. If you name the groups (using angle brackets as described above), you can also index by name. Each group holds a collection of captures, and each capture contains the matched text along with Index and Length properties.
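The sketch below shows indexing groups by name, and what the capture collection looks like when a group sits inside a repetition. The group names type, message and word, and the helper names, are assumptions made for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExtractExample
{
    // Hypothetical helper: returns the named group "message",
    // or null when the input does not match at all.
    public static string ExtractMessage(string input)
    {
        Match m = Regex.Match(input, @"^(?<type>\w+): (?<message>.*)$");
        if (!m.Success)
        {
            return null;
        }
        return m.Groups["message"].Value;
    }

    // Hypothetical helper: a group inside a repetition records one capture
    // per iteration, so every word ends up in the Captures collection.
    public static string[] CapturedWords(string input)
    {
        Match m = Regex.Match(input, @"^(?:(?<word>\w+)\s?)+$");
        CaptureCollection captures = m.Groups["word"].Captures;
        string[] words = new string[captures.Count];
        for (int i = 0; i < captures.Count; i++)
        {
            words[i] = captures[i].Value;
        }
        return words;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractMessage("Exception: error message"));  // error message
        Console.WriteLine(ExtractMessage("no colon here") ?? "(no match)");

        foreach (string word in CapturedWords("one two three"))
        {
            Console.WriteLine(word);
        }
    }
}
```

When a group captures more than once, Group.Value returns only the last capture ("three" here), which is why the helper walks the Captures collection instead.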
Replacing matches
Using the static Regex.Replace method you may replace matches within the input string. This simple example shows how to use named backreferences within the replacement pattern.
string output = Regex.Replace("alex peck", @"(?<forename>\S+) (?<surname>\S+)", "${surname}, ${forename}");
// output = "peck, alex"
The replacement expression ${surname} inserts the text captured by the group (?<surname>\S+).
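Replacement patterns can also refer to groups by number, and Regex.Replace rewrites every match in the input, not just the first. The following is a sketch with invented helper names and made-up sample data.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplaceExample
{
    // Numbered groups: $1 is the first parenthesised group, $2 the second.
    public static string SwapNames(string input)
    {
        return Regex.Replace(input, @"(\S+) (\S+)", "$2, $1");
    }

    // Named groups, applied to every date found in the input.
    // The yyyy-MM-dd input format is an assumption for this example.
    public static string ReformatDates(string input)
    {
        return Regex.Replace(input,
            @"(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})",
            "${day}/${month}/${year}");
    }

    public static void Main()
    {
        Console.WriteLine(SwapNames("alex peck"));
        // prints: peck, alex

        Console.WriteLine(ReformatDates("from 2024-01-31 to 2024-02-29"));
        // prints: from 31/01/2024 to 29/02/2024
    }
}
```

Text that is not part of a match, such as "from" and "to" above, is copied to the output unchanged.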
Online resources
- regexlib – Regular Expression Library
- regexpal – Test regular expressions in a web page (based on JavaScript)
- Regulator and Regulazy are available here
- reanimator – animate your regex as an automaton
- .NET MSDN reference