<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>out &#62;&#62; m_Conscientia; &#187; RegEx</title>
	<atom:link href="http://blog.hypercomplex.co.uk/index.php/tag/regex/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.hypercomplex.co.uk</link>
	<description>a multidimensional braindump</description>
	<lastBuildDate>Mon, 02 Aug 2010 22:30:54 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Parsing HTML tables into System.Data.DataTable</title>
		<link>http://blog.hypercomplex.co.uk/index.php/2010/05/parsing-html-tables-into-system-data-datatable/</link>
		<comments>http://blog.hypercomplex.co.uk/index.php/2010/05/parsing-html-tables-into-system-data-datatable/#comments</comments>
		<pubDate>Thu, 06 May 2010 20:50:08 +0000</pubDate>
		<dc:creator>Alex Peck</dc:creator>
				<category><![CDATA[Software Engineering]]></category>
		<category><![CDATA[.NET]]></category>
		<category><![CDATA[C#]]></category>
		<category><![CDATA[DataSet]]></category>
		<category><![CDATA[DataTable]]></category>
		<category><![CDATA[html]]></category>
		<category><![CDATA[RegEx]]></category>

		<guid isPermaLink="false">http://blog.hypercomplex.co.uk/?p=877</guid>
		<description><![CDATA[What follows is a quick and dirty class I made to parse HTML tables into DataTables. As usual, it is the result of internet search/run/bug fix/refactor. In use, it looks little like this: WebClient client = new WebClient&#40;&#41;; string html = client.DownloadString&#40;@&#34;http://www.table.co.uk&#34;&#41;; DataSet dataSet = HtmlTableParser.Parse&#40;html&#41;; Here is the implementation. It&#8217;s not optimised for runtime [...]]]></description>
			<content:encoded><![CDATA[<p>What follows is a quick and dirty class I made to parse HTML tables into <a href="http://msdn.microsoft.com/en-us/library/system.data.datatable.aspx">DataTable</a>s. As usual, it is the result of internet search/run/bug fix/refactor.</p>
<p>In use, it looks little like this:</p>

<div class="wp_codebox"><table><tr id="p8776"><td class="code" id="p877code6"><pre class="csharp" style="font-family:monospace;">WebClient client <span style="color: #008000;">=</span> <a href="http://www.google.com/search?q=new+msdn.microsoft.com"><span style="color: #008000;">new</span></a> WebClient<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
<span style="color: #6666cc; font-weight: bold;">string</span> html <span style="color: #008000;">=</span> client<span style="color: #008000;">.</span><span style="color: #0000FF;">DownloadString</span><span style="color: #008000;">&#40;</span><span style="color: #666666;">@&quot;http://www.table.co.uk&quot;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
DataSet dataSet <span style="color: #008000;">=</span> HtmlTableParser<span style="color: #008000;">.</span><span style="color: #0000FF;">Parse</span><span style="color: #008000;">&#40;</span>html<span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span></pre></td></tr></table></div>

<p>Here is the implementation. It&#8217;s not optimised for runtime performance, but it works.</p>

<div class="wp_codebox"><table><tr id="p8777"><td class="code" id="p877code7"><pre class="csharp" style="font-family:monospace;"><span style="color: #008080; font-style: italic;">/// &lt;summary&gt;</span>
<span style="color: #008080; font-style: italic;">/// HtmlTableParser parses the contents of an html string into a System.Data DataSet or DataTable.</span>
<span style="color: #008080; font-style: italic;">/// &lt;/summary&gt;</span>
<span style="color: #0600FF; font-weight: bold;">public</span> <span style="color: #6666cc; font-weight: bold;">class</span> HtmlTableParser
<span style="color: #008000;">&#123;</span>
    <span style="color: #0600FF; font-weight: bold;">private</span> <span style="color: #0600FF; font-weight: bold;">const</span> RegexOptions ExpressionOptions <span style="color: #008000;">=</span> RegexOptions<span style="color: #008000;">.</span><span style="color: #0000FF;">Singleline</span> <span style="color: #008000;">|</span> RegexOptions<span style="color: #008000;">.</span><span style="color: #0000FF;">Multiline</span> <span style="color: #008000;">|</span> RegexOptions<span style="color: #008000;">.</span><span style="color: #0000FF;">IgnoreCase</span><span style="color: #008000;">;</span>
&nbsp;
    <span style="color: #0600FF; font-weight: bold;">private</span> <span style="color: #0600FF; font-weight: bold;">const</span> <span style="color: #6666cc; font-weight: bold;">string</span> CommentPattern <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;&lt;!--(.*?)--&gt;&quot;</span><span style="color: #008000;">;</span>
    <span style="color: #0600FF; font-weight: bold;">private</span> <span style="color: #0600FF; font-weight: bold;">const</span> <span style="color: #6666cc; font-weight: bold;">string</span> TablePattern <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;&lt;table[^&gt;]*&gt;(.*?)&lt;/table&gt;&quot;</span><span style="color: #008000;">;</span>
    <span style="color: #0600FF; font-weight: bold;">private</span> <span style="color: #0600FF; font-weight: bold;">const</span> <span style="color: #6666cc; font-weight: bold;">string</span> HeaderPattern <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;&lt;th[^&gt;]*&gt;(.*?)&lt;/th&gt;&quot;</span><span style="color: #008000;">;</span>
    <span style="color: #0600FF; font-weight: bold;">private</span> <span style="color: #0600FF; font-weight: bold;">const</span> <span style="color: #6666cc; font-weight: bold;">string</span> RowPattern <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;&lt;tr[^&gt;]*&gt;(.*?)&lt;/tr&gt;&quot;</span><span style="color: #008000;">;</span>
    <span style="color: #0600FF; font-weight: bold;">private</span> <span style="color: #0600FF; font-weight: bold;">const</span> <span style="color: #6666cc; font-weight: bold;">string</span> CellPattern <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;&lt;td[^&gt;]*&gt;(.*?)&lt;/td&gt;&quot;</span><span style="color: #008000;">;</span>
&nbsp;
    <span style="color: #008080; font-style: italic;">/// &lt;summary&gt;</span>
    <span style="color: #008080; font-style: italic;">/// Given an HTML string containing n table tables, parse them into a DataSet containing n DataTables.</span>
    <span style="color: #008080; font-style: italic;">/// &lt;/summary&gt;</span>
    <span style="color: #008080; font-style: italic;">/// &lt;param name=&quot;html&quot;&gt;An HTML string containing n HTML tables&lt;/param&gt;</span>
    <span style="color: #008080; font-style: italic;">/// &lt;returns&gt;A DataSet containing a DataTable for each HTML table in the input HTML&lt;/returns&gt;</span>
    <span style="color: #0600FF; font-weight: bold;">public</span> <span style="color: #0600FF; font-weight: bold;">static</span> DataSet ParseDataSet<span style="color: #008000;">&#40;</span><span style="color: #6666cc; font-weight: bold;">string</span> html<span style="color: #008000;">&#41;</span>
    <span style="color: #008000;">&#123;</span>
        DataSet dataSet <span style="color: #008000;">=</span> <a href="http://www.google.com/search?q=new+msdn.microsoft.com"><span style="color: #008000;">new</span></a> DataSet<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
        MatchCollection tableMatches <span style="color: #008000;">=</span> Regex<span style="color: #008000;">.</span><span style="color: #0000FF;">Matches</span><span style="color: #008000;">&#40;</span>
            WithoutComments<span style="color: #008000;">&#40;</span>html<span style="color: #008000;">&#41;</span>,
            TablePattern,
            ExpressionOptions<span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
&nbsp;
        <span style="color: #0600FF; font-weight: bold;">foreach</span> <span style="color: #008000;">&#40;</span>Match tableMatch <span style="color: #0600FF; font-weight: bold;">in</span> tableMatches<span style="color: #008000;">&#41;</span>
        <span style="color: #008000;">&#123;</span>
            dataSet<span style="color: #008000;">.</span><span style="color: #0000FF;">Tables</span><span style="color: #008000;">.</span><span style="color: #0000FF;">Add</span><span style="color: #008000;">&#40;</span>ParseTable<span style="color: #008000;">&#40;</span>tableMatch<span style="color: #008000;">.</span><span style="color: #0000FF;">Value</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
        <span style="color: #008000;">&#125;</span>
&nbsp;
        <span style="color: #0600FF; font-weight: bold;">return</span> dataSet<span style="color: #008000;">;</span>
    <span style="color: #008000;">&#125;</span>
&nbsp;
    <span style="color: #008080; font-style: italic;">/// &lt;summary&gt;</span>
    <span style="color: #008080; font-style: italic;">/// Given an HTML string containing a single table, parse that table to form a DataTable.</span>
    <span style="color: #008080; font-style: italic;">/// &lt;/summary&gt;</span>
    <span style="color: #008080; font-style: italic;">/// &lt;param name=&quot;tableHtml&quot;&gt;An HTML string containing a single HTML table&lt;/param&gt;</span>
    <span style="color: #008080; font-style: italic;">/// &lt;returns&gt;A DataTable which matches the input HTML table&lt;/returns&gt;</span>
    <span style="color: #0600FF; font-weight: bold;">public</span> <span style="color: #0600FF; font-weight: bold;">static</span> DataTable ParseTable<span style="color: #008000;">&#40;</span><span style="color: #6666cc; font-weight: bold;">string</span> tableHtml<span style="color: #008000;">&#41;</span>
    <span style="color: #008000;">&#123;</span>
        <span style="color: #6666cc; font-weight: bold;">string</span> tableHtmlWithoutComments <span style="color: #008000;">=</span> WithoutComments<span style="color: #008000;">&#40;</span>tableHtml<span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
&nbsp;
        DataTable dataTable <span style="color: #008000;">=</span> <a href="http://www.google.com/search?q=new+msdn.microsoft.com"><span style="color: #008000;">new</span></a> DataTable<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
&nbsp;
        MatchCollection rowMatches <span style="color: #008000;">=</span> Regex<span style="color: #008000;">.</span><span style="color: #0000FF;">Matches</span><span style="color: #008000;">&#40;</span>
            tableHtmlWithoutComments,
            RowPattern,
            ExpressionOptions<span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
&nbsp;
        dataTable<span style="color: #008000;">.</span><span style="color: #0000FF;">Columns</span><span style="color: #008000;">.</span><span style="color: #0000FF;">AddRange</span><span style="color: #008000;">&#40;</span>tableHtmlWithoutComments<span style="color: #008000;">.</span><span style="color: #0000FF;">Contains</span><span style="color: #008000;">&#40;</span><span style="color: #666666;">&quot;&lt;th&quot;</span><span style="color: #008000;">&#41;</span>
                                       <span style="color: #008000;">?</span> ParseColumns<span style="color: #008000;">&#40;</span>tableHtml<span style="color: #008000;">&#41;</span>
                                       <span style="color: #008000;">:</span> GenerateColumns<span style="color: #008000;">&#40;</span>rowMatches<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
&nbsp;
        ParseRows<span style="color: #008000;">&#40;</span>rowMatches, dataTable<span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
&nbsp;
        <span style="color: #0600FF; font-weight: bold;">return</span> dataTable<span style="color: #008000;">;</span>
    <span style="color: #008000;">&#125;</span>
&nbsp;
    <span style="color: #008080; font-style: italic;">/// &lt;summary&gt;</span>
    <span style="color: #008080; font-style: italic;">/// Strip comments from an HTML stirng</span>
    <span style="color: #008080; font-style: italic;">/// &lt;/summary&gt;</span>
    <span style="color: #008080; font-style: italic;">/// &lt;param name=&quot;html&quot;&gt;An HTML string potentially containing comments&lt;/param&gt;</span>
    <span style="color: #008080; font-style: italic;">/// &lt;returns&gt;The input HTML string with comments removed&lt;/returns&gt;</span>
    <span style="color: #0600FF; font-weight: bold;">private</span> <span style="color: #0600FF; font-weight: bold;">static</span> <span style="color: #6666cc; font-weight: bold;">string</span> WithoutComments<span style="color: #008000;">&#40;</span><span style="color: #6666cc; font-weight: bold;">string</span> html<span style="color: #008000;">&#41;</span>
    <span style="color: #008000;">&#123;</span>
        <span style="color: #0600FF; font-weight: bold;">return</span> Regex<span style="color: #008000;">.</span><span style="color: #0000FF;">Replace</span><span style="color: #008000;">&#40;</span>html, CommentPattern, <span style="color: #6666cc; font-weight: bold;">string</span><span style="color: #008000;">.</span><span style="color: #0000FF;">Empty</span>, ExpressionOptions<span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
    <span style="color: #008000;">&#125;</span>
&nbsp;
    <span style="color: #008080; font-style: italic;">/// &lt;summary&gt;</span>
    <span style="color: #008080; font-style: italic;">/// Add a row to the input DataTable for each row match in the input MatchCollection</span>
    <span style="color: #008080; font-style: italic;">/// &lt;/summary&gt;</span>
    <span style="color: #008080; font-style: italic;">/// &lt;param name=&quot;rowMatches&quot;&gt;A collection of all the rows to add to the DataTable&lt;/param&gt;</span>
    <span style="color: #008080; font-style: italic;">/// &lt;param name=&quot;dataTable&quot;&gt;The DataTable to which we add rows&lt;/param&gt;</span>
    <span style="color: #0600FF; font-weight: bold;">private</span> <span style="color: #0600FF; font-weight: bold;">static</span> <span style="color: #6666cc; font-weight: bold;">void</span> ParseRows<span style="color: #008000;">&#40;</span>MatchCollection rowMatches, DataTable dataTable<span style="color: #008000;">&#41;</span>
    <span style="color: #008000;">&#123;</span>
        <span style="color: #0600FF; font-weight: bold;">foreach</span> <span style="color: #008000;">&#40;</span>Match rowMatch <span style="color: #0600FF; font-weight: bold;">in</span> rowMatches<span style="color: #008000;">&#41;</span>
        <span style="color: #008000;">&#123;</span>
            <span style="color: #008080; font-style: italic;">// if the row contains header tags don't use it - it is a header not a row</span>
            <span style="color: #0600FF; font-weight: bold;">if</span> <span style="color: #008000;">&#40;</span><span style="color: #008000;">!</span>rowMatch<span style="color: #008000;">.</span><span style="color: #0000FF;">Value</span><span style="color: #008000;">.</span><span style="color: #0000FF;">Contains</span><span style="color: #008000;">&#40;</span><span style="color: #666666;">&quot;&lt;th&quot;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>
            <span style="color: #008000;">&#123;</span>
                DataRow dataRow <span style="color: #008000;">=</span> dataTable<span style="color: #008000;">.</span><span style="color: #0000FF;">NewRow</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
&nbsp;
                MatchCollection cellMatches <span style="color: #008000;">=</span> Regex<span style="color: #008000;">.</span><span style="color: #0000FF;">Matches</span><span style="color: #008000;">&#40;</span>
                    rowMatch<span style="color: #008000;">.</span><span style="color: #0000FF;">Value</span>,
                    CellPattern,
                    ExpressionOptions<span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
&nbsp;
                <span style="color: #0600FF; font-weight: bold;">for</span> <span style="color: #008000;">&#40;</span><span style="color: #6666cc; font-weight: bold;">int</span> columnIndex <span style="color: #008000;">=</span> <span style="color: #FF0000;">0</span><span style="color: #008000;">;</span> columnIndex <span style="color: #008000;">&lt;</span> cellMatches<span style="color: #008000;">.</span><span style="color: #0000FF;">Count</span><span style="color: #008000;">;</span> columnIndex<span style="color: #008000;">++</span><span style="color: #008000;">&#41;</span>
                <span style="color: #008000;">&#123;</span>
                    dataRow<span style="color: #008000;">&#91;</span>columnIndex<span style="color: #008000;">&#93;</span> <span style="color: #008000;">=</span> cellMatches<span style="color: #008000;">&#91;</span>columnIndex<span style="color: #008000;">&#93;</span><span style="color: #008000;">.</span><span style="color: #0000FF;">Groups</span><span style="color: #008000;">&#91;</span><span style="color: #FF0000;">1</span><span style="color: #008000;">&#93;</span><span style="color: #008000;">.</span><span style="color: #0000FF;">ToString</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
                <span style="color: #008000;">&#125;</span>
&nbsp;
                dataTable<span style="color: #008000;">.</span><span style="color: #0000FF;">Rows</span><span style="color: #008000;">.</span><span style="color: #0000FF;">Add</span><span style="color: #008000;">&#40;</span>dataRow<span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
            <span style="color: #008000;">&#125;</span>
        <span style="color: #008000;">&#125;</span>
    <span style="color: #008000;">&#125;</span>
&nbsp;
    <span style="color: #008080; font-style: italic;">/// &lt;summary&gt;</span>
    <span style="color: #008080; font-style: italic;">/// Given a string containing an HTML table, parse the header cells to create a set of DataColumns</span>
    <span style="color: #008080; font-style: italic;">/// which define the columns in a DataTable.</span>
    <span style="color: #008080; font-style: italic;">/// &lt;/summary&gt;</span>
    <span style="color: #008080; font-style: italic;">/// &lt;param name=&quot;tableHtml&quot;&gt;An HTML string containing a single HTML table&lt;/param&gt;</span>
    <span style="color: #008080; font-style: italic;">/// &lt;returns&gt;A set of DataColumns based on the HTML table header cells&lt;/returns&gt;</span>
    <span style="color: #0600FF; font-weight: bold;">private</span> <span style="color: #0600FF; font-weight: bold;">static</span> DataColumn<span style="color: #008000;">&#91;</span><span style="color: #008000;">&#93;</span> ParseColumns<span style="color: #008000;">&#40;</span><span style="color: #6666cc; font-weight: bold;">string</span> tableHtml<span style="color: #008000;">&#41;</span>
    <span style="color: #008000;">&#123;</span>
        MatchCollection headerMatches <span style="color: #008000;">=</span> Regex<span style="color: #008000;">.</span><span style="color: #0000FF;">Matches</span><span style="color: #008000;">&#40;</span>
            tableHtml,
            HeaderPattern,
            ExpressionOptions<span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
&nbsp;
        <span style="color: #0600FF; font-weight: bold;">return</span> <span style="color: #008000;">&#40;</span><span style="color: #0600FF; font-weight: bold;">from</span> Match headerMatch <span style="color: #0600FF; font-weight: bold;">in</span> headerMatches
                <span style="color: #0600FF; font-weight: bold;">select</span> <a href="http://www.google.com/search?q=new+msdn.microsoft.com"><span style="color: #008000;">new</span></a> DataColumn<span style="color: #008000;">&#40;</span>headerMatch<span style="color: #008000;">.</span><span style="color: #0000FF;">Groups</span><span style="color: #008000;">&#91;</span><span style="color: #FF0000;">1</span><span style="color: #008000;">&#93;</span><span style="color: #008000;">.</span><span style="color: #0000FF;">ToString</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">.</span><span style="color: #0000FF;">ToArray</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
    <span style="color: #008000;">&#125;</span>
&nbsp;
    <span style="color: #008080; font-style: italic;">/// &lt;summary&gt;</span>
    <span style="color: #008080; font-style: italic;">/// For tables which do not specify header cells we must generate DataColumns based on the number</span>
    <span style="color: #008080; font-style: italic;">/// of cells in a row (we assume all rows have the same number of cells).</span>
    <span style="color: #008080; font-style: italic;">/// &lt;/summary&gt;</span>
    <span style="color: #008080; font-style: italic;">/// &lt;param name=&quot;rowMatches&quot;&gt;A collection of all the rows in the HTML table we wish to generate columns for&lt;/param&gt;</span>
    <span style="color: #008080; font-style: italic;">/// &lt;returns&gt;A set of DataColumns based on the number of celss in the first row of the input HTML table&lt;/returns&gt;</span>
    <span style="color: #0600FF; font-weight: bold;">private</span> <span style="color: #0600FF; font-weight: bold;">static</span> DataColumn<span style="color: #008000;">&#91;</span><span style="color: #008000;">&#93;</span> GenerateColumns<span style="color: #008000;">&#40;</span>MatchCollection rowMatches<span style="color: #008000;">&#41;</span>
    <span style="color: #008000;">&#123;</span>
        <span style="color: #6666cc; font-weight: bold;">int</span> columnCount <span style="color: #008000;">=</span> Regex<span style="color: #008000;">.</span><span style="color: #0000FF;">Matches</span><span style="color: #008000;">&#40;</span>
            rowMatches<span style="color: #008000;">&#91;</span><span style="color: #FF0000;">0</span><span style="color: #008000;">&#93;</span><span style="color: #008000;">.</span><span style="color: #0000FF;">ToString</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>,
            CellPattern,
            ExpressionOptions<span style="color: #008000;">&#41;</span><span style="color: #008000;">.</span><span style="color: #0000FF;">Count</span><span style="color: #008000;">;</span>
&nbsp;
        <span style="color: #0600FF; font-weight: bold;">return</span> <span style="color: #008000;">&#40;</span><span style="color: #0600FF; font-weight: bold;">from</span> index <span style="color: #0600FF; font-weight: bold;">in</span> Enumerable<span style="color: #008000;">.</span><span style="color: #0000FF;">Range</span><span style="color: #008000;">&#40;</span><span style="color: #FF0000;">0</span>, columnCount<span style="color: #008000;">&#41;</span>
                <span style="color: #0600FF; font-weight: bold;">select</span> <a href="http://www.google.com/search?q=new+msdn.microsoft.com"><span style="color: #008000;">new</span></a> DataColumn<span style="color: #008000;">&#40;</span><span style="color: #666666;">&quot;Column &quot;</span> <span style="color: #008000;">+</span> Convert<span style="color: #008000;">.</span><span style="color: #0000FF;">ToString</span><span style="color: #008000;">&#40;</span>index<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">.</span><span style="color: #0000FF;">ToArray</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
    <span style="color: #008000;">&#125;</span>
<span style="color: #008000;">&#125;</span></pre></td></tr></table></div>

<p>As always, here are the tests. They yield 100% coverage but I still need to add some asserts on the column names.</p>

<div class="wp_codebox"><table><tr id="p8778"><td class="code" id="p877code8"><pre class="csharp" style="font-family:monospace;"><span style="color: #008080; font-style: italic;">/// &lt;summary&gt;</span>
<span style="color: #008080; font-style: italic;">/// Tests for the HtmlTableParser class</span>
<span style="color: #008080; font-style: italic;">/// &lt;/summary&gt;</span>
<span style="color: #008000;">&#91;</span>TestClass<span style="color: #008000;">&#93;</span>
<span style="color: #0600FF; font-weight: bold;">public</span> <span style="color: #6666cc; font-weight: bold;">class</span> ParserTest
<span style="color: #008000;">&#123;</span>
    <span style="color: #0600FF; font-weight: bold;">private</span> TestContext testContextInstance<span style="color: #008000;">;</span>
&nbsp;
    <span style="color: #008080; font-style: italic;">/// &lt;summary&gt;</span>
    <span style="color: #008080; font-style: italic;">/// Verify that HtmlTableParser can parse an HTML file containing a single table. The</span>
    <span style="color: #008080; font-style: italic;">/// test file includes a commented out table which should be ignored. Note some tags use</span>
    <span style="color: #008080; font-style: italic;">/// attributes (we test we can parse tags with and without attributes).</span>
    <span style="color: #008080; font-style: italic;">/// &lt;/summary&gt;</span>
    <span style="color: #008000;">&#91;</span>TestMethod<span style="color: #008000;">&#93;</span>
    <span style="color: #008000;">&#91;</span>DeploymentItem<span style="color: #008000;">&#40;</span><span style="color: #666666;">@&quot;data\singleTable.txt&quot;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#93;</span>
    <span style="color: #0600FF; font-weight: bold;">public</span> <span style="color: #6666cc; font-weight: bold;">void</span> TestParseSingleTable<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>
    <span style="color: #008000;">&#123;</span>
        <span style="color: #6666cc; font-weight: bold;">string</span> html <span style="color: #008000;">=</span> File<span style="color: #008000;">.</span><span style="color: #0000FF;">ReadAllText</span><span style="color: #008000;">&#40;</span><span style="color: #666666;">&quot;singleTable.txt&quot;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
        DataTable table <span style="color: #008000;">=</span> HtmlTableParser<span style="color: #008000;">.</span><span style="color: #0000FF;">ParseTable</span><span style="color: #008000;">&#40;</span>html<span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
&nbsp;
        AssertTable<span style="color: #008000;">&#40;</span>GetExpectedData<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, table<span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
    <span style="color: #008000;">&#125;</span>
&nbsp;
    <span style="color: #008080; font-style: italic;">/// &lt;summary&gt;</span>
    <span style="color: #008080; font-style: italic;">/// Verify that HtmlTableParser can parse an HTML file containing multiple tables. The</span>
    <span style="color: #008080; font-style: italic;">/// test file includes a commented out table which should be ignored. The test file</span>
    <span style="color: #008080; font-style: italic;">/// contains tables both with and without headers.</span>
    <span style="color: #008080; font-style: italic;">/// &lt;/summary&gt;</span>
    <span style="color: #008000;">&#91;</span>TestMethod<span style="color: #008000;">&#93;</span>
    <span style="color: #008000;">&#91;</span>DeploymentItem<span style="color: #008000;">&#40;</span><span style="color: #666666;">@&quot;data\multipleTables.txt&quot;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#93;</span>
    <span style="color: #0600FF; font-weight: bold;">public</span> <span style="color: #6666cc; font-weight: bold;">void</span> TestParseMultipleTables<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>
    <span style="color: #008000;">&#123;</span>
        <span style="color: #6666cc; font-weight: bold;">string</span> html <span style="color: #008000;">=</span> File<span style="color: #008000;">.</span><span style="color: #0000FF;">ReadAllText</span><span style="color: #008000;">&#40;</span><span style="color: #666666;">&quot;multipleTables.txt&quot;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
        DataSet dataSet <span style="color: #008000;">=</span> HtmlTableParser<span style="color: #008000;">.</span><span style="color: #0000FF;">ParseDataSet</span><span style="color: #008000;">&#40;</span>html<span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
        Assert<span style="color: #008000;">.</span><span style="color: #0000FF;">AreEqual</span><span style="color: #008000;">&#40;</span><span style="color: #FF0000;">3</span>, dataSet<span style="color: #008000;">.</span><span style="color: #0000FF;">Tables</span><span style="color: #008000;">.</span><span style="color: #0000FF;">Count</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
&nbsp;
        var expected <span style="color: #008000;">=</span> GetExpectedData<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
&nbsp;
        <span style="color: #0600FF; font-weight: bold;">foreach</span> <span style="color: #008000;">&#40;</span>DataTable table <span style="color: #0600FF; font-weight: bold;">in</span> dataSet<span style="color: #008000;">.</span><span style="color: #0000FF;">Tables</span><span style="color: #008000;">&#41;</span>
        <span style="color: #008000;">&#123;</span>
            AssertTable<span style="color: #008000;">&#40;</span>expected, table<span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
        <span style="color: #008000;">&#125;</span>
    <span style="color: #008000;">&#125;</span>
&nbsp;
    <span style="color: #0600FF; font-weight: bold;">private</span> <span style="color: #0600FF; font-weight: bold;">static</span> <span style="color: #6666cc; font-weight: bold;">string</span><span style="color: #008000;">&#91;</span><span style="color: #008000;">&#93;</span><span style="color: #008000;">&#91;</span><span style="color: #008000;">&#93;</span> GetExpectedData<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>
    <span style="color: #008000;">&#123;</span>
        <span style="color: #0600FF; font-weight: bold;">return</span> <a href="http://www.google.com/search?q=new+msdn.microsoft.com"><span style="color: #008000;">new</span></a><span style="color: #008000;">&#91;</span><span style="color: #008000;">&#93;</span>
        <span style="color: #008000;">&#123;</span>
            <a href="http://www.google.com/search?q=new+msdn.microsoft.com"><span style="color: #008000;">new</span></a><span style="color: #008000;">&#91;</span><span style="color: #008000;">&#93;</span> <span style="color: #008000;">&#123;</span> <span style="color: #666666;">&quot;row 1, cell 1&quot;</span>, <span style="color: #666666;">&quot;row 1, cell 2&quot;</span> <span style="color: #008000;">&#125;</span>,
            <a href="http://www.google.com/search?q=new+msdn.microsoft.com"><span style="color: #008000;">new</span></a><span style="color: #008000;">&#91;</span><span style="color: #008000;">&#93;</span> <span style="color: #008000;">&#123;</span> <span style="color: #666666;">&quot;row 2, cell 1&quot;</span>, <span style="color: #666666;">&quot;row 2, cell 2&quot;</span> <span style="color: #008000;">&#125;</span>
        <span style="color: #008000;">&#125;</span><span style="color: #008000;">;</span>
    <span style="color: #008000;">&#125;</span>
&nbsp;
    <span style="color: #0600FF; font-weight: bold;">private</span> <span style="color: #0600FF; font-weight: bold;">static</span> <span style="color: #6666cc; font-weight: bold;">void</span> AssertTable<span style="color: #008000;">&#40;</span><span style="color: #6666cc; font-weight: bold;">string</span><span style="color: #008000;">&#91;</span><span style="color: #008000;">&#93;</span><span style="color: #008000;">&#91;</span><span style="color: #008000;">&#93;</span> expected, DataTable table<span style="color: #008000;">&#41;</span>
    <span style="color: #008000;">&#123;</span>
        Assert<span style="color: #008000;">.</span><span style="color: #0000FF;">AreEqual</span><span style="color: #008000;">&#40;</span>expected<span style="color: #008000;">.</span><span style="color: #0000FF;">Count</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, table<span style="color: #008000;">.</span><span style="color: #0000FF;">Rows</span><span style="color: #008000;">.</span><span style="color: #0000FF;">Count</span>, <span style="color: #666666;">&quot;Table did not contain the expected number of rows&quot;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
&nbsp;
        <span style="color: #0600FF; font-weight: bold;">for</span> <span style="color: #008000;">&#40;</span><span style="color: #6666cc; font-weight: bold;">int</span> i <span style="color: #008000;">=</span> <span style="color: #FF0000;">0</span><span style="color: #008000;">;</span> i <span style="color: #008000;">&lt;</span> expected<span style="color: #008000;">.</span><span style="color: #0000FF;">Count</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span> i<span style="color: #008000;">++</span><span style="color: #008000;">&#41;</span>
        <span style="color: #008000;">&#123;</span>
            <span style="color: #0600FF; font-weight: bold;">for</span> <span style="color: #008000;">&#40;</span><span style="color: #6666cc; font-weight: bold;">int</span> j <span style="color: #008000;">=</span> <span style="color: #FF0000;">0</span><span style="color: #008000;">;</span> j <span style="color: #008000;">&lt;</span> expected<span style="color: #008000;">&#91;</span>i<span style="color: #008000;">&#93;</span><span style="color: #008000;">.</span><span style="color: #0000FF;">Count</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span> j<span style="color: #008000;">++</span><span style="color: #008000;">&#41;</span>
            <span style="color: #008000;">&#123;</span>
                <span style="color: #6666cc; font-weight: bold;">string</span> actualElement <span style="color: #008000;">=</span> <span style="color: #008000;">&#40;</span>table<span style="color: #008000;">.</span><span style="color: #0000FF;">Rows</span><span style="color: #008000;">&#91;</span>i<span style="color: #008000;">&#93;</span><span style="color: #008000;">&#91;</span>j<span style="color: #008000;">&#93;</span> <span style="color: #0600FF; font-weight: bold;">as</span> <span style="color: #6666cc; font-weight: bold;">string</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">.</span><span style="color: #0000FF;">Trim</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
                <span style="color: #6666cc; font-weight: bold;">string</span> expectedElement <span style="color: #008000;">=</span> expected<span style="color: #008000;">&#91;</span>i<span style="color: #008000;">&#93;</span><span style="color: #008000;">&#91;</span>j<span style="color: #008000;">&#93;</span><span style="color: #008000;">;</span>
&nbsp;
                Assert<span style="color: #008000;">.</span><span style="color: #0000FF;">AreEqual</span><span style="color: #008000;">&lt;</span><span style="color: #6666cc; font-weight: bold;">string</span><span style="color: #008000;">&gt;</span><span style="color: #008000;">&#40;</span>expectedElement, actualElement, <span style="color: #666666;">&quot;Table did not contain the expected element&quot;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
            <span style="color: #008000;">&#125;</span>
        <span style="color: #008000;">&#125;</span>
    <span style="color: #008000;">&#125;</span>
<span style="color: #008000;">&#125;</span></pre></td></tr></table></div>

<p>These are the test files, which are just some basic HMTL.</p>

<div class="wp_codebox"><table><tr id="p8779"><td class="code" id="p877code9"><pre class="html" style="font-family:monospace;">&lt;!DOCTYPE html PUBLIC &quot;-//W3C//DTD XHTML 1.0 Strict//EN&quot; &quot;http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd&quot;&gt;
&lt;html xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&gt;
&lt;head&gt;
      &lt;meta http-equiv=&quot;Content-Type&quot; content=&quot;text/html; charset=iso-8859-1&quot; /&gt;
    &lt;meta name=&quot;robots&quot; content=&quot;all&quot; /&gt;
      &lt;title&gt;Title&lt;/title&gt;
&lt;/head&gt;
&nbsp;
&lt;body&gt;
      &lt;!-- This table is commented out, so it shouldn't be parsed.
      &lt;table border=&quot;1&quot;&gt;
            &lt;tr  border=&quot;1&quot;&gt;
                  &lt;th&gt;Commented Heading 1&lt;/th&gt;
                  &lt;th&gt;Commented Heading 2&lt;/th&gt;
            &lt;/tr&gt;
            &lt;tr  border=&quot;1&quot;&gt;
                  &lt;td&gt;Commented row 1, cell 1&lt;/td&gt;
                  &lt;td&gt;Commented row 1, cell 2&lt;/td&gt;
            &lt;/tr&gt;
            &lt;tr  border=&quot;1&quot;&gt;
                  &lt;td&gt;Commented row 2, cell 1&lt;/td&gt;
                  &lt;td&gt;Commented row 2, cell 2&lt;/td&gt;
            &lt;/tr&gt;
      &lt;/table&gt;
      --&gt;
&nbsp;
      &lt;!-- The parser should ignore the border attributes --&gt;
      &lt;table border=&quot;1&quot;&gt;
            &lt;tr  border=&quot;1&quot;&gt;
                  &lt;th&gt;Heading 1&lt;/th&gt;
                  &lt;th&gt;Heading 2&lt;/th&gt;
            &lt;/tr&gt;
            &lt;tr&gt;
                  &lt;td border=&quot;1&quot;&gt;row 1, cell 1&lt;/td&gt;
                  &lt;td&gt;row 1, cell 2&lt;/td&gt;
            &lt;/tr&gt;
            &lt;tr  border=&quot;1&quot;&gt;
                  &lt;td&gt;row 2, cell 1&lt;/td&gt;
                  &lt;td&gt;row 2, cell 2&lt;/td&gt;
            &lt;/tr&gt;
      &lt;/table&gt; 
&lt;/body&gt;
&lt;/html&gt;</pre></td></tr></table></div>


<div class="wp_codebox"><table><tr id="p87710"><td class="code" id="p877code10"><pre class="html" style="font-family:monospace;">&lt;!DOCTYPE html PUBLIC &quot;-//W3C//DTD XHTML 1.0 Strict//EN&quot; &quot;http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd&quot;&gt;
&lt;html xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&gt;
&lt;head&gt;
      &lt;meta http-equiv=&quot;Content-Type&quot; content=&quot;text/html; charset=iso-8859-1&quot; /&gt;
    &lt;meta name=&quot;robots&quot; content=&quot;all&quot; /&gt;
      &lt;title&gt;Title&lt;/title&gt;
&lt;/head&gt;
&nbsp;
&lt;body&gt;
      &lt;!-- The parser should ignore the border attributes --&gt;
      &lt;table border=&quot;1&quot;&gt;
            &lt;tr  border=&quot;1&quot;&gt;
                  &lt;td&gt;row 1, cell 1&lt;/td&gt;
                  &lt;td&gt;row 1, cell 2&lt;/td&gt;
            &lt;/tr&gt;
            &lt;tr  border=&quot;1&quot;&gt;
                  &lt;td&gt;row 2, cell 1&lt;/td&gt;
                  &lt;td&gt;row 2, cell 2&lt;/td&gt;
            &lt;/tr&gt;
      &lt;/table&gt; 
      &lt;!-- This table is commented out, so it shouldn't be parsed.
      &lt;table border=&quot;1&quot;&gt;
            &lt;tr  border=&quot;1&quot;&gt;
                  &lt;th&gt;Commented Heading 1&lt;/th&gt;
                  &lt;th&gt;Commented Heading 2&lt;/th&gt;
            &lt;/tr&gt;
            &lt;tr  border=&quot;1&quot;&gt;
                  &lt;td&gt;Commented row 1, cell 1&lt;/td&gt;
                  &lt;td&gt;Commented row 1, cell 2&lt;/td&gt;
            &lt;/tr&gt;
            &lt;tr  border=&quot;1&quot;&gt;
                  &lt;td&gt;Commented row 2, cell 1&lt;/td&gt;
                  &lt;td&gt;Commented row 2, cell 2&lt;/td&gt;
            &lt;/tr&gt;
      &lt;/table&gt;
      --&gt;
      &lt;table border=&quot;1&quot;&gt;
            &lt;tr  border=&quot;1&quot;&gt;
                  &lt;th&gt;Heading 1&lt;/th&gt;
                  &lt;th&gt;Heading 2&lt;/th&gt;
            &lt;/tr&gt;
            &lt;tr  border=&quot;1&quot;&gt;
                  &lt;td&gt;row 1, cell 1&lt;/td&gt;
                  &lt;td&gt;row 1, cell 2&lt;/td&gt;
            &lt;/tr&gt;
            &lt;tr  border=&quot;1&quot;&gt;
                  &lt;td&gt;row 2, cell 1&lt;/td&gt;
                  &lt;td&gt;row 2, cell 2&lt;/td&gt;
            &lt;/tr&gt;
      &lt;/table&gt; 
      &lt;table&gt;
            &lt;tr&gt;
                  &lt;td&gt;row 1, cell 1&lt;/td&gt;
                  &lt;td&gt;row 1, cell 2&lt;/td&gt;
            &lt;/tr&gt;
            &lt;tr  border=&quot;1&quot;&gt;
                  &lt;td&gt;row 2, cell 1&lt;/td&gt;
                  &lt;td border=&quot;1&quot;&gt;row 2, cell 2&lt;/td&gt;
            &lt;/tr&gt;
      &lt;/table&gt; 
&lt;/body&gt;
&lt;/html&gt;</pre></td></tr></table></div>

]]></content:encoded>
			<wfw:commentRss>http://blog.hypercomplex.co.uk/index.php/2010/05/parsing-html-tables-into-system-data-datatable/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Regular Expressions</title>
		<link>http://blog.hypercomplex.co.uk/index.php/2009/08/regular-expressions/</link>
		<comments>http://blog.hypercomplex.co.uk/index.php/2009/08/regular-expressions/#comments</comments>
		<pubDate>Mon, 03 Aug 2009 18:17:18 +0000</pubDate>
		<dc:creator>Alex Peck</dc:creator>
				<category><![CDATA[Software Engineering]]></category>
		<category><![CDATA[.NET]]></category>
		<category><![CDATA[Application Development Foundation]]></category>
		<category><![CDATA[RegEx]]></category>

		<guid isPermaLink="false">http://blog.hypercomplex.co.uk/?p=292</guid>
		<description><![CDATA[Regular Expressions are used to match patterns in strings. These patterns may be particular character sequences, character classes (e.g. white space, digits, upper case etc) or combinations thereof. Patterns may be combined to form more complex patterns using repetition and alternation operators. For a more detailed treatment, please refer to this site, which is worth [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://en.wikipedia.org/wiki/Regular_expression">Regular Expressions</a> are used to match patterns in strings. These patterns may be particular character sequences, character classes (e.g. white space, digits, upper case etc) or combinations thereof. Patterns may be combined to form more complex patterns using repetition and alternation operators. </p>
<p>For a more detailed treatment, please refer to <a href="http://www.regular-expressions.info/">this</a> site, which is worth visiting just to see the photograph of the author. If that&#8217;s too long winded, try the <a href="http://www.addedbytes.com/cheat-sheets/regular-expressions-cheat-sheet/">cheat sheet</a> instead.</p>
<p><strong><a href="http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx">Regex</a>: A Simple Example</strong></p>
<p>Whilst the concept is simple, RegEx syntax is not.  RegEx&#8217;s can be hard to read. This expression matches a UK postcode:</p>

<div class="wp_codebox"><table><tr id="p29216"><td class="code" id="p292code16"><pre class="xx" style="font-family:monospace;">^[A-Za-z]{1,2}[\d]{1,2}([A-Za-z])?\s?[\d][A-Za-z]{2}$</pre></td></tr></table></div>

<p>This example includes several key bits of RegEx syntax. The ^ and $ characters are anchors, and specify the beginning and end. Square brackets denote a character set to match, [A-Za-z] matches upper and lower case characters in the range A-Z. Curly braces denote the number of matches, so [A]{1,2} will match the character A once or twice. The ? matches the preceeding expression zero or once, so \s? matches a white space character zero or once.</p>
<p>In the context of .NET, the <a href="http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx">Regex</a> class provides static methods to test a string for a match. Here is a simple but useless example:</p>

<div class="wp_codebox"><table><tr id="p29217"><td class="code" id="p292code17"><pre class="csharp" style="font-family:monospace;"><span style="color: #0600FF; font-weight: bold;">if</span> <span style="color: #008000;">&#40;</span>RegEx<span style="color: #008000;">.</span><span style="color: #0000FF;">IsMatch</span><span style="color: #008000;">&#40;</span><span style="color: #666666;">&quot;pattern&quot;</span>, <span style="color: #666666;">@&quot;^[aenprt]{1,7}$&quot;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
   <span style="color: #008080; font-style: italic;">// do something...</span>
<span style="color: #008000;">&#125;</span></pre></td></tr></table></div>

<p><strong>Options</strong></p>
<p>RegEx methods and constructors, where appropriate, provide an override where <a href="http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regexoptions.aspx">RegexOptions</a> may be specified (e.g. compiled, case sensitive etc.). Options may be combined using a bitwise OR. You can also <a href="http://msdn.microsoft.com/en-us/library/yd1hzczs.aspx">specify options inline</a> within the regex pattern, by enclosing characters which map to regex options within brackets. There are two forms:</p>
<ul>
<li><a href="http://msdn.microsoft.com/en-us/library/bs2twtah.aspx">Grouping construct</a> (?imnsx-imnsx:)</li>
<li><a href="http://msdn.microsoft.com/en-us/library/x044wc7s.aspx">Miscelleneous construct</a> (?imnsx-imnsx)</li>
</ul>
<p>Prefixing a set of options with the minus sign turns them off. All the options are turned off by default. Compiled and RightToLeft may only be applied to an expression as a whole.</p>
<p><strong>Grouping &#038; Backreferences</strong></p>
<p>Round brackets specify a group, and when referring back to a group it is possible to apply alternation and repitition operators. A simple example might be to search for repeating sequences of numbers preceeded by a space:</p>

<div class="wp_codebox"><table><tr id="p29218"><td class="code" id="p292code18"><pre class="xx" style="font-family:monospace;">(?&lt;groupname&gt;\s\d)\k&lt;groupname&gt;</pre></td></tr></table></div>

<p>The (?<name>pattern) construct is used to denote the group, then \k<name> is used to refer back to the group. If you don&#8217;t specify a name within angle brackets, you can refer to groups by number using \l<em>n</em>, where <em>n</em> is the group number.</p>
<p><strong>Extracting matches</strong></p>
<p>Using the static <a href="http://msdn.microsoft.com/en-us/library/0z2heewz.aspx">Regex.Match</a> method you may extract matches from an input string. The  sub pattern you wish to extract should be enclosed in parentheses.</p>

<div class="wp_codebox"><table><tr id="p29219"><td class="code" id="p292code19"><pre class="csharp" style="font-family:monospace;"><span style="color: #6666cc; font-weight: bold;">string</span> input <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;Exception: error message&quot;</span><span style="color: #008000;">;</span>
Match m <span style="color: #008000;">=</span> Regex<span style="color: #008000;">.</span><span style="color: #0000FF;">Match</span><span style="color: #008000;">&#40;</span>input, <span style="color: #666666;">&quot;Exception: (.*$)&quot;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
<span style="color: #6666cc; font-weight: bold;">string</span> match1 <span style="color: #008000;">=</span> m<span style="color: #008000;">.</span><span style="color: #0000FF;">Groups</span><span style="color: #008000;">&#91;</span><span style="color: #FF0000;">1</span><span style="color: #008000;">&#93;</span><span style="color: #008000;">.</span><span style="color: #0000FF;">Value</span><span style="color: #008000;">;</span>
&nbsp;
CaptureCollection captures <span style="color: #008000;">=</span> m<span style="color: #008000;">.</span><span style="color: #0000FF;">Groups</span><span style="color: #008000;">&#91;</span><span style="color: #FF0000;">1</span><span style="color: #008000;">&#93;</span><span style="color: #008000;">.</span><span style="color: #0000FF;">Captures</span><span style="color: #008000;">;</span>
Capture c <span style="color: #008000;">=</span> captures<span style="color: #008000;">&#91;</span><span style="color: #FF0000;">0</span><span style="color: #008000;">&#93;</span><span style="color: #008000;">;</span>
Console<span style="color: #008000;">.</span><span style="color: #0000FF;">WriteLine</span><span style="color: #008000;">&#40;</span><span style="color: #6666cc; font-weight: bold;">string</span><span style="color: #008000;">.</span><span style="color: #0000FF;">Format</span><span style="color: #008000;">&#40;</span><span style="color: #666666;">&quot;group '{0}', capture '{1}', index {2}&quot;</span>, match1, c<span style="color: #008000;">.</span><span style="color: #0000FF;">Value</span>, c<span style="color: #008000;">.</span><span style="color: #0000FF;">Index</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
<span style="color: #008080; font-style: italic;">// prints: group 'error message', capture 'error message', index 11</span></pre></td></tr></table></div>

<p>If there is a match, the Match.Success property = true. The Match.Groups array contains the matches, indexed starting at 1. If you name the groups (using angle brackets as described above), you can also index by name. Groups comprise of capture collections, and each capture contains the match along with index and length properties.</p>
<p><strong>Replacing matches</strong></p>
<p>Using the static <a href="http://msdn.microsoft.com/en-us/library/e7f5w83z.aspx">Regex.Replace</a> method you may replace matches within the input string. This simple example shows how to use named backreferences within the replacement pattern.</p>

<div class="wp_codebox"><table><tr id="p29220"><td class="code" id="p292code20"><pre class="csharp" style="font-family:monospace;"><span style="color: #6666cc; font-weight: bold;">string</span> output <span style="color: #008000;">=</span> Regex<span style="color: #008000;">.</span><span style="color: #0000FF;">Replace</span><span style="color: #008000;">&#40;</span><span style="color: #666666;">&quot;alex peck&quot;</span>, <span style="color: #666666;">@&quot;(?&lt;forename&gt;\S+) (?&lt;surname&gt;\S+)&quot;</span>, <span style="color: #666666;">&quot;${surname}, ${forename}&quot;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
<span style="color: #008080; font-style: italic;">// output = &quot;peck, alex&quot;</span></pre></td></tr></table></div>

<p>The replacement expression ${surname} inserts the match captured by the group (?<surname>). We could also use the expression $<em>n</em>, where <em>n</em> is an integer, to insert groups by number.</p>
<p><strong>Online resources</strong></p>
<ul>
<li><a href="http://regexlib.com/">regexlib</a> &#8211; Regular Expression Library</li>
<li><a href="http://regexpal.com/">regexpal</a> &#8211; Test regular expressions in a web page (based on java)</li>
<li>Regulator and Regulazy are available <a href="http://osherove.com/tools">here</a></li>
<li><a href="http://osteele.com/tools/reanimator/">reanimator</a> &#8211; animate your regex as an automaton</li>
<li>.NET <a href="http://msdn.microsoft.com/en-us/library/az24scfc%28VS.71%29.aspx">MSDN</a> reference</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blog.hypercomplex.co.uk/index.php/2009/08/regular-expressions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
