Making .NET Regular Expressions Easier To Read

Yesterday Martin Fowler wrote a post about making regular expressions easier to read by using a variation of the Composed Method pattern.

His code ended up looking like this:

    const string scoreKeyword = @"^score\s+";
    const string numberOfPoints = @"(\d+)";
    const string forKeyword = @"\s+for\s+";
    const string numberOfNights = @"(\d+)";
    const string nightsAtKeyword = @"\s+nights?\s+at\s+";
    const string hotelName = @"(.*)";

    const string pattern =  scoreKeyword + numberOfPoints +
      forKeyword + numberOfNights + nightsAtKeyword + hotelName;

The goal is to name the various pieces of the regular expression so that it will be easier to decipher later.
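Each piece is a `const string`, so the concatenated `pattern` is itself a compile-time constant and works like any other pattern string. The sketch below runs it against a made-up input such as `score 400 for 3 nights at Minas Tirith Airport` and reads the three capture groups by position. The `PointsParser` class and its `TryParse` method are hypothetical and do not come from Fowler's post.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original post) that applies the composed pattern.
public static class PointsParser
{
    const string scoreKeyword = @"^score\s+";
    const string numberOfPoints = @"(\d+)";
    const string forKeyword = @"\s+for\s+";
    const string numberOfNights = @"(\d+)";
    const string nightsAtKeyword = @"\s+nights?\s+at\s+";
    const string hotelName = @"(.*)";

    // Concatenating const strings produces another compile-time constant.
    const string pattern = scoreKeyword + numberOfPoints +
        forKeyword + numberOfNights + nightsAtKeyword + hotelName;

    static readonly Regex Expression = new Regex(pattern);

    // Returns false when the input does not have the shape "score N for N night(s) at HOTEL".
    public static bool TryParse(string input, out int points, out int nights, out string hotel)
    {
        points = 0;
        nights = 0;
        hotel = null;

        Match match = Expression.Match(input);
        if (!match.Success) return false;

        // Unnamed groups are numbered left to right by their opening parenthesis;
        // group 0 is the whole match.
        points = int.Parse(match.Groups[1].Value);
        nights = int.Parse(match.Groups[2].Value);
        hotel = match.Groups[3].Value;
        return true;
    }
}
```

Because the groups are read by position, the names exist only in the source code; renaming a fragment never changes what the regex matches.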

I wrote a static class to do the same thing more concisely in .NET by using anonymous objects. The following snippet shows a regular expression before and after.

    // before (standard .NET regular expression instantiation)
    var before = new Regex(@"<h(?<level>\d).*?>(?<title>.+?)</h\d>", RegexOptions.IgnoreCase);

    // after (an anonymous object and the Composed Method pattern)
    var after = ComposeRegex.Compose(new
                {
                    openH = @"<h",
                    levelCapture = @"(?<level>\d)",
                    anyOtherAttributes = ".*?",
                    closeH = ">",
                    titleCapture = "(?<title>.+?)",
                    endHElement = @"</h\d>"
                }, RegexOptions.IgnoreCase);
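Either version produces the same heading pattern, whose named groups can be read by name. The sketch below uses the literal form so it stands on its own. The `HeadingExtractor` class is hypothetical, and the named tuple assumes C# 7 or later.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper showing what the heading regex captures.
public static class HeadingExtractor
{
    static readonly Regex Expression =
        new Regex(@"<h(?<level>\d).*?>(?<title>.+?)</h\d>", RegexOptions.IgnoreCase);

    // Returns a (level, title) pair for each heading element found in the HTML.
    public static List<(int Level, string Title)> Extract(string html)
    {
        var headings = new List<(int Level, string Title)>();
        foreach (Match match in Expression.Matches(html))
        {
            // Named groups are looked up by name rather than by position.
            int level = int.Parse(match.Groups["level"].Value);
            headings.Add((level, match.Groups["title"].Value));
        }
        return headings;
    }
}
```

For example, `HeadingExtractor.Extract("<H1>Intro</H1><h2 class=\"sub\">Details</h2>")` yields `(1, "Intro")` and `(2, "Details")`; `IgnoreCase` lets `<h` match `<H`, and the lazy `.*?` skips the `class` attribute.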

And here is the code for the static class.

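    using System.Linq;
    using System.Reflection;
    using System.Text.RegularExpressions;

    public static class ComposeRegex
    {
        // Joins the values of the object's public, readable instance properties
        // (typically an anonymous object) into one pattern string.
        public static Regex Compose(object pattern, RegexOptions options)
        {
            // GetProperties does not guarantee declaration order, so sort by
            // MetadataToken, which follows declaration order in practice.
            var properties = pattern.GetType()
                .GetProperties(BindingFlags.Public | BindingFlags.Instance)
                .OrderBy(property => property.MetadataToken);

            string expression = string.Empty;
            foreach (var property in properties) {
                if (property.CanRead) expression += property.GetValue(pattern, null);
            }
            return new Regex(expression, options);
        }
    }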

Hopefully you agree that the Composed Method version is more readable. I would love to hear suggestions for improving the implementation. Perhaps a fluent interface would work well?
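One possible shape for such a fluent interface is sketched below. The `RegexBuilder` class and its `Literal`, `Capture`, `Then`, and `Build` methods are hypothetical names chosen for illustration; only `Regex`, `Regex.Escape`, and `StringBuilder` are real .NET APIs.

```csharp
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical fluent builder; the names and method set are illustrative only.
public sealed class RegexBuilder
{
    private readonly StringBuilder pattern = new StringBuilder();

    // Appends a raw pattern fragment. The name only documents the fragment at the call site.
    public RegexBuilder Then(string name, string fragment)
    {
        pattern.Append(fragment);
        return this;
    }

    // Appends literal text, escaping any regex metacharacters it contains.
    public RegexBuilder Literal(string text)
    {
        pattern.Append(Regex.Escape(text));
        return this;
    }

    // Wraps a fragment in a named capture group: (?<name>fragment).
    public RegexBuilder Capture(string name, string fragment)
    {
        pattern.Append("(?<").Append(name).Append('>').Append(fragment).Append(')');
        return this;
    }

    public override string ToString() => pattern.ToString();

    public Regex Build(RegexOptions options = RegexOptions.None) =>
        new Regex(pattern.ToString(), options);
}
```

The heading example would then read `new RegexBuilder().Literal("<h").Capture("level", @"\d").Then("anyOtherAttributes", ".*?").Literal(">").Capture("title", ".+?").Then("endHElement", @"</h\d>").Build(RegexOptions.IgnoreCase)`, which produces the same pattern string as the literal version because `Regex.Escape` leaves `<` and `>` untouched. Since each call appends in the order it is written, this design does not depend on the order in which reflection reports an anonymous object's properties.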

