Do you format and comment your regular expressions?

Last updated by Igor Goldobin 3 months ago.

Regular expressions are a powerful tool for pattern matching, but a complicated regex can be very hard for a human to read and understand. That is why, like any good code, a good regular expression must be well formatted and documented.

Here are some guidelines for formatting and documenting your regex:

  • Keep each line under 80 characters; horizontal scrolling reduces readability
  • Break long patterns into multiple lines, usually after a space or a line break
  • Indent brackets so it is clear which scope each part of the pattern belongs to
  • Format complicated OR patterns (alternations) as multiple blocks, like a case statement
  • Comment on what your regex is for; don't just translate it into English
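These guidelines depend on the regex engine's "ignore pattern whitespace" option (the `x` flag, which the long pattern further down turns on with `(?six-mn:`). With it enabled, spaces and line breaks in the pattern are ignored and `#` starts a comment. The sketch below uses Python's `re.VERBOSE`, its equivalent of `x`, to lay out an alternation like a case statement. The pattern is a hypothetical example built from the "standard" ContentAlignment values that appear later in this article.

```python
import re

# re.VERBOSE is Python's "x" option: whitespace and newlines in the pattern
# are ignored and "#" starts a comment, so the regex can be laid out freely.
# Hypothetical example built from the alignments named later in the article.
STANDARD_ALIGNMENT = re.compile(r"""
    ContentAlignment \.
    (?:                  # one of the alignments treated as standard:
        TopLeft
      | MiddleLeft
      | TopCenter
      | MiddleCenter
    )
    \b                   # the whole value, not a prefix such as TopLeftX
""", re.VERBOSE)

# The same pattern on a single line, for comparison.
COMPACT = re.compile(
    r"ContentAlignment\.(?:TopLeft|MiddleLeft|TopCenter|MiddleCenter)\b"
)

for text in ["ContentAlignment.TopLeft", "ContentAlignment.BottomRight"]:
    print(text, bool(STANDARD_ALIGNMENT.search(text)))
```

Both objects match exactly the same strings; the formatting only changes how easy the pattern is to read and to extend with another case.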

    # Match <BODY
    # Match any non > char for zero to infinite number of times

Bad example: Comment that translates the regex into English

    # Match the BODY tag
    # Match any character in the body tag
    # Match the end BODY tag

Good example: Comment that explains the purpose of the pattern
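Only the comments of this example survive on the page. As a hedged illustration, assuming the commented pattern is `<BODY[^>]*>` and that tag names should match in any case, this is how purpose-driven comments can sit beside each piece of the pattern in Python's verbose mode:

```python
import re

# Assumption: the commented pattern is <BODY[^>]*>; only its comments
# survive in the article. re.VERBOSE ignores whitespace and "#" comments,
# and re.IGNORECASE lets <BODY also match <body.
BODY_TAG = re.compile(r"""
    <BODY      # Match the BODY tag
    [^>]*      # Match any character in the body tag
    >          # Match the end of the BODY tag
""", re.VERBOSE | re.IGNORECASE)

html = '<html><body class="main">Hi</body></html>'
match = BODY_TAG.search(html)
print(match.group(0))  # <body class="main">
```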

(?six-mn:(Label|TextBox)\s+(?<Name>\w+).*(?<Result>\k<Name>\.TextAlign\s*=\s* ((System\.)?Drawing\.)?ContentAlignment\.(?! TopLeft|MiddleLeft|TopCenter|MiddleCenter)\w*)(?!(?<=\k<Name>\.Image.*)|(?

Bad example: Pray you never have to modify this regex

    # Match for Label or TextBox control
    # Store name into <Name> group

    # Match any non-standard TextAlign
    # Store any match in Result group for error reporting in CA
        # Match for control's TextAlign Property

        # Match for possible namespace

        # Match any ContentAlignment that is not in the group

    # Skip any Control that has image on it

Good example: Now it makes sense!
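For readers outside .NET, here is a sketch of the same formatted pattern ported to Python's `re` module, reusing the comments above. It is not the original rule's implementation, and it rests on stated assumptions: .NET's `\k<Name>` becomes Python's `(?P=Name)`, the inline flags `(?six)` become `re.DOTALL | re.IGNORECASE | re.VERBOSE`, the unnamed group around `Label|TextBox` is made non-capturing, and the "skip any control that has image on it" branch is left out because the one-line pattern above is truncated and Python's `re` does not support the variable-length lookbehind it uses. The sample input is a hypothetical C# WinForms designer snippet.

```python
import re

# Sketch: Python port of the .NET pattern, minus the truncated
# "skip any control that has image on it" branch.
NON_STANDARD_TEXT_ALIGN = re.compile(r"""
    # Match for Label or TextBox control
    # Store name into <Name> group
    (?:Label|TextBox) \s+ (?P<Name>\w+)
    .*
    # Match any non-standard TextAlign
    # Store any match in Result group for error reporting
    (?P<Result>
        # Match for control's TextAlign property
        (?P=Name) \. TextAlign \s* = \s*
        # Match for possible namespace
        (?: (?:System\.)? Drawing\. )?
        # Match any ContentAlignment that is not in the group
        ContentAlignment \.
        (?! TopLeft | MiddleLeft | TopCenter | MiddleCenter )
        \w*
    )
""", re.DOTALL | re.IGNORECASE | re.VERBOSE)

# Hypothetical designer code with a non-standard alignment.
designer_code = (
    "private System.Windows.Forms.Label lblTitle;\n"
    "this.lblTitle.TextAlign = "
    "System.Drawing.ContentAlignment.BottomRight;\n"
)
match = NON_STANDARD_TEXT_ALIGN.search(designer_code)
if match:
    print(match.group("Name"), "->", match.group("Result"))
```

Because every comment states the intent of its piece, changing the approved alignments later means editing one clearly labelled line instead of decoding a single 200-character pattern.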
