.NET Regular Expressions In Depth
Written by Mike James   
Thursday, 16 July 2020

Replacements

So far we have created regular expressions with the idea that we can use them to test that a string meets a specification or to extract a substring.

These are the two conventional uses of regular expressions. However, you can also use them to perform some very complicated string editing and rearrangement.

The key to this idea is the notion that you can use the captures as part of the replacement string. The only slight complication is that substitution strings use a slightly different syntax from regular expressions.

The Replace method:

ex1.Replace(input,substitution)

simply takes every match of the associated regular expression and performs the substitution specified. Notice that it performs the substitution on every match and the result returned is the entire string with the substitutions made.
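As a quick check of that behaviour, here is a minimal console sketch. It uses C# top-level statements and Console.WriteLine rather than MessageBox.Show so that it runs outside a Windows Forms application; the pattern and sample text are purely illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Matches each run of one or more digits.
Regex digits = new Regex(@"\d+");

string input = "Order 12 shipped 3 items to room 404";

// Replace substitutes every match and returns a new, complete string;
// the original input string is left unchanged.
string result = digits.Replace(input, "#");

Console.WriteLine(result); // Order # shipped # items to room #
Console.WriteLine(input);  // Order 12 shipped 3 items to room 404
```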

There are other versions of the Replace method but they all work in more or less the same way.
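For instance, a sketch of two of those other versions: the static Regex.Replace, which takes the pattern as an argument instead of using a Regex object, and the overload that takes a MatchEvaluator delegate to compute each replacement from the Match. The doubling rule and sample text are only illustrations, and Console.WriteLine stands in for MessageBox.Show.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "3 apples and 12 pears";

// Static form: the pattern is passed in rather than held in a Regex object.
string s1 = Regex.Replace(input, @"\d+", "N");
Console.WriteLine(s1); // N apples and N pears

// MatchEvaluator form: a lambda computes each replacement from the Match.
string s2 = Regex.Replace(input, @"\d+", m => (int.Parse(m.Value) * 2).ToString());
Console.WriteLine(s2); // 6 apples and 24 pears
```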

For example, if we define the regular expression:

Regex ex1 = new Regex(@"(ISBN-13|ISBN)");

and apply the following replacement:

MessageBox.Show(ex1.Replace(
          @"ISBN: 978-871962406","ISBN-13"));

then the ISBN prefix will be replaced by ISBN-13. Notice that an ISBN-13 prefix will also be replaced by ISBN-13, making all ISBN strings consistent. This depends on ISBN-13 being listed first in the alternation: the .NET engine tries alternatives from left to right, so with (ISBN|ISBN-13) the shorter ISBN would match first and an existing ISBN-13 would become ISBN-13-13. Also notice that if there are multiple ISBNs within the string they will all be matched and replaced. There are versions of the method that allow you to restrict the number of matches that are replaced.
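A console sketch of the ISBN example, using placeholder numbers. ISBN-13 is listed first in the alternation because .NET tries alternatives from left to right; the last lines show what happens with the order reversed. The instance overload Replace(input, replacement, count) limits how many matches are replaced, starting from the left.

```csharp
using System;
using System.Text.RegularExpressions;

// ISBN-13 comes first so the longer prefix is tried before the shorter one.
Regex ex1 = new Regex(@"(ISBN-13|ISBN)");

// Placeholder numbers, for illustration only.
string text = "ISBN: 111; ISBN-13: 222; ISBN: 333";

// Every match is replaced.
string all = ex1.Replace(text, "ISBN-13");
Console.WriteLine(all);       // ISBN-13: 111; ISBN-13: 222; ISBN-13: 333

// The count overload replaces at most one match here, the leftmost.
string firstOnly = ex1.Replace(text, "ISBN-13", 1);
Console.WriteLine(firstOnly); // ISBN-13: 111; ISBN-13: 222; ISBN: 333

// With the alternatives reversed, "ISBN" wins inside "ISBN-13".
Regex wrongOrder = new Regex(@"(ISBN|ISBN-13)");
string doubled = wrongOrder.Replace("ISBN-13: 222", "ISBN-13");
Console.WriteLine(doubled);   // ISBN-13-13: 222
```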

This is easy enough to follow and works well as long as you have defined your regular expression precisely enough. More sophisticated is the use of capture groups within the substitution string.

You can use:

@"$n"

to refer to capture group n or:

@"${name}"

to refer to a capture group by name. There is a range of other substitution elements, for example $$ for a literal dollar sign and $0 for the entire match, but these are fairly obvious in use.
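A small sketch mixing a numbered group, a named group, a literal dollar and the whole match in one substitution string. The pattern, the group name num and the sample text are hypothetical, chosen only to exercise each element; output goes to the console rather than a MessageBox.

```csharp
using System;
using System.Text.RegularExpressions;

// Group 1 is unnamed (a word); "num" is a named group (digits).
Regex ex1 = new Regex(@"(\w+)=(?<num>\d+)");

string input = "apples=3 pears=12";

// $1     -> text of capture group 1
// $$     -> a literal dollar sign
// ${num} -> text of the group named "num"
// $0     -> the entire match
string output = ex1.Replace(input, "$1 cost $$${num} [$0]");
Console.WriteLine(output); // apples cost $3 [apples=3] pears cost $12 [pears=12]
```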

As an example of how this all works consider the problem of converting a US format date to a UK format date. First we need a regular expression to match the mm/dd/yyyy format:

Regex ex1 = new Regex(
    @"(?<month>\d{1,2})/(?<day>\d{1,2})/(?<year>\d{4})");

This isn’t a particularly sophisticated regular expression but we have allowed one or two digits for the month and day numbers but insisted on four for the year number. You can write a more interesting and flexible regular expression for use with real data. Notice that we have three named capture groups corresponding to month, day and year.

To create a European-style date, all we have to do is assemble the capture groups in the correct order in a substitution string:

MessageBox.Show(ex1.Replace("10/2/2008",
                   "${day}/${month}/${year}"));

This substitutes the day, month and year capture groups in place of the entire matched string, i.e. the original date.
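Put together as a console program, the date conversion looks like the sketch below (Console.WriteLine replaces MessageBox.Show). The second call, with a made-up sentence, shows that every date in a longer string is converted by the one Replace call.

```csharp
using System;
using System.Text.RegularExpressions;

// Named groups for the US mm/dd/yyyy layout.
Regex ex1 = new Regex(@"(?<month>\d{1,2})/(?<day>\d{1,2})/(?<year>\d{4})");

// Reassemble the groups as dd/mm/yyyy.
string uk = ex1.Replace("10/2/2008", "${day}/${month}/${year}");
Console.WriteLine(uk);   // 2/10/2008

// Every date in a longer string is converted in a single call.
string many = ex1.Replace("From 12/25/2019 to 1/3/2020", "${day}/${month}/${year}");
Console.WriteLine(many); // From 25/12/2019 to 3/1/2020
```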


Avoid overuse

Regular expressions are addictive in a way that can ultimately be unproductive.

It isn’t worth spending days crafting a single regular expression that matches every variation of a string when one or two simpler expressions, combined with a wider range of string operations, would do the job just as well, if not as neatly.
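As one illustration of a non-regex alternative for the date example, the framework's own date parsing can do the conversion and also reject impossible months and days. This is a sketch under the assumption that the input is strictly M/d/yyyy with "/" separators; the helper name UsToUk is hypothetical.

```csharp
#nullable enable
using System;
using System.Globalization;

// Hypothetical helper: converts a US M/d/yyyy date to d/M/yyyy,
// returning null when the text is not a valid date in that layout.
static string? UsToUk(string us)
{
    if (DateTime.TryParseExact(us, "M/d/yyyy", CultureInfo.InvariantCulture,
                               DateTimeStyles.None, out DateTime date))
    {
        // InvariantCulture uses "/" as the date separator.
        return date.ToString("d/M/yyyy", CultureInfo.InvariantCulture);
    }
    return null;
}

Console.WriteLine(UsToUk("10/2/2008"));                // 2/10/2008
Console.WriteLine(UsToUk("99/99/2008") ?? "rejected"); // rejected
```

Unlike the simple regular expression, this version refuses a date such as 99/99/2008, at the cost of being tied to one exact input layout.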

Resist the temptation to write regular expressions that you only just understand, and always test them with strings that go well outside the range of inputs you consider correct. Greedy matching and backtracking often result in a wider range of strings being accepted than was originally intended.
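Applying that advice to the date expression used earlier: the sketch below probes it with inputs outside the intended range. The unanchored pattern happily finds a date inside 110/2/20081 and rewrites it to 12/10/20081. Adding \b word boundaries (one possible fix, not the only one) stops that, but 99/99/2008 is still accepted because the pattern never checks numeric ranges.

```csharp
using System;
using System.Text.RegularExpressions;

Regex loose = new Regex(@"(?<month>\d{1,2})/(?<day>\d{1,2})/(?<year>\d{4})");
// \b word boundaries stop the pattern matching inside longer digit runs.
Regex bounded = new Regex(@"\b(?<month>\d{1,2})/(?<day>\d{1,2})/(?<year>\d{4})\b");

string[] probes = { "10/2/2008", "110/2/20081", "99/99/2008" };
foreach (string p in probes)
{
    Console.WriteLine($"{p}: loose={loose.IsMatch(p)}, bounded={bounded.IsMatch(p)}");
}

// The loose pattern rewrites the embedded "10/2/2008" and keeps the stray digits.
string mangled = loose.Replace("110/2/20081", "${day}/${month}/${year}");
Console.WriteLine(mangled); // 12/10/20081
```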

If you take care, however, regular expressions are a very powerful way of processing and transforming text without the need to move to a complete syntax analysis package.

 
