Master JavaScript Regular Expressions
Written by Ian Elliot   
Friday, 28 September 2012

Lookahead Capture

There are two lookahead assertions. Neither consumes characters or creates a capture of its own.

Zero-width positive lookahead assertion

(?=regex)

This continues the match only if the regex matches immediately to the right of the current position. The text it matches is not captured or included in the overall match, and the current position does not move.

For example,

\w+(?=\d)

only matches a word that is followed by a digit, and that digit is not included in the match.

That is, it matches in Paris9 but returns Paris as the match (element 0 of the result). In other words, you can use it to assert a pattern that must follow a matched subexpression.
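As a quick check of the positive lookahead, here is a minimal sketch using the pattern from the text. The variable names are only illustrative.

```javascript
// Positive lookahead: a digit must follow the word, but it is not consumed.
const followedByDigit = /\w+(?=\d)/;

const hit = "Paris9".match(followedByDigit);
console.log(hit[0]);   // "Paris" (the 9 is asserted, not matched)

const miss = "Paris".match(followedByDigit);
console.log(miss);     // null (no digit follows any part of the word)
```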

Zero-width negative lookahead assertion

(?!regex)

This works like the positive lookahead assertion but the regex has to fail to match on the immediate right. For example:

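[A-Za-z]+(?![A-Za-z\d])

only matches a run of letters that isn’t followed by a digit. That is, it matches Paris but not Paris9. The lookahead also excludes letters so that backtracking cannot settle on a shorter run such as Pari. Note that the simpler \w+(?!\d) does not have this effect: \w also matches digits, so \w+ consumes all of Paris9 and the assertion then succeeds at the end of the string.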
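The negative lookahead is worth testing, because `\w` also matches digits and backtracking can find shorter matches. The sketch below compares `\w+(?!\d)` with a narrower letters-only pattern; the names `loose` and `strict` are illustrative, not part of any API.

```javascript
// \w includes digits, so \w+ swallows the trailing 9 and the
// assertion succeeds at the end of the string.
const loose = /\w+(?!\d)/;
console.log("Paris9".match(loose)[0]);   // "Paris9"

// Letters only, and the lookahead rejects both letters and digits,
// so backtracking cannot fall back to a shorter run like "Pari".
const strict = /[A-Za-z]+(?![A-Za-z\d])/;
console.log("Paris".match(strict)[0]);   // "Paris"
console.log("Paris9".match(strict));     // null
```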

Replacements

So far we have created regular expressions with the idea that we can use them to test that a string meets a specification or to extract a substring.

These are the two conventional uses of regular expressions. However, you can also use them to perform some very complicated string editing and rearrangement.

The whole key to this idea is that you can use the captures as part of the specified replacement string. The only slight problem is that the substitution strings use a slightly different syntax to a regular expression.

The replace method is a String method and it accepts a RegExp object to specify the match:

String.replace(RegExp,substitution)

It finds a match of the regular expression and performs the substitution specified. Notice that it performs the substitution only on the first match unless the regular expression has the g (global) flag, in which case every match is replaced. The result returned is the entire string with the substitution made; the original string is not modified.
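The difference between replacing the first match and replacing every match comes down to the `g` flag. A minimal sketch, with made-up sample text:

```javascript
const text = "cat bat cat";

// Without the g flag only the first match is replaced.
const firstOnly = text.replace(/cat/, "dog");
console.log(firstOnly);   // "dog bat cat"

// With the g flag every match is replaced.
const everyMatch = text.replace(/cat/g, "dog");
console.log(everyMatch);  // "dog bat dog"

// Strings are immutable, so the original is untouched.
console.log(text);        // "cat bat cat"
```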

For example, if we define the regular expression:

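var ex1= /(ISBN-13|ISBN)/;

and apply the following replacement:

var ans="ISBN: 978-1871962406".
              replace(ex1,"ISBN-13");

then the ISBN prefix will be replaced by ISBN-13. Notice that an ISBN-13 prefix will also be replaced by ISBN-13, so making all ISBN strings consistent. The longer alternative has to come first: alternatives are tried from left to right, so /(ISBN|ISBN-13)/ would match only the ISBN part of ISBN-13 and produce ISBN-13-13.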
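The order of the alternatives matters here, because the regular expression engine tries them from left to right and takes the first one that succeeds. The following sketch, with illustrative variable names, shows both orderings applied to the two kinds of prefix:

```javascript
// Longer alternative first: both prefixes normalize to "ISBN-13".
const longFirst = /ISBN-13|ISBN/;
const a = "ISBN: 978-1871962406".replace(longFirst, "ISBN-13");
const b = "ISBN-13: 978-1871962406".replace(longFirst, "ISBN-13");
console.log(a);  // "ISBN-13: 978-1871962406"
console.log(b);  // "ISBN-13: 978-1871962406"

// Shorter alternative first: only "ISBN" is matched, so "-13" is doubled.
const shortFirst = /ISBN|ISBN-13/;
const c = "ISBN-13: 978-1871962406".replace(shortFirst, "ISBN-13");
console.log(c);  // "ISBN-13-13: 978-1871962406"
```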

This is easy enough to follow and works well as long as you have defined your regular expression precisely enough.

More sophisticated is the use of capture groups within the substitution string.

You can use:

$n

or

$nn

to refer to capture group n, where groups are numbered from 1.  There is a range of other substitution patterns, which are fairly obvious in use:

  • $$ insert $
  • $& the complete matched string
  • $` the portion of the string before the match
  • $' the portion of the string after the match
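To see what each of these special patterns inserts, here is a small sketch that replaces the number in a made-up sample string:

```javascript
const sample = "price: 42 USD";

// $$ inserts a literal $, and $& inserts the matched text ("42").
const withDollar = sample.replace(/\d+/, "$$$&");
console.log(withDollar);   // "price: $42 USD"

// $` inserts everything before the match ("price: ").
const withBefore = sample.replace(/\d+/, "[$`]");
console.log(withBefore);   // "price: [price: ] USD"

// $' inserts everything after the match (" USD").
const withAfter = sample.replace(/\d+/, "[$']");
console.log(withAfter);    // "price: [ USD] USD"
```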

As an example of how this all works, consider the problem of converting a US format date to a European format date, i.e. changing mm/dd/yyyy to dd/mm/yyyy.

First we need a regular expression to match the mm/dd/yyyy format:

var ex1=/(\d{1,2})\/(\d{1,2})\/(\d{4})/;

This isn’t a particularly sophisticated regular expression: it allows one or two digits for the month and day numbers but insists on four for the year number. You can write a more interesting and flexible regular expression for use with real data. Notice that we have three capture groups corresponding to month, day and year.

To create a European style date all we have to do is assemble the capture groups in the correct order in a substitution string:

ans="10/2/2012".replace(ex1,"$2/$1/$3");

This substitutes the day, month and year capture groups, in that order, in place of the entire matched string, i.e. the original date, so ans is 2/10/2012.
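The same substitution string works on longer text. The sketch below repeats the single-date case and then adds the `g` flag to convert every date in a sentence; the sample sentence is made up for illustration.

```javascript
// Groups: $1 = month, $2 = day, $3 = year.
const usDate = /(\d{1,2})\/(\d{1,2})\/(\d{4})/;
const single = "10/2/2012".replace(usDate, "$2/$1/$3");
console.log(single);   // "2/10/2012"

// The g flag applies the reordering to every date found.
const usDateAll = /(\d{1,2})\/(\d{1,2})\/(\d{4})/g;
const several = "From 10/2/2012 to 12/25/2012".replace(usDateAll, "$2/$1/$3");
console.log(several);  // "From 2/10/2012 to 25/12/2012"
```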

Avoid overuse

Regular expressions are addictive in a way that can ultimately be unproductive.

It isn’t worth spending days crafting a single regular expression that matches all variations on a string when building one or two simpler alternatives and using a wider range of string operations would do the same job as well if not as neatly.

Resist the temptation to write regular expressions that you only just understand, and always make sure you test them with strings that go well outside the range of inputs that you consider correct: greedy matching and backtracking often result in the acceptance of a wider range of strings than was originally intended.
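As an illustration of testing outside the expected range, the sketch below feeds some deliberately bad input to the date pattern used earlier. The stricter pattern at the end is a hypothetical tightening, not a complete date validator (it does not check days per month, for example).

```javascript
const usDate = /(\d{1,2})\/(\d{1,2})\/(\d{4})/;

// Impossible month and day values are accepted.
console.log(usDate.test("99/99/2012"));      // true

// With no anchors, a date-shaped substring inside junk is accepted too
// (the match here is "23/45/6789").
console.log(usDate.test("123/45/67890"));    // true

// Hypothetical stricter version: anchored, month 1-12, day 1-31.
const stricter = /^(0?[1-9]|1[0-2])\/(0?[1-9]|[12]\d|3[01])\/(\d{4})$/;
console.log(stricter.test("10/2/2012"));     // true
console.log(stricter.test("99/99/2012"));    // false
console.log(stricter.test("123/45/67890"));  // false
```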

If you take care, however, regular expressions are a very powerful way of processing and transforming text without the need to move to a complete syntax analysis package.

Related Articles

Finite State Machines

 

Last Updated ( Friday, 28 September 2012 )
 
 

   