Machine Learning Lab's Regular Expression Game
Written by Nikos Vaggalis   
Wednesday, 30 March 2016

Machine Learning Lab has created a game that puts your regular expression skills to the test.


We first met Machine Learning Lab in Automatically Generating Regular Expressions with Genetic Programming, when we used its RegexGenerator++. Now we are back to explore how to have fun with it.

The game comprises 12 levels of increasing difficulty. Each level presents a different block of text, parts of which must be matched by a regular expression that the user has to supply.

Each attempt is timed and rated on an 'F-measure' scale that measures how closely the extractions agree with the expected matches, with 100 points awarded for a perfect result and 0 for a totally wrong one or for giving up.
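
As a rough illustration of how such a score can be computed, here is a minimal Python sketch. The article does not describe the game's actual formula, so this assumes the standard F-measure (the harmonic mean of precision and recall) applied to exact match positions; the function names `f_measure` and `score` are made up for this example.

```python
import re

def f_measure(expected, found):
    """Harmonic mean of precision and recall over two collections of (start, end) spans."""
    expected, found = set(expected), set(found)
    hits = len(expected & found)          # spans that agree exactly
    if hits == 0:
        return 0.0
    precision = hits / len(found)         # share of extractions that were wanted
    recall = hits / len(expected)         # share of wanted spans that were extracted
    return 2 * precision * recall / (precision + recall)

def score(pattern, text, expected_spans):
    """Scale the F-measure to 0..100 points (assumed scoring, not the game's own code)."""
    found = [m.span() for m in re.finditer(pattern, text)]
    return round(100 * f_measure(expected_spans, found))

text = "12, 47, 48"
expected = [(0, 2), (4, 6), (8, 10)]      # positions of 12, 47 and 48

perfect = score(r"\d+", text, expected)   # every number found, nothing extra
partial = score(r"12|47", text, expected) # two of the three numbers found
wrong = score(r"\d", text, expected)      # single digits never equal a whole number
```

Under these assumptions a regex that finds only some of the wanted text is penalised through recall, while one that extracts too much is penalised through precision.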

Before the game begins you complete a simple questionnaire where you have to tick the boxes adjacent to the regular expression constructs you are familiar with.

The constructs are grouped by relevance, so for example, the Character classes group includes expressions such as: 

  • \w, matches any word character (alphanumeric & underscore). Only matches low-ascii characters (no accented or non-roman characters). Equivalent to [A-Za-z0-9_]

  • \d, matches any digit character (0-9). Equivalent to [0-9]

  • \s, matches any whitespace character (spaces, tabs, line breaks) 
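
The three classes above can be tried out with a short Python sketch. One caveat: the descriptions in the questionnaire say \w is ASCII-only, whereas Python's `re` module is Unicode-aware by default for text patterns, so the `re.ASCII` flag is used here to reproduce the behaviour described. The sample string is invented for this example.

```python
import re

sample = "café_9 tab\there"

# With re.ASCII, \w is limited to [A-Za-z0-9_], so the accented "é" splits the word
ascii_words = re.findall(r"\w+", sample, flags=re.ASCII)

# Without the flag, Python treats "é" as a word character
unicode_words = re.findall(r"\w+", sample)

digits = re.findall(r"\d", sample, flags=re.ASCII)   # any digit character
spaces = re.findall(r"\s", sample, flags=re.ASCII)   # the space and the tab
```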

Groups & Lookarounds has, among others:

  • (ABC), groups multiple tokens together and creates a capture group for extracting a substring or using a backreference.

  • \1, matches the results of a previous capture group. For example \1 matches the results of the first capture group and \3 matches the third.

  • (?:ABC), groups multiple tokens together without creating a capture group.
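
A short Python sketch of the three grouping constructs listed above; the sample strings are invented for illustration and are not taken from the game.

```python
import re

# (ABC): capture groups let you pull out substrings of the match
m = re.search(r"(\d+)-(\d+)", "pages 12-47")
first, second = m.group(1), m.group(2)

# \1: a backreference must match the same text the first group captured,
# which makes it handy for spotting a doubled word
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")
repeated_word = doubled.group(1)

# (?:ABC): groups tokens for the + quantifier without creating a capture,
# so findall returns the whole match...
non_capturing = re.findall(r"(?:ab)+", "ababab ab")

# ...whereas with a capturing group findall returns only the group's last capture
capturing = re.findall(r"(ab)+", "ababab ab")
```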


This gives the sense that the test to follow will be adjusted to the level of experience determined through the questionnaire, but the exercises remain the same whether you tick all or none of the checkboxes. You can also navigate from task to task with no limitations as well as retake as many as you like.

The text to be matched comes pre-marked in bold so that it stands out from the rest of the text. What is very helpful is the immediate visual feedback you get as you type the regular expression: you can see what text it has matched so far, and so get progressively closer to the complete match by trial and error.



Level 1 starts out easy, asking you to match the digits in bold from the following block of text:

We have to extract these numbers 12, 47, 48
The integers numbers are also interesting: 189 2036 314
"," is a separator, so please extract these numbers 125,789,1450 and also these 564,90456
We like to offer you 7890$ per month in order to complete this task... we are joking
You are going to learn 3 things, the first one is not to extract, and 2 and 3 are simply digits.
Have fun with our mighty test, you are going to support science, progress, mankind wellness and you are going to waste 30 or 60 minutes of your life.
you can also extract exotic stuff like a456 gb67 and 45678911ghth

It's important to note that you don't write an expression that literally matches 189, 2036 or 314, but one that matches the pattern underlying that text; that is, "any sequence of digits", aka \d+
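
The difference between the two approaches can be seen in a few lines of Python, run against two lines of the Level 1 text quoted above. This is only a sketch: whether \d+ earns full marks on the whole level depends on exactly which digits the game has marked in bold.

```python
import re

line_a = "The integers numbers are also interesting: 189 2036 314"
line_b = "We have to extract these numbers 12, 47, 48"

# The general pattern "any sequence of digits" works on both lines
general_a = re.findall(r"\d+", line_a)
general_b = re.findall(r"\d+", line_b)

# A literal alternation only finds the numbers it was written for
literal_a = re.findall(r"189|2036|314", line_a)
literal_b = re.findall(r"189|2036|314", line_b)
```

Here the literal expression succeeds on the line it was copied from and finds nothing at all on the other one.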

Level 2 is more challenging as you have to match MAC addresses such as 38:f8:b7:90:45:92
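
One possible way to express such an address, sketched in Python. This is not the game's reference answer (none is published); it assumes addresses are written as six colon-separated pairs of hexadecimal digits, as in the example above, and the surrounding sentence is invented for the test.

```python
import re

# Five "two hex digits plus colon" pairs, then a final pair without the colon
MAC_PATTERN = r"\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b"

text = "device 38:f8:b7:90:45:92 joined, 38:f8:b7 did not"
macs = re.findall(MAC_PATTERN, text)
```

The non-capturing group keeps `findall` returning whole addresses, and the truncated `38:f8:b7` is rejected because it has too few pairs.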

At Level 5 you are asked to match IP addresses, while at Level 6 you encounter links, for example:
 <A href='/xpl/RecentCon.jsp?punumber=10417'>


The difficulty increases until Level 12 where you have to match authors' names in the form of Lovecraft, H.P. or Duncan, R.
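
As a starting point for that last form, here is a hedged Python sketch that handles just the two examples quoted above: a capitalised surname, a comma, and one or more dotted initials. The real Level 12 text may well contain names this simple pattern misses, such as hyphenated or multi-word surnames, and the test sentence is invented.

```python
import re

# Surname, comma and space, then one or more "capital letter plus dot" initials
AUTHOR_PATTERN = r"[A-Z][a-z]+, (?:[A-Z]\.)+"

text = "Lovecraft, H.P. and Duncan, R. wrote about it"
authors = re.findall(AUTHOR_PATTERN, text)
```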

Take heed, however: the exercises are not as easy as they seem, and mastering them all takes a good amount of time. The solutions to the problems are not supplied, so if you get stuck there is no one to help. Or is there?

Look no further than Machine Learning Lab's own Genetic Regex Generator++, of course! Start by copying the whole exercise block of text, paste it into the Generator's form field, and then, within that field, highlight the text that needs to be extracted. After that, just let the machine find the right regex for you, thus beating them at their own game!

In the end, what this entertaining game offers is a testbed for your skills and a chance to teach yourself new ways of writing expressions that match text.

I have a suggestion for the game makers. It would be great if they could gamify it even more by optionally letting users log in with a username and a password and share their F-measure and the time taken to complete the exercises, thus enabling the platform to host competitions, league tables and top 10s of regex competence.

For a more advanced game, make sure you check Can You Do The Regular Expression Crossword?


More Information

The regular expression game

Genetic Regex Generator++

Related Articles

Automatically Generating Regular Expressions with Genetic Programming

Can You Do The Regular Expression Crossword?



Last Updated ( Wednesday, 30 March 2016 )