|Algorithms On Strings Now Open|
|Written by Sue Gee|
|Tuesday, 26 July 2016|
In a world with so much textual/symbolic data to be generated, read, stored and searched, string algorithms are increasingly important. A challenging new short course on the Coursera platform that opened today seems a good introduction.
If you only think of string algorithms as finding a substring or joining strings together then you will have to think again. There are problems on strings that are much more taxing that searching for a fixed string within another. For example, find the longest match between two strings irrespective of where the match starts, find the longest repeat within a string and so on. These are easy enough to do by exhaustive search but it doesn't take much for the string to get so big that the time such brute force algorithms take is far, far too long. For example, consider the task of working with the human genome coded as a string of characters taking 1.5GBytes.
Algorithms on Strings is part of the Data Structures and Algorithms Specialization that we covered when it this specialization launched in March, see New Coursera Core CS Specialization. The series consists of five classes, of which this is the fourth, and a Capstone Project.
The title of the Capstone, which starts in September is Assembling Genomes and Finding Disease-Causing Mutations, which rather explains why the first string algorithms you will meet in this 4-week module are motivated by problems from Bioinformatics (genome sequencing).
Week 1 of the course covers Suffix Trees and this is how the course description introduces the list of activities:
How would you search for a longest repeat in a string in LINEAR time? In 1973, Peter Weiner came up with a surprising solution that was based on suffix trees, the key data structure in pattern matching. Computer scientists were so impressed with his algorithm that they called it the Algorithm of the Year. In this lesson, we will explore some key ideas for pattern matching that will - through a series of trials and errors - bring us to suffix trees.
In Week 2 its goes on to Burrows-Wheeler Transform and Suffix Arrays:
Although EXACT pattern matching with suffix trees is fast, it is not clear how to use suffix trees for APPROXIMATE pattern matching. In 1994, Michael Burrows and David Wheeler invented an ingenious algorithm for text compression that is now known as Burrows-Wheeler Transform. They knew nothing about genomics, and they could not have imagined that 15 years later their algorithm will become the workhorse of biologists searching for genomic mutations. But what text compression has to do with pattern matching??? In this lesson you will learn that the fate of an algorithm is often hard to predict – its applications may appear in a field that has nothing to do with the original plan of its inventors.
In the final two weeks the course goes in to look at Algorithmic Challenges with a two-week programming assignment spanning both weeks, whereas the first two weeks each had a separate assignment to be completed in a wide choice of programming languages although reference solutions are only available in C++, Java and Python 3.
While Algorithms on Strings is the fourth in the series of courses you can, of course, take it as a standalone class. If you want to enroll in the full Specialization its worth noting that Algorithmic Toolbox, the first of the modules also started again this week.
or email your comment to: firstname.lastname@example.org
|Last Updated ( Wednesday, 27 July 2016 )|