TimSort Really Is O(nlogn) |

Written by Mike James | |||

Wednesday, 05 September 2018 | |||

Most of us suppose that sorting algorithms are all done-and-dusted. Nothing new or exciting is left to find or discover, but TimSort, used in both Python and Java, was born in 2002 and is still being investigated. Obligatory xkcd cartoon.
If you have done almost any sort of computer science course, or just read a book, then you will know that the problem of sorting things into order is so important that we have been looking at it for ages. You probably also know that what matters is how fast the sort is in terms of the number of things to be sorted. The most popular "theoretical" sort is Quicksort because it is elegant, sophisticated and sorts End of story? No, because there are many more considerations than theoretical performance. For instance, you can ask what the worst case performance of a sorting method is. In other words, what is the slowest a sorting algorithm goes if you are evil and throw just the right, or more accurately wrong, data at it? In the case of Quicksort the answer is O( In practice, sorting algorithms have had to become clever at avoiding the sort of data that chokes their basic algorithm and switch to another algorithm if the data isn't right for them. This is where TimSort enters the picture. In 2002 Tim Peters designed a sorting algorithm for Python. It was convincingly good enough for Java to use it with some differences a bit later. The basis of the algorithm is that the input sequence is first scanned and decomposed into monotonic runs, i.e. increasing or decreasing sub-sequences. These sequences are then merged to give a sorted list. The originator of the basic idea of merging natural sequences was, like so much in computer science, due to Donald Knuth. This makes TimSort sound easy, but it also includes heuristics about what to do and when, and this makes it difficult to analyse. In short, this is not the elegant logic of a pure algorithm like Quicksort, but a practically optimized sorting procedure complete with magic numbers and choices. So much so that it was assumed that its average complexity was O(
What is slightly more worrying is that:
Yes, they found bugs in the Java implementation of TimSort. Sorting is still important and there is still much to be done. We need to move beyond the theoretical algorithms implemented on theoretical machines and look at what happens in real machines with complexities such as cache and out of order execution. Now is still a good time to be a computer scientist - if not the best. ## More InformationOn the Worst-Case Complexity of TimSort ## Related ArticlesTo be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or Linkedin.
## Comments
or email your comment to: comments@i-programmer.info |
|||

Last Updated ( Saturday, 08 September 2018 ) |