Timsort
Updated:
Timsort is the sorting algorithm that python
uses for sorted
.
It’s a really cool sort algorithm that’s rooted in real-world practicality and throws a wrench in your academic
understanding of sorting algorithms.
In school, I was always a little suspicious when studying sorting methods in school, they seemed to perfect and ideal. Yet no one really discusses the reality, because it’s boring and only interesting to those implementing it. Sure, merge sort has high overhead in call stack but whatever, the basic idea holds. Or quicksort is strangely complex yet preferred over the more intuitive and elegant heapsort, due to cache locality.
Unless you’re implementing these, cache locality means nothing. I say that as an application developer. I really don’t care which, I just need to sort a list of 1000 elements in under 200 ms.
But understanding timsort
has made me better appreciate and solidify my understanding and hunches.
What is the general premise
Timsort
is a hybrid sort, using elements from both mergesort
and insert sort
.
It performs better in the real-world because data is not randomly distributed, there are usually some semblance of ordering.
Mergesort
doesn’t take advantage of existing ordering.
If you perform mergesort
on a sorted list, it will take just as long as a randomly distributed list
(ignoring comparison costs).
Insertion sort
does take advantage of ordering.
If given a sorted list, it will run in O(n)
.
So… bolt on the two together and get the best of both worlds!
timsort
is a practical, real-world sorting algorithm.
But it’s built from base algorithms, which are easier to teach in school.
How does it work
Natural Merge sort
Take a look at mergesort
call tree:
Every function call is expensive.
It comes with stack allocation and comparisons.
The base case for mergesort
is a single-element array, which is sorted by definition.
We can short circuit that by taking advantage of natural runs.
Conceptually, merging runs is like short circuiting mergesort
’s depth.
For example, for [1,3,5,2,4,6]
, we have two runs.
There’s no need to break them down to single elements if we can merge the two subarrays, [1,3,5], [2,4,6]
.
This is actually an optimization called natural merge sort
.
Let’s pretend there’s an invariant where the data always has runs >32 length (we’ll tackle this later).
We avoid log(32) function calls per subarray chunk and the associated comparison costs.
Like arms-length recursion
, this optimization on leaves can reduce lots of calls.
Insertion sort
How do we ensure all subarrays will sorted and at least 32 long? Use insertion sort!
At a small enough N, insertion sort is faster than O(nlogn) sorts. The constant costs are smaller for insertion sort. It is O(n) comparisons and O(n) to shift. Shifting arrays in memory is quite fast and can be optimized with bulk shifts.
We don’t want to blindly insertion sort on a random chunk of 32 elements though. We want to take advantage of natural runs that appear in data. Finding runs is O(n), as it’s only comparisons. If a run is at least 32, then we’re good!
If not, we can use insertion sort to pad out a shorter run. Since we’re working on small chunks, we’re comparing and shifting a small number of elements. Which turns out to be not that slow in practice.
And even more optimizations
I’m not going to go any deeper. It’s a lot to take in and quickly diminishes in return. But it’s not too technical, just a lot of “edge case” handling. Anything to eke out performance and it all makes intuitive sense.
But I dare you to try to implement it, it’s the opposite of simple and elegant. It’s the textbook case study of optimizations.
Is it only for Python
It was implemented in 2002 for python and works really well. But nothing makes it language-specific. It’s now implemented for Java, Android, and V8.
Perhaps looking more closely at Java will shed a clue. Wikipedia says:
It is also used to sort arrays of non-primitive type in Java SE 7
Java does not use timsort
for primitives.
This is a big clue.
timsort
performs better for non-primitives because equality check on object’s requires dereferencing values.
This does not take advantage of cache locality, so other sorting methods can perform better in those cases.
- hybrid sort between insert and merge sort
- At top level, is merge sort. Which is O(nlogn) in all cases
- At lowest level, is insert sort, which can be O(n) if you’re smart
- The overview is use merge sort but be smarter with small, sorted data sets. Oh hey, bottoms-up merge sort is sorted
- merge sort has high constants in time complexity (N comparisons)
- merge sort has space complexity of O(n)
- insert sort has low constants in time complexity for sorted array (single comparison, single swap per element)