Python Key Functions

3 minute read

Updated:

Several python functions, such as sorted and max, accept a key parameter that acts as the comparator.

The comparator provides the ordering for a given element. For integers, this is based on value. For strings, this is lexical ordering (length and letter order).

History

Before python 2.4, these functions took a cmp param for the comparator function, much like Java and C#. But since then, it’s changed to a key function, that behaves slightly differently.

functools.cmp_to_key is available to convert old-style comparators to key functions.

Old-style cmp

An old-style cmp function was easy enough to write:

def custom_cmp(obj1, obj2):
  if obj1.property == obj2.property:
    return 0
  elif obj1.property == 'Something' and obj2.property == 'different':
    return 1
  else:
    return -1

sorted(list_of_custom, cmp=custom_cmp)

The interface was a two parameter function which returned an integral value. 0 meant equally ranking, positive meant the first param was larger, and negative meant it was less than.

The sort and max functions would compare two elements using this function, using the result to rank them.

Key Function

Key functions are easy to write too:

def custom_key(obj):
  return obj.property

sorted(list_of_custom, key=custom_key)

The interface was a single parameter function which returns a key object.

A common use case is sorting strings by length, the key is len. For every string in the list, the length is generated and that is used for ranking.

cmp vs. key function

It took me some time to wrap my head around the difference.

cmp-style gave a function for the sort to compare between two elements. The output of the cmp was directly stating the relative ordering of the two elements.

key-style gave a function for sort to generate a separate key element. The key would like scores, used for ranking.

How Does It Work Under The Hood

cmp_to_key

We can take a look at functools.cmp_to_key to give us a high-level intuition.

def cmp_to_key(mycmp):
    'Convert a cmp= function into a key= function'
    class K(object):
        def __init__(self, obj, *args):
            self.obj = obj
        def __lt__(self, other):
            return mycmp(self.obj, other.obj) < 0
        def __gt__(self, other):
            return mycmp(self.obj, other.obj) > 0
        def __eq__(self, other):
            return mycmp(self.obj, other.obj) == 0
        def __le__(self, other):
            return mycmp(self.obj, other.obj) <= 0
        def __ge__(self, other):
            return mycmp(self.obj, other.obj) >= 0
        def __ne__(self, other):
            return mycmp(self.obj, other.obj) != 0
    return K

It takes the cmp and wraps it with a custom class. The key function is simply the constructor.

When it comes time to rank and sort, these objects have all their operators re-implemented to use the cmp.

Using Key Function

Let’s look at how CPython actually uses the key function. Here’s is the implementation for 3.9.

if (keyfunc == NULL) {
    keys = NULL;
    lo.keys = saved_ob_item;
    lo.values = NULL;
}
else {
    ...
    ...
    for (i = 0; i < saved_ob_size ; i++) {
        keys[i] = PyObject_CallOneArg(keyfunc, saved_ob_item[i]);
      ...
      ...
    }

    lo.keys = keys;
    lo.values = saved_ob_item;
}

For every element in the list, cpython is running it through the key function. The elements are stored in lo.values and the keys in lo.keys.

If not key function is provided, than lo.keys is the list.

This pattern is referred to as a Schwartzian transform. The motivation is to transform every element to its ranking value only once. For expensive computation, this can be beneficial as sorting requires average of nlog(n) comparisons whereas generation happens n times.

Lists in python can contain non-homogenous types. Comparison between different types is possible, as they are all inherit from Object and can make use of __gt__, __lt__, __eq__, etc. But these methods require lookups in the dispatch tables, such is object oriented programming. key functions allow for transforming these objects to primitives, such as integer. The interpreter can do smarts with primitives and optimize it greatly.