Skip lists

Jun 2019

In a nutshell

Linked lists are one of the most known data structures.

They allow for quick O(1) insert and delete, but lack of an efficient way to search for an element in a sorted collection. And O(n) search are definitely something you do not want to ever try.
Skip lists build upon linked lists to provide that missing feature.
Imagine a simple linked list with sorted elements:

Basic linked list

Performing a binary search would not be possible, as linked list do not allow random access.
If we had a simple array, the operations done when looking for 25 are trivial:

Each comparison needs to halve our input in two. This mechanism would be doable on a linked list if we had more information.
The simple yet amazing technique is that we can store these pieces of informations in other lists.
These will help us traverse our original list in a divide and conquer matter:

Each node has a height, which is to the number of forward pointers the node has. The number of nodes at any height decreases exponentially with that height.

Skip list consists of several layers of lists. The list at each layer is a sublist of the list at the layer beneath it.
It is then elementary to find a key in logarithmic time by searching at higher layers, skipping over large number of nodes, and going downward until the desired key is found.

Specifics

To get proper performance, lg n additional linked lists are gonna be needed. Simply put, as many as needed branches in a normal binary search.

We still have to find a way to promote elements from the bottom list to the others when inserting.
The easiest solution to just flip a coin: as long at is hits heads, promote this element to the upper list.

When deleting, you then have to remove all of the nodes containing an element.

These properties give us with high probability O(lg n) search, insert and delete.

In real life

Limits

Skip lists look a lot like trees, but have a few downsides:

memory usage: many pointers for a small gain
cache friendliness: cache misses are frequent because of their design
randomized data structure: no worst case guarantee

However, memory usage and cache locality can be improved with well thought implementations, as explained by Ticki’s article.

Concurrency

Skip lists also have an incredible use in concurrent programming.
Compared to tree-based data structures, that need frequent rebalancing and complex locking mechanisms, skip locks just need to lock a few nodes to perform concurrent writes.

William Pugh, the creator of skip lists, has also written a paper on their concurrent maintenance.

Priority search

Priority search is a search algorithm built upon skip lists, for better efficiency on frequently searched data.

The nodes that are often looked for are given higher height, and therefore found faster on subsequent searches.

In another word, the mostly searched data were located in the top-level of the skip list data structure, thus, the searching for these data has time complexity as ~O(1). The rarely searched data were located in the lowest level and their searching time complexities approximate to O(lgN).

However this paper doesn’t deal with rebalacing the skip list after new search frequencies.
It would be interesting to check how other data structures, such as Count-Min Sketch, could be used to save these new frequencies and update the skip list levels periodically.

With other data structures

They can also be used to perform fast search on top of other data structures that contain ranges of data.
Discord wrote an insightful article about how they used skip lists on top of ordered sets to perform fast insert and range lookup for their 11 million concurrent users.

Previous
« Docker group security