Least Recently Used (LRU) caches are a simple way to improve performance for functions with many repeated calls. This post describes an implementation in C++.
Not too long ago I had a job interview where I had to do a live coding challenge.
It was a dynamic programming type question.
I chose a nested for loop
approach, which worked well enough.
After I found a working solution, the interviewer showed me his solution. It was based on recursion, and he had wrapped his recursive function with Python’s @lru_cache() decorator.
I was mystified by the call to @lru_cache(), but he didn’t have time to explain it properly.
So I went through his code after the interview. I noticed that if I left it out, the code still worked but was much slower.
It was therefore a neat optimisation trick. Reading the Python documentation confirmed my suspicion.
The @lru_cache() decorator is a wrapper that saves function calls and their results in a dictionary. If it receives the exact same arguments more than once, it bypasses the function and returns the stored result instead. Hence the increase in speed.
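To make that concrete in C++ (the language used for the rest of this post), a hand-rolled version of the same idea might look like the sketch below. Unlike @lru_cache() it never evicts anything, and fib() is just a toy example, not part of my solver:

```cpp
#include <cstdint>
#include <unordered_map>

// Memoised Fibonacci: the map plays the role of @lru_cache()'s dictionary.
std::uint64_t fib(int n) {
    static std::unordered_map<int, std::uint64_t> cache;
    auto it = cache.find(n);
    if (it != cache.end()) {
        return it->second;  // repeated argument: return the stored result
    }
    std::uint64_t result = (n < 2) ? static_cast<std::uint64_t>(n)
                                   : fib(n - 1) + fib(n - 2);
    cache[n] = result;      // save the result for next time
    return result;
}
```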
Recently I have been working on solving a type of puzzle known as a nonogram. I suggest reading my recent post for more detail. I wanted to improve the performance of my solvers, and this was an obvious technique to try. For puzzles with lots of guessing, there is a lot of backtracking when an incorrect guess is made. However, the guess often only affects a small region, so the solution for the other areas of the puzzle can be reused. An LRU cache is an effective way to do this.
The following huskie puzzle is an example of a puzzle which requires guessing. Using an LRU cache reduces the solving time from 11.3 seconds to 3.5 seconds. That is a reduction of almost 70%, or more than a 3× speedup.
In my Python solution, I only had to add two lines of code and fix a few others to get this performance increase. It was more involved for my C++ solution. There is no LRU cache in the standard library, and as far as I know, no way to dynamically wrap functions. But there are of course many implementations available on the internet. I was easily able to adjust these for my purposes.1 This also gave me an opportunity to explore the code in depth and to understand it properly.
The rest of this post describes the final implementation in my C++ Nonogram solver. It provides an easy-to-use framework for other projects.
In my matcherNFA.h header, I added a new cache object and made a new private find_match_() function.
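I won’t reproduce the exact header here. A rough sketch of the additions, with hypothetical names (NfaMatcher for the matcher class, Match for its result type, a capacity of 1000) standing in for the real ones, is:

```cpp
// matcherNFA.h (sketch only - the names and the capacity are illustrative)
#include <string>
#include <vector>
#include "lru_cache.h"

struct Match { /* stand-in for the solver's real match-result class */ };

class NfaMatcher {
public:
    // Public entry point: consults the cache before doing any work.
    Match find_match(const std::vector<int>& line, const std::vector<int>& runs);
private:
    // The original matching routine, now private.
    Match find_match_(const std::vector<int>& line, const std::vector<int>& runs);
    // The new cache object: keys come from line_to_string(), values are Match results.
    LRUCache<std::string, Match> cache_{1000};
};
```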
The private find_match_() function is the same as the old find_match() that is described in detail in my previous post.
The new public find_match() chooses between drawing from the cache or calling find_match_().
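A sketch of this wrapper, continuing with the hypothetical names from the header sketch above, is:

```cpp
// line_to_string() builds a flat string key from the two vectors (described below).
Match NfaMatcher::find_match(const std::vector<int>& line, const std::vector<int>& runs) {
    const std::string key = line_to_string(line, runs);
    if (cache_.exists(key)) {
        return cache_.get(key);              // cache hit: reuse the stored result
    }
    Match result = find_match_(line, runs);  // cache miss: run the real matcher
    cache_.put(key, result);                 // store it for next time
    return result;
}
```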
At this high level, it doesn’t matter that the cache is an LRU cache - any type of cache should work. We’ll see in a moment why the LRU cache is a good choice.
line_to_string() is a custom function that turns the two input vectors into a string, e.g. {3, 1, 1, 3, 3}, {2, 1} becomes '31133-002001'. This is just to have a simpler key for the hash table inside the cache. Otherwise you can pass the original pair<vector<int>, vector<int>>, but then you also need to define a custom hasher because there is no built-in C++ hasher for this data type.
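A sketch of what such a function might look like, assuming each cell value fits in a single digit and each run length is zero-padded to three digits (the padding width is my guess from the example above), is:

```cpp
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Build a flat string key from a line and its runs,
// e.g. {3, 1, 1, 3, 3}, {2, 1} -> "31133-002001".
std::string line_to_string(const std::vector<int>& line, const std::vector<int>& runs) {
    std::ostringstream oss;
    for (int cell : line) {
        oss << cell;  // each cell value is a single digit
    }
    oss << '-';
    for (int run : runs) {
        oss << std::setw(3) << std::setfill('0') << run;  // pad runs to a fixed width
    }
    return oss.str();
}
```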
The simplest data structure for the cache is a hash table. This has $\mathcal{O}(1)$ complexity for lookup and also for determining if a key doesn’t exist. However, the cache will grow without bound, and so this is very space inefficient.
To make this more space efficient, we could keep a counter of how often each entry is used, and periodically delete the least frequently used entries. Or we can add a timestamp to each entry, and update it each time that entry is used. When we want to delete elements, we delete the least recently used entries. Hence we have a least recently used (LRU) cache. An even more direct way to do this is to put all the entries in a queue, and always move the most recently used entries to the front. The least recently used entries are therefore always at the back.
The problem with using a queue is that it has $\mathcal{O}(n)$ complexity for finding a key. On average you search through $\tfrac{n}{2}$ elements of the queue to find a key. This is much slower than the hash table.
We can get the best of both worlds by using both a hash table and a queue. See the above figure. The hash table stores indices of the entries in the queue. So we can use the hash table to search through the cache in $\mathcal{O}(1)$ time, and use the queue to find the least recently used entries in $\mathcal{O}(1)$ time. For this to work, the hash table and queue must always be in sync. (In C++ we can actually do better, and store pointers directly to the nodes in the queue. Therefore we don’t need to update indices in the table if the queue is modified.)
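As an aside, the same layout can be expressed with the standard library alone: a std::list as the queue and an unordered_map holding iterators (in effect, pointers) into it. My implementation uses a hand-written list instead, as described below, but the idea is the same:

```cpp
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// The queue keeps entries in usage order (most recent at the front);
// the table maps each key straight to its node, so nothing needs re-indexing
// when the queue is reordered.
struct CacheLayout {
    using Entry = std::pair<std::string, int>;  // int is a stand-in for the cached value type
    std::list<Entry> queue;
    std::unordered_map<std::string, std::list<Entry>::iterator> table;
};
```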
The highest level class is LRUCache, which stores both the queue, implemented as a DoublyLinkedList, and the hash table, which is an unordered_map from the standard library.
It keeps both these data structures in sync. The functions get(), put() and exists() are used to interact with the cache. The variables hits and misses measure performance. This is summarised with info().
Lastly, I’ve written all classes as templates, so that the user can define the key type and value type. Above, I defined these as string and Match (a custom class) respectively.
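For completeness, here is a condensed, self-contained sketch of such a class. It uses std::list in place of my DoublyLinkedList, and the hit/miss bookkeeping is illustrative, so the details differ from the code on GitHub:

```cpp
#include <cstddef>
#include <iostream>
#include <list>
#include <stdexcept>
#include <unordered_map>
#include <utility>

template <typename Key, typename Value>
class LRUCache {
public:
    explicit LRUCache(std::size_t capacity) : capacity_(capacity) {}

    // Report whether a key is cached, counting hits and misses as we go.
    bool exists(const Key& key) const {
        if (table_.count(key) > 0) { ++hits_; return true; }
        ++misses_;
        return false;
    }

    // Return the cached value and mark it as most recently used.
    Value get(const Key& key) {
        auto it = table_.find(key);
        if (it == table_.end()) throw std::out_of_range("key not in cache");
        queue_.splice(queue_.begin(), queue_, it->second);  // move node to the front
        return it->second->second;
    }

    // Insert or update a value, evicting the least recently used entry if full.
    void put(const Key& key, const Value& value) {
        auto it = table_.find(key);
        if (it != table_.end()) {
            it->second->second = value;
            queue_.splice(queue_.begin(), queue_, it->second);
            return;
        }
        if (queue_.size() == capacity_) {
            table_.erase(queue_.back().first);  // least recently used is at the back
            queue_.pop_back();
        }
        queue_.emplace_front(key, value);
        table_[key] = queue_.begin();
    }

    // Summarise cache performance.
    void info() const {
        std::cout << "hits: " << hits_ << ", misses: " << misses_
                  << ", entries: " << queue_.size() << '\n';
    }

private:
    std::size_t capacity_;
    mutable std::size_t hits_ = 0;
    mutable std::size_t misses_ = 0;
    std::list<std::pair<Key, Value>> queue_;  // usage order: most recent at the front
    std::unordered_map<Key, typename std::list<std::pair<Key, Value>>::iterator> table_;
};
```

In the solver it can then be instantiated as, for example, LRUCache<std::string, Match> cache(1000), with the keys produced by line_to_string().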
The DoublyLinkedList is doubly linked because each node has a pointer to both the previous node and the next node.
This enables both insertion and deletion to be done in $\mathcal{O}(1)$ time.
Using the DoublyLinkedList object, we only have direct access to the front and back nodes.
There are two functions to change the front node - add_page_to_head() and move_page_to_head() - and two functions to change the back node - remove_back() and get_back().
Middle nodes are found with their pointers in the hash table.
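To make this concrete, here is a skeleton of such a list. Only the four functions above are shown, the node layout is my guess rather than the exact code, and memory management (a destructor or smart pointers) is omitted for brevity:

```cpp
// Skeleton of a doubly linked list of key/value "pages" (sketch only).
template <typename Key, typename Value>
struct Node {
    Key key;
    Value value;
    Node* prev = nullptr;
    Node* next = nullptr;
};

template <typename Key, typename Value>
class DoublyLinkedList {
public:
    // Create a new node at the front (the most recently used position).
    Node<Key, Value>* add_page_to_head(const Key& key, const Value& value) {
        Node<Key, Value>* page = new Node<Key, Value>{key, value};
        if (front_ == nullptr) {
            front_ = back_ = page;
        } else {
            page->next = front_;
            front_->prev = page;
            front_ = page;
        }
        return page;
    }

    // Unlink an existing node and re-attach it at the front.
    void move_page_to_head(Node<Key, Value>* page) {
        if (page == front_) return;
        if (page == back_) back_ = page->prev;
        if (page->prev) page->prev->next = page->next;
        if (page->next) page->next->prev = page->prev;
        page->prev = nullptr;
        page->next = front_;
        front_->prev = page;
        front_ = page;
    }

    // Delete the least recently used node.
    void remove_back() {
        if (back_ == nullptr) return;
        Node<Key, Value>* page = back_;
        back_ = back_->prev;
        if (back_) back_->next = nullptr; else front_ = nullptr;
        delete page;
    }

    Node<Key, Value>* get_back() const { return back_; }

private:
    Node<Key, Value>* front_ = nullptr;
    Node<Key, Value>* back_ = nullptr;
};
```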
For the full function definitions, please see the lru_cache.h file on my GitHub. Note that I’ve included the function definitions in the header file instead of a separate source file. This is to avoid linker errors which arose because of the template classes. You can find out more about this error at Code Project.
I hope you’ve enjoyed this short post on LRU caches. It is a simple trick to get a nice performance boost.
My main sources were Bhrigu Srivastava’s blog and GeeksForGeeks. ↩