A reactive approach to backend response caching

Tags:
  • Backend
  • Cache

There are only two hard things in Computer Science: cache invalidation and naming things.

Phil Karlton

You've probably heard this quote before, and it's mostly true. But it doesn't have to be this way, at least not for cache invalidation. In this post I discuss how to elegantly implement response caching based on dependencies between data, significantly simplifying invalidation.

Backend response caching

First, let me clarify what I mean by backend response caching. I'm talking about caching the response of a view or API endpoint in Redis or some other cache, and returning the cached response if it exists. This has nothing to do with HTTP caching.

In Python pseudo-code, it would look something like this:

def products_view():
    # Serve the cached response if we have one.
    cached_entry = cache.get("products_view")
    if cached_entry:
        return cached_entry

    # Otherwise compute the response and cache it for next time.
    response = Product.all()
    cache.set("products_view", response)
    return response

The problem with this view is that whenever a product is added or deleted, the cached response will be outdated, and we need to either invalidate it or wait until its TTL expires.

An intuitive solution to this problem might be to invalidate the cache in the Product model when we're saving changes:

class Product:
    def save(self):
        db.commit()
        # Drop the stale cached response after writing to the database.
        cache.delete("products_view")

If you only have one view and one model, this approach is perfectly fine, but it doesn't scale well. What if you have multiple views that depend on the Product model? You would have to invalidate the cache entries for all of them, and if you add a new view that depends on the Product model, you would also have to remember to update the invalidation logic in the model itself. What if your views depend on multiple models? You would have to invalidate the cache in every one of them.

You can see where this is going. This is tedious and error-prone.

Another approach might be to listen for signals from the database or from the models to invalidate the cache; this is what libraries like django-cacheops do.

Relying on signals from the database or from the internals of your ORM abstracts away the maintenance, which can be helpful, but requires either writing and testing a lot of glue code or adding a dependency on an external library.
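
To illustrate, here's a hedged sketch of what that signal-based glue can look like in Django. The Product model and its import path are assumptions on my part; only the "products_view" cache key comes from the earlier example.

from django.core.cache import cache
from django.db.models.signals import post_delete, post_save
from django.dispatch import receiver

from myapp.models import Product  # hypothetical model and import path


# Invalidate the cached response whenever a Product is
# created, updated, or deleted.
@receiver(post_save, sender=Product)
@receiver(post_delete, sender=Product)
def invalidate_products_view(sender, instance, **kwargs):
    cache.delete("products_view")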

In the next section, I'll show how we can do away with this imperative approach by adopting the concept of dependency tracking.

Dependency tracking

If you've ever used React's useMemo hook, you should already be familiar with the concept of dependency tracking. For those who haven't: useMemo is a hook that lets you cache the result of a calculation between re-renders. It takes a list of dependencies, and whenever one of them changes, the memoized value is recomputed. In code:

const memoizedValue = useMemo(() => {
  return expensiveCalculation(dependency1, dependency2);
}, [dependency1, dependency2]);

The second argument is a list of dependencies. If any of them changes, the function is called again, and the new result is cached. This is a very powerful pattern, as it frees you from having to manually keep track of when to invalidate the cache.
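
Python has no direct equivalent in the standard library, but as a rough sketch (every name here is made up for illustration, and expensive_calculation stands in for the same function as in the JavaScript snippet), the same contract could look like this:

def make_memo():
    # Remember the last dependency values and the last result,
    # much like useMemo does between re-renders.
    state = {"deps": None, "value": None}

    def memo(compute, deps):
        # Recompute only when a dependency has changed.
        if state["deps"] != deps:
            state["value"] = compute()
            state["deps"] = list(deps)
        return state["value"]

    return memo


memo = make_memo()
memoized_value = memo(lambda: expensive_calculation(dependency1, dependency2),
                      [dependency1, dependency2])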

Let's see how this pattern can be applied to response caching.

First, let's refactor the pseudo-code that fetches cached responses in views so it can be reused. In Python, this could be refactored as a decorator, but I don't want this to be Python-specific, so I leave that as an exercise for the reader.

def cache_view(view_name, view_func):
    cached_entry = cache.get(view_name)
    if cached_entry:
        return cached_entry

    response = view_func()
    cache.set(view_name, response)
    return response


def products_view():
    def view_func():
        return Product.all()

    return cache_view("products_view", view_func)

At this point, let's introduce a dependency array to tie the cached entries to the data changes they depend on.

def cache_view(view_name, view_func, dependencies):
    cached_entry = cache.get(view_name)
    if cached_entry:
        return cached_entry

    response = view_func()
    cache.set(view_name, response)
    return response


def products_view():
    def view_func():
        return Product.all()

    return cache_view("products_view", view_func, ["products_updated"])

You may have noticed that we're not doing anything with the dependencies yet, but we've set the scene, and now I'm going to pull the proverbial rabbit out of the hat.

We're going to store the versions of the dependencies together with the cache entry so that, whenever we fetch a cached entry, we can compare the stored versions with the current ones. The dependency "version" can be a timestamp, a hash, or a counter that's incremented on every change; it doesn't really matter.

Finally, extending the pseudo-code example from above:

class Product:
    def save(self):
        db.commit()
        # Bump the dependency version instead of deleting
        # specific cache entries.
        cache.set("products_updated", now())


def cache_view(view_name, view_func, dependencies):
    dep_versions = cache.get_all(dependencies)
    cached_entry = cache.get(view_name)
    # Serve the cached entry only if the dependency versions
    # stored with it still match the current ones.
    if cached_entry and cached_entry.dep_versions == dep_versions:
        return cached_entry

    response = view_func()
    cache.set(view_name, response, dep_versions)
    return response


def products_view():
    def view_func():
        return Product.all()

    return cache_view("products_view", view_func, ["products_updated"])

While we still have to update the version in Product.save, with this simple setup the products view will always be up to date: stale entries are automatically discarded whenever a product changes.
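
To make the pseudo-code concrete, here's a minimal, self-contained sketch of the whole mechanism using a plain dict as the cache. Everything here (the DictCache class, the key names) is illustrative; in production you'd point this at Redis or similar.

import time


class DictCache:
    """A stand-in for Redis: just a dict with get/set helpers."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def get_all(self, keys):
        return {key: self._data.get(key) for key in keys}


cache = DictCache()


def cache_view(view_name, view_func, dependencies):
    dep_versions = cache.get_all(dependencies)
    cached = cache.get(view_name)
    # Serve the cached response only if every dependency version
    # still matches what was stored alongside it.
    if cached and cached["dep_versions"] == dep_versions:
        return cached["response"]

    response = view_func()
    cache.set(view_name, {"response": response, "dep_versions": dep_versions})
    return response


def save_product():
    # db.commit() would go here; bumping the version is all
    # the cache layer needs to know about.
    cache.set("products_updated", time.time())


# The first call computes and caches; the second is a cache hit.
cache_view("products_view", lambda: ["shoe"], ["products_updated"])
cache_view("products_view", lambda: ["shoe"], ["products_updated"])

# After a write the stored versions no longer match, so the
# next call recomputes.
save_product()
cache_view("products_view", lambda: ["shoe", "hat"], ["products_updated"])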

Sugar on top

At my current job at FLYR, we implemented this cache-invalidation mechanism in our Python backend a few months ago. With some helper functions, it's also easy to keep the cache layer completely separate from the rest of the app, so our business logic isn't intertwined with anything cache-related.

I'm not going to go through the entire implementation at this stage, so this is more a slideshow of what our codebase looks like than a tutorial. You can fill in the blanks.

For example, this is what caching a view and updating a cache dependency can look like in Python using decorators:

# this view will be cached until the dependency
# version changes in a pipeline or other event
@cache_view(ttl=3600, dependencies=[EventsEnum.DataPipeline])
def some_view(request):
    ...


# if the function doesn't raise exceptions, it will update
# the dependency version at the end
@cache_dependency(EventsEnum.DataPipeline)
def some_data_pipeline():
    ...

This approach makes it easy to search the codebase for anything related to cache and makes it easy to add or remove the caching layer.
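
For the curious, here's one way those decorators could be sketched. This is an illustrative approximation rather than our actual implementation: it assumes the cache client from the earlier sketch (plus a set() that accepts a ttl) and that EventsEnum members carry a string value usable as a cache key.

import functools
import time


def cache_view(ttl, dependencies):
    def decorator(view_func):
        @functools.wraps(view_func)
        def wrapper(request, *args, **kwargs):
            # A real key would also encode args, query params, user, etc.
            key = view_func.__name__
            dep_keys = [dep.value for dep in dependencies]
            dep_versions = cache.get_all(dep_keys)
            cached = cache.get(key)
            if cached and cached["dep_versions"] == dep_versions:
                return cached["response"]

            response = view_func(request, *args, **kwargs)
            cache.set(key, {"response": response, "dep_versions": dep_versions}, ttl=ttl)
            return response

        return wrapper

    return decorator


def cache_dependency(dependency):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            # Only reached if func didn't raise: bump the version,
            # invalidating every cached view that depends on it.
            cache.set(dependency.value, time.time())
            return result

        return wrapper

    return decorator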