
Matryoshka Fragment Caching in Rails

“Russian doll caching” has gained some popularity recently, I suspect in part due to its catchy (or cachie?) name and how easy the concept is to visualize. Rails 4 should have this improved caching available by default. With Rails 3 you need to install the cache_digests gem. It’s pretty easy to get started with, and the documentation is clear. It makes a lot of sense to start using it in your Rails app. I won’t attempt to cover the basics and will assume you are already familiar with them. I want to talk about one specific aspect of fragment caching: the generation of the cache keys.

Cache keys are provided to the cache method/block to allow Rails to find out whether this fragment is already cached or needs to be generated. You can pass any number of objects or strings, and Rails will use them together to form a cache key. This works great in most partials or views, and once you understand the basics it’s very easy to use.
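
To make the idea of combining several parts into one key concrete, here is a rough sketch in plain Ruby. It is a simplified model and not Rails' actual implementation: the Article struct, its cache_key format and the expand_key helper are all assumptions made up for illustration.

```ruby
# Simplified model of how several key parts combine into one cache key.
# NOT Rails' real implementation; Article and expand_key are hypothetical.
Article = Struct.new(:id, :updated_at) do
  # Mimics the "model/id-timestamp" shape of an Active Record cache key.
  def cache_key
    "articles/#{id}-#{updated_at.utc.strftime('%Y%m%d%H%M%S')}"
  end
end

# Parts that respond to cache_key contribute it; everything else uses to_s.
def expand_key(*parts)
  parts.flatten.map { |part| part.respond_to?(:cache_key) ? part.cache_key : part.to_s }.join('/')
end

article = Article.new(1, Time.utc(2013, 5, 1, 12, 0, 0))
key = expand_key('sidebar', article)
puts key  # sidebar/articles/1-20130501120000
```

Because the timestamp is part of the key, touching the record yields a different key, and the old fragment is simply never looked up again.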

There was one aspect, however, that I wasn’t particularly happy with: when using a collection of items, let’s say a list of articles on a page, Rails will need a cache key for the entire collection. There are a few ways to solve this:

  • Define an Active Record collection object (e.g. ArticleList), which is only used to check the updated_at timestamp. The contained objects (Article) will have to touch the collection object on each Create/Update/Delete. This feels like a bit of an overkill to me.
  • Provide the entire collection as the cache key (e.g. Article.all). Rails will then cycle through each object in the collection, generate a cache key for each, and then concatenate the cache keys. This produces a potentially huge cache key, and might be quite an intensive operation on big collections.
  • Don’t cache the entire collection, and instead cache each element inside the collection. This feels like some kind of a compromise, but not a great one. Instead of caching a whole section, we only cache parts of it. There are other related problems with fetching a large number of items from cache. (see below for a good solution for this use-case however)
  • Build your own cache key for the collection. My immediate thought was to produce an md5 hash over the updated_at field of each item.
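
The last option could look roughly like the sketch below. It uses a plain Struct as a hypothetical stand-in for the Article model, and the collection_digest helper name is my own invention for illustration. It only shows the idea of hashing every updated_at value in Ruby.

```ruby
require 'digest/md5'

# Hypothetical stand-in for an Active Record model; only the fields used here.
Article = Struct.new(:id, :updated_at)

# Digest over every updated_at timestamp, in a stable order (sorted by id).
def collection_digest(articles)
  stamps = articles.sort_by(&:id).map { |a| a.updated_at.utc.strftime('%Y-%m-%d %H:%M:%S.%6N') }
  Digest::MD5.hexdigest(stamps.join(','))
end

articles = [
  Article.new(1, Time.utc(2013, 5, 1, 12, 0, 0)),
  Article.new(2, Time.utc(2013, 5, 2, 9, 30, 0))
]

before = collection_digest(articles)
articles[1].updated_at = Time.utc(2013, 5, 3, 8, 0, 0) # simulate an update
after = collection_digest(articles)

puts before.length   # 32 hex characters, regardless of collection size
puts before == after # false: touching any item changes the key
```

The key stays short, but every item still has to be loaded and visited in Ruby to build it.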

Matryoshka (MD5) cache keys

The last option does produce a shorter cache key, but does not really improve performance. If we have to load the updated_at field of each and every object in the collection, concatenate the values and then md5-hash the result, this is not much better than letting Rails do it for us. However, what if we ask our database to do the md5 hash for us?

It turns out that both PostgreSQL and MySQL have a built-in md5 hash function. SQLite does not however (although there might be some plugin or extension that does this, I haven’t explored). I didn’t look into NoSQL solutions either. So sadly this is not database-agnostic, but it’s simple enough to be worth doing. To generate my cache key for the collection, all I have to do is this:

    # for PostgreSQL
    def article_collection_cache_key
      # ORDER BY (assuming the usual id primary key) keeps the aggregate, and so the hash, stable
      Article.connection.select_value('select md5(ARRAY_AGG(updated_at ORDER BY id)::text) from articles')
    end

    # or for MySQL (Note: see post comments below on max length of group_concat!)
    def article_collection_cache_key
      Article.connection.select_value('select md5(group_concat(updated_at order by id)) from articles')
    end
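
To illustrate how such a collection key behaves once it is handed to a cache, here is a small self-contained sketch. It does not touch Rails or a database: TinyFragmentCache is a hypothetical in-memory stand-in for the fragment cache, and collection_key computes the digest in Ruby purely as a stand-in for the SQL query above.

```ruby
require 'digest/md5'

# Hypothetical in-memory stand-in for a fragment cache store.
class TinyFragmentCache
  attr_reader :renders

  def initialize
    @store = {}
    @renders = 0
  end

  # Returns the cached fragment for key, rendering it only on a miss.
  def fetch(key)
    @store.fetch(key) do
      @renders += 1
      @store[key] = yield
    end
  end
end

# Stand-in for the SQL digest; in the real app the database computes this.
def collection_key(timestamps)
  Digest::MD5.hexdigest(timestamps.join(','))
end

cache = TinyFragmentCache.new
timestamps = ['2013-05-01 12:00:00', '2013-05-02 09:30:00']

2.times { cache.fetch(['articles', collection_key(timestamps)]) { '<ul>...</ul>' } }
puts cache.renders # 1: the second call is a cache hit

timestamps[0] = '2013-05-03 08:00:00' # one article was updated
cache.fetch(['articles', collection_key(timestamps)]) { '<ul>...</ul>' }
puts cache.renders # 2: the new digest forces a re-render
```

In a view, the same idea would amount to passing the digest as one of the parts given to the cache block, so the whole list is rendered once per distinct digest.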
    

Doing a quick test using PostgreSQL, the query time on a reasonably large collection with md5 was fairly consistent at around 2.5ms, whereas it took around 5ms for the query returning the entire data set. That doesn’t seem like a lot, but the view rendering time had a much bigger impact overall, with results ranging around 30ms for the md5 version, and 90ms with the non-md5. Running another test using ApacheBench (500 requests, 1 concurrent connection) showed that 95% of requests completed within 272ms for the non-md5 version, but 154ms on the version using md5. This page was relatively ‘heavy’ anyway, with only one portion containing the cached fragment, but this was the only change between tests. The collection for the test was pretty large. I imagine a typical collection should be smaller to fit on a single page, and then perhaps the performance gains will be smaller too.

Other approaches

Nathan Kontny, in his post on faster Rails partial caching, described a different approach that solves slightly different challenges. Nathan’s solution may also be valuable for caching many smaller fragments and then fetching them from cache more effectively. My solution addresses a slightly different (and simpler) problem: it is concerned with cache key generation, not with the process of fetching from cache.

I personally also try to use page caching as much as possible. Page caching delivers far superior performance, but it’s much harder to apply to dynamic pages. I hope to cover page caching ideas and tweaks in more detail in a future post.

6 replies on “Matryoshka Fragment Caching in Rails”

I think there’s a problem using group_concat: you’re limited to a 255 character or byte long string (I forget which or how long exactly…)

Very good point. You’re right. I wasn’t aware of this (I’m using PostgreSQL primarily). I think the max length is actually 1024, and not 255, but it’s still a potential problem. You can increase the max length, however, using the group_concat_max_len parameter in MySQL.

That’s an interesting approach Aaron. Thanks! Not sure if it’s *much* simpler, but it has a big advantage of being database-agnostic, which is great.

One (perhaps hypothetical) downside to this approach however, is that it relies on updated_at behaving in this way. If I wanted to base the cache key on actual values of a different column perhaps, and/or wasn’t sure that the updated_at is definitely touched correctly, then it wouldn’t work.

For most use-cases in Rails however, it does seem to work well and it’s a safe assumption. Any suggestions on how to make it work with Rails 3?

on the Member model, ensures that when a member is changed, we update the Team model as well. This is essential for using russian doll caching, as you must be able to break parent caches once children are modified.

The technique I use would detect that the child was updated, because it produces an md5 hash of all updated_at for all children. That’s the purpose of this method. It does this without having to create a parent ActiveRecord model however.
