“Russian doll caching” has gained some popularity recently. I suspect that is in part due to its catchy (or cachie?) name and how easy it is to visualize the concept. Rails 4 should have this improved caching available by default. With Rails 3 you need to install the cache_digests gem. It’s pretty easy to get started with it, and the documentation is clear. It makes a lot of sense to start using it in your Rails app. I won’t attempt to cover the basics and will assume you are already familiar with it. Instead, I want to talk about one specific aspect of fragment caching: how the cache keys are generated.
Cache keys are provided to the cache method/block to allow Rails to find out whether this fragment is already cached, or needs to be generated. You can pass any number of objects or strings, and Rails will use them together to form a cache key. This works great in most partials or views and once you understand the basics it’s very easy to use.
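As a rough illustration of how several parts end up in one key, here is a simplified imitation in plain Ruby. It is not Rails' actual implementation: the Article struct, its cache_key format and the expand_key helper are stand-ins made up for this sketch.

```ruby
# Simplified imitation of how a fragment cache key is composed from its parts.
# Article and expand_key are illustrative names, not Rails APIs.
Article = Struct.new(:id, :updated_at) do
  # Roughly the shape of a record's cache key: "articles/<id>-<timestamp>"
  def cache_key
    "articles/#{id}-#{updated_at.utc.strftime('%Y%m%d%H%M%S')}"
  end
end

# Turn each part into a string (using cache_key when available) and join them.
def expand_key(*parts)
  parts.flatten.map { |part|
    part.respond_to?(:cache_key) ? part.cache_key : part.to_s
  }.join("/")
end

article = Article.new(1, Time.utc(2013, 1, 15, 10, 30, 0))
key = expand_key("sidebar", article)
puts key # prints "sidebar/articles/1-20130115103000"
```

Because the timestamp is part of the key, updating the record produces a new key, and the old fragment is simply never looked up again.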
There was one aspect, however, that I wasn’t particularly happy with: when using a collection of items, let’s say a list of articles on a page, Rails will need a cache key for the entire collection. There are a few ways to solve this:
- Define an Active Record collection object (e.g. ArticleList), which is only used to check the updated_at timestamp. The contained objects (Article) will have to touch the collection object on each Create/Update/Delete. This feels like a bit of an overkill to me.
- Provide the entire collection as the cache key (e.g. Article.all). Rails will then cycle through each object in the collection, generate a cache key for each, and then concatenate the cache keys. This produces a potentially huge cache key, and might be quite an intensive operation on big collections.
- Don’t cache the entire collection, and instead cache each element inside the collection. This feels like some kind of a compromise, but not a great one. Instead of caching a whole section, we only cache parts of it. There are other related problems with fetching a large number of items from cache. (see below for a good solution for this use-case however)
- Build your own cache key for the collection. My immediate thought was to produce an md5 hash over the updated_at field of each item.
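To make the last option concrete, here is a minimal sketch of hashing the timestamps in Ruby. ArticleRecord and collection_digest are hypothetical names used only for this example, and the struct stands in for a real Active Record model.

```ruby
require "digest/md5"

# Hypothetical stand-in for an Active Record model; only updated_at matters here.
ArticleRecord = Struct.new(:id, :updated_at)

# Concatenate every timestamp and hash the result into one short, fixed-size key.
def collection_digest(records)
  stamps = records.map { |r| r.updated_at.utc.strftime("%Y%m%d%H%M%S") }
  Digest::MD5.hexdigest(stamps.join(","))
end

articles = [
  ArticleRecord.new(1, Time.utc(2013, 1, 15, 10, 30, 0)),
  ArticleRecord.new(2, Time.utc(2013, 1, 16, 9, 0, 0))
]

before = collection_digest(articles)
articles[1].updated_at = Time.utc(2013, 1, 17, 8, 0, 0) # simulate an update
after = collection_digest(articles)

puts before.length   # 32 hex characters, regardless of collection size
puts before == after # false: any changed timestamp changes the key
```

The key stays short, but note that every record still has to be loaded and walked in Ruby to build it.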
Matryoshka (MD5) cache keys
The last option does produce a shorter cache key, but does not really improve performance. If we have to load each and every updated_at field for each object in the collection, concatenate it and then md5 hash it, this is not much better than letting Rails do it for us. However, what if we ask our database to do the md5 hash for us?
It turns out that both PostgreSQL and MySQL have a built-in md5 hash function. SQLite does not however (although there might be some plugin or extension that does this, I haven’t explored). I didn’t look into NoSQL solutions either. So sadly this is not database-agnostic, but it’s simple enough to be worth doing. To generate my cache key for the collection, all I have to do is this:
    # for PostgreSQL
    def article_collection_cache_key
      Article.connection.select_value('select md5(ARRAY_AGG(updated_at)::text) from articles')
    end

    # or for MySQL (Note: see post comments below on max length of group_concat!)
    def article_collection_cache_key
      Article.connection.select_value('select md5(group_concat(updated_at)) from articles')
    end
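If the same app may run on either database, one possible variation is to pick the query by adapter name. The sketch below is only that: collection_digest_sql is a hypothetical helper, and the "order by id" clause is my own assumption (it presumes an id column), added because the order of rows inside an aggregate is not guaranteed unless you ask for it.

```ruby
# Hypothetical helper: builds the SQL that hashes a table's updated_at values.
# adapter_name is expected to look like the value of connection.adapter_name,
# e.g. "PostgreSQL" or "Mysql2". Only pass trusted table names, since the name
# is interpolated straight into the SQL string.
def collection_digest_sql(adapter_name, table)
  case adapter_name.to_s.downcase
  when /postgres/
    "select md5(array_agg(updated_at order by id)::text) from #{table}"
  when /mysql/
    "select md5(group_concat(updated_at order by id)) from #{table}"
  else
    raise ArgumentError, "no md5 query known for adapter #{adapter_name}"
  end
end

puts collection_digest_sql("PostgreSQL", "articles")

# Inside a Rails app this could be used roughly like so (not run here):
#   conn = Article.connection
#   conn.select_value(collection_digest_sql(conn.adapter_name, Article.table_name))
```

Raising for unknown adapters keeps the SQLite case explicit instead of silently producing a query that fails at runtime.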
Doing a quick test using PostgreSQL, the query time on a reasonably large collection with md5 was fairly consistent at around 2.5ms, whereas it took around 5ms for the query returning the entire data set. That doesn’t seem like a lot, but the view rendering time had a much bigger impact overall, with results ranging around 30ms for the md5 version, and 90ms with the non-md5. Running another test using ApacheBench (500 requests, 1 concurrent connection) showed that 95% of requests completed within 272ms for the non-md5 version, but 154ms on the version using md5. This page was relatively ‘heavy’ anyway, with only one portion containing the cached fragment, but this was the only change between tests. The collection for the test was pretty large. I imagine a typical collection should be smaller to fit on a single page, and then perhaps the performance gains will be smaller too.
Nathan Kontny, in his post on faster Rails partial caching, described a different approach that solves a slightly different challenge. Nathan’s solution may also be valuable for caching many smaller fragments, and then fetching them from cache more effectively. My solution addresses a slightly different (and simpler) problem: it is concerned with generating the cache key, not with fetching fragments from the cache.
I personally also try to use page caching as much as possible. Page caching delivers far superior performance, but it’s much harder to apply to dynamic pages. I hope to cover page caching ideas and tweaks in more detail in a future post.