Besides our talk, my most valuable experience at RailsConf was talking with Tobi about the new caching strategy (‘Tobi caching’) he’s using at Shopify. It has a few parts that all work together nicely.
Etags Matter
As Joe explained, using good ETags can substantially reduce your bandwidth bill; in his case it was a 70% reduction. The takeaway is that you need to think about how you’re going to generate opaque cache coherency values for your actions. For a good intro to HTTP conditional GETs, go read this tutorial by Charles.
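To make the conditional GET concrete, here is a minimal, framework-free sketch of the server side of the exchange. The method name `conditional_get` and its Rack-style `[status, headers, body]` return value are hypothetical, not any framework's API. If the client's `If-None-Match` value matches the current ETag, the server answers 304 Not Modified with an empty body instead of resending the page, and never builds the body at all.

```ruby
# Hypothetical helper: decide between a full response and a 304.
# Returns a Rack-style [status, headers, body] triple; the block builds the body.
def conditional_get(if_none_match, etag)
  headers = { "ETag" => etag }
  if if_none_match == etag
    # The client already holds this version: send no body.
    [304, headers, ""]
  else
    # Only run the (possibly expensive) block when we actually need the body.
    [200, headers, yield]
  end
end

etag = '"v42"'
status, _headers, body = conditional_get(nil, etag) { "<p>post</p>" }
puts status      # prints 200 (first visit: full body sent)
status, _headers, body = conditional_get(etag, etag) { "<p>post</p>" }
puts status      # prints 304 (repeat visit)
puts body.empty? # prints true
```

The bandwidth saving comes from the second case: every repeat visit costs a few header bytes instead of the whole page.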
Expiry is a Pain
Anyone who’s had to write sweepers for an application with heavy caching knows how frustrating it can be. After all, cache invalidation is one of the two hard things in computer science. If you could somehow avoid expiring all the ‘stuff’ you’re caching, your life would be much, much easier.
Memcache is Smart
Memcache and the Memcache client libraries have plenty of smarts built into them, despite being ‘dumb by design’. The client libraries hash each key to decide which server to talk to, which lets you run a cluster of caches without worrying too much about which keys live on which server.
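As an illustration of client-side key distribution, here is a minimal sketch that picks a server by taking a CRC32 checksum of the key modulo the number of servers. This is an assumption for illustration only: real clients differ in the hash they use, and many use consistent hashing instead. The server addresses and the `server_for` helper are made up.

```ruby
require 'zlib'

# Hypothetical server list; every app process must use the same list, in the same order.
SERVERS = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"].freeze

# Map a key to a server deterministically, so every process that asks
# for the same key talks to the same memcached instance.
def server_for(key, servers = SERVERS)
  servers[Zlib.crc32(key) % servers.size]
end

puts server_for("www.koziarski.net:clever-caching:7")
```

Because the mapping depends only on the key and the server list, the servers never need to coordinate. The weakness of plain modulo hashing is that changing the number of servers remaps most keys, which is why consistent hashing is popular.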
The server also has its own smarts about which keys are important. When it needs memory, memcached drops the least recently used values, so keys you no longer use won’t be ‘wasting space’.
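The eviction policy itself is easy to model. The toy `LruStore` below is a hypothetical class, assuming Ruby 1.9+ where a Hash preserves insertion order; memcached's real implementation is more involved (it keeps separate LRU lists per slab class), but the idea is the same: reading a key marks it as recently used, and when space runs out the entry untouched for longest goes first.

```ruby
# Toy LRU store (hypothetical). Relies on Hash keeping insertion order,
# so the first entry is always the least recently used one.
class LruStore
  def initialize(capacity)
    @capacity = capacity
    @data = {}
  end

  def get(key)
    return nil unless @data.key?(key)
    # Delete and re-insert to move the key to the "most recent" end.
    value = @data.delete(key)
    @data[key] = value
  end

  def set(key, value)
    @data.delete(key)
    @data[key] = value
    # Evict from the front of the hash: the least recently used entries.
    @data.shift while @data.size > @capacity
  end

  def key?(key)
    @data.key?(key)
  end
end

store = LruStore.new(2)
store.set("a", 1)
store.set("b", 2)
store.get("a")        # touch "a", so "b" is now least recently used
store.set("c", 3)     # over capacity: "b" is evicted
puts store.key?("b")  # prints false
```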
Mix it all together
So with that in mind, what can we do to improve our application’s performance and simplify its code?
Forget about expiry
As mentioned before, expiry is a complete pain in the ass. So let’s not do it. The trick to getting away with this is to pick a key that completely encapsulates the resource you’re caching, and that changes whenever anything relevant changes. Take the case of this blog post: a simple key would be the permalink, but if we used that, we’d need to expire the cache every time someone commented or I corrected a typo.
The no-expiry alternative would be for Mephisto to keep a ‘version number’ for each post and increment it every time someone commented or the post body changed. With that in place, we could construct a key like www.koziarski.net:clever-caching:#{version_number}. Every time the version number changed we’d get a cache miss and regenerate the content, but subsequent requests would be served out of memcache. The stale entries are never deleted; memcached’s LRU eviction reclaims them once nobody asks for them. No more expiry!
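Here is a minimal sketch of the no-expiry scheme, using a plain Ruby Hash in place of memcache. The `Post` struct with its `version` field and the `VersionedPageCache` class are hypothetical stand-ins, not Mephisto's actual models. Bumping the version changes the key, so the old entry is simply never read again.

```ruby
# Hypothetical model: the version is incremented on every edit or new comment.
Post = Struct.new(:permalink, :body, :version)

# Hypothetical cache wrapper; the store is a stand-in for memcache
# (anything that responds to [] and []=).
class VersionedPageCache
  attr_reader :renders

  def initialize(store = {})
    @store = store
    @renders = 0
  end

  # Join the parts into one key; on a miss, run the block and store its result.
  def fetch(*key_parts)
    key = key_parts.join(":")
    cached = @store[key]
    return cached if cached
    @renders += 1
    @store[key] = yield
  end
end

post  = Post.new("clever-caching", "Original text", 1)
cache = VersionedPageCache.new
render_post = -> { "<p>#{post.body}</p>" }

cache.fetch("www.koziarski.net", post.permalink, post.version) { render_post.call } # miss: renders
cache.fetch("www.koziarski.net", post.permalink, post.version) { render_post.call } # hit
post.body = "Fixed typo"
post.version += 1 # an edit bumps the version, so the next fetch uses a new key
cache.fetch("www.koziarski.net", post.permalink, post.version) { render_post.call } # miss: re-renders
puts cache.renders # prints 2
```

Nothing ever deletes the version-1 entry; in a real memcache cluster it just ages out of the LRU.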
Now that we’ve saved all that CPU time, we should see if there’s a way we can save some bandwidth too.
Embrace Etags
Thankfully, our cache key has all the properties of an ETag: whenever something important changes, our cache key does too. So let’s use it as the basis of our ETag by taking its MD5 hash. The only reason I don’t advocate using the cache key itself is that you may want to include sensitive data in the key. Now we can just chuck the digest (something like d444415a8228fbed44cfa7ef39f15d8b) into the ETag header, and compare it with the value of the ‘If-None-Match’ request header.
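A minimal sketch of deriving the ETag from the cache key with Ruby's standard `Digest::MD5`. The helpers `etag_for` and `not_modified?` are hypothetical. The digest is wrapped in double quotes because HTTP defines an entity tag as a quoted string; clients send back exactly what they received, so a plain string comparison works. The example key carries a made-up user fragment to show that the client only ever sees the digest.

```ruby
require 'digest/md5'

# Hypothetical helper: an opaque, quoted ETag derived from the cache key.
# Hashing means private parts of the key never reach the client.
def etag_for(key)
  %("#{Digest::MD5.hexdigest(key)}")
end

# Hypothetical helper: true when the client already holds this version.
# Real If-None-Match headers may list several tags or "*"; this sketch
# only handles the common single-tag case.
def not_modified?(if_none_match, key)
  !if_none_match.nil? && if_none_match == etag_for(key)
end

key = "www.koziarski.net:clever-caching:7:user-1234"
tag = etag_for(key)
puts tag                          # a 32-character hex digest in double quotes
puts not_modified?(tag, key)      # prints true
puts not_modified?(tag, key + "x") # prints false
```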
Conclusion
By doing this you get the bandwidth savings of HTTP caching and the performance boost of action caching, without the difficult expiry code. You also avoid all the NFS-related headaches of page caching, while still getting most of its performance boost.
While this approach won’t suit every project, it could well suit yours. Finally, here’s a snippet of sorts for those of you who think in code:
```ruby
require 'digest/md5'

# Inside your controller class:
around_filter :cache_sensibly, :only => :show

private

# Filter methods are private so Rails doesn't expose them as actions.
def cache_sensibly
  # Compose the key using values we know match our business rules.
  # Assumes @blog and @post were loaded by an earlier before_filter.
  cache_response(request.host, request.request_uri, @blog.version, @post.version) { yield }
end

def cache_response(*keys)
  key = keys * ':'
  # Use a hash of the key as the ETag, so the key can contain private
  # data without leaking it. HTTP entity tags are quoted strings.
  etag = %("#{Digest::MD5.hexdigest(key)}")
  response.headers["ETag"] = etag

  # First handle HTTP: this avoids a memcache hit entirely
  # and saves a huge amount of bandwidth to the client.
  if request.env["HTTP_IF_NONE_MATCH"] == etag
    headers["X-Cache"] = "HTTP"
    head :not_modified
    return
  end

  # Next, check memcache (Cache is the app's memcache wrapper).
  if data = Cache.get(key)
    # Render from the cached values.
    headers["Content-Type"] = data[:content_type]
    headers["X-Cache"] = "HIT"
    render :text => data[:content], :status => data[:status]
  else
    # Finally, yield to the action, flag the miss, then cache the response.
    headers["X-Cache"] = "MISS"
    yield
    Cache.put(key, { :content      => response.body,
                     :status       => headers["Status"].to_i,
                     :content_type => (response.content_type || "text/html") })
  end
end
```
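For readers who want to experiment outside Rails, here is a minimal plain-Ruby model of the same three-tier lookup: HTTP first, then the cache, then the action. It is a sketch under stated assumptions, not a drop-in replacement for the filter: a Hash stands in for memcache, the request is reduced to its `If-None-Match` value, and `CachedResponder` and `respond` are hypothetical names. The `X-Cache` values match the ones the filter sets.

```ruby
require 'digest/md5'

# Hypothetical, framework-free model of the cache_response filter.
class CachedResponder
  def initialize(store = {})
    @store = store # stand-in for memcache
  end

  # Returns [status, headers, body]; the block plays the role of the action
  # and must return [status, body].
  def respond(if_none_match, *keys)
    key     = keys.join(":")
    etag    = %("#{Digest::MD5.hexdigest(key)}")
    headers = { "ETag" => etag }

    # 1. The client already has this version: no cache lookup, no body.
    if if_none_match == etag
      return [304, headers.merge("X-Cache" => "HTTP"), ""]
    end

    # 2. Serve from the cache if this version was rendered before.
    if (data = @store[key])
      return [data[:status], headers.merge("X-Cache" => "HIT"), data[:content]]
    end

    # 3. Miss: run the action, then remember its output under the key.
    status, body = yield
    @store[key] = { status: status, content: body }
    [status, headers.merge("X-Cache" => "MISS"), body]
  end
end

responder = CachedResponder.new
action = -> { [200, "<p>Hello</p>"] }

_, h1, _  = responder.respond(nil, "example.com", "/posts/1", 3, &action)
_, h2, _  = responder.respond(nil, "example.com", "/posts/1", 3, &action)
s3, h3, _ = responder.respond(h1["ETag"], "example.com", "/posts/1", 3, &action)
puts [h1["X-Cache"], h2["X-Cache"], h3["X-Cache"], s3].inspect
# prints ["MISS", "HIT", "HTTP", 304]
```

Checking the ETag before touching the cache is the design choice that matters here: a repeat visitor costs neither a memcache round trip nor a response body.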