Caching Application Data: Choosing the Right Approach for Glispa

Glispa Tech at Heart

This piece is the second post in our new Tech at Heart series, where we look at topics surrounding our technology. Check out our first post which highlights the 5 lessons we learned while scaling Glispa's technical architecture.

At Glispa Global Group, our highly-complex proprietary technologies need constant, reliable access to data. Our globally distributed DSP servers, for example, must process thousands of requests per second, 24 hours a day, responding to each of them in less than 10ms (the standard we have set for ourselves).

Our DSP servers use HTTP protocol and, among other data, need to know about campaigns and creatives in order to process requests. For each ad request, the system needs to look at thousands of campaigns. The actual memory size of this dataset is rather small, just a couple of gigabytes, and does not change very often. This means that, when considering our performance requirements at Glispa, we quickly came to the conclusion that we would significantly benefit from a caching approach, in which data is stored temporarily, so that requests for that data can be served faster. We needed to decide which was the best way to set up such a system based on our requirements, as there are a number of different ways to approach this, depending on use cases. Below are three of the caching approaches we considered at Glispa.

Lazy loading

The most straightforward approach to caching in applications, and probably the most widely used, is to retrieve data once from the original storage and cache it to save the time-consuming operation of retrieving it again. Data is often stored in in-memory storage (the fastest option) or on disk - often in special caching systems such as Memcached and Redis.

This solution is easy to implement, but it has several drawbacks for certain use cases. First, the access time to the data is non-deterministic, it will vary a lot. Requests for data that is not yet cached will be slow. Cleaning old data could also be challenging. A time-to-live (TTL) approach, in which the amount of time during which the cached data item is expected to be valid is considered, could be hard to set up. If the selected TTLs are too short, you will experience a performance impact as they will be removed from the cache too early. Set them too high and you may serve stale campaigns since the cache has not been refreshed often enough. Invalidation approaches, in which data which has been changed gets invalidated in the cache, meaning it needs to be fetched from storage again, also impact performance. Invalidating a part of your data can cause a query peak on the data storage and will increase the response time. Of course, the cache will also need to be able to handle concurrency, both storing and retrieving data from the cache at the same time.

Caching Flow Chart

Eager loading with a single buffer

Another option is to use eager loading with a single buffer, particularly if the goal is for servers to have deterministic access times. One approach to this is background eager synchronization.

In this approach all the data is loaded into the cache. Afterwards, a background synchronization process keeps it up to date. The data is always in the memory, so access time is constant and the only delay is the synchronization period. With an incremental synchronization process this period can be small, just few seconds. Still, one may encounter issues regarding concurrency. Every data class must handle the fact that it could be written (by the background sync thread) at the same time it is read (by a request handling thread). This means that adapted collections and a locking mechanism need to be used.

Eager loading with double buffers

This solution is similar to the previous one, in which all the data is loaded into the memory. The difference here is that two different in-memory caches are used, one for reading and one for writing. Handling requests only use the read buffer. It is immutable - cannot be changed once it’s created - so it guarantees that we will not have consistency issues or need to synchronize concurrent accesses. The data is synchronized with the storage using the write buffer. Once the write buffer has been updated it’s flipped with the read buffer. With this solution, the data is constantly and quickly accessible, no concurrency is needed, and there is no consistency issue. On the other hand, twice as much memory is necessary.

Eager Loading with Double Buffers - Flow Chart

Consider the use case

Caching is possible in various forms. In order to choose the correct strategy, it’s important to consider the specifications of your data (total size, required update frequency and runtime usage). We learned which strategy to choose through experimentation: don’t be afraid to try incremental approaches and monitor your response time (average and percentiles) in order to find a solution that best fits your requirements.

There is also no need to restrict yourself to a single strategy, an application could use different approaches for different data. At Glispa, we make extensive use of both lazy loading and double-buffer eager loading. We have an implementation for the double-buffer that we use across multiple projects like the DSP servers, while we also use Guava for lazy loading a different dataset.

Data requirements differ hugely from company to company. Once we found that we needed a caching solution, we ran a couple of experiments and did a lot of research until we found the best solution for us was two different approaches. The key is having an excellent understanding of your goals, and investing to evaluate the options.


About the Author:

Vincent Maurin, Java Chapter Lead

Vincent Maurin is our Java Chapter Lead at Glispa Global Group.