Caching
Your product page takes 800ms to load. Most of that time is spent querying the same database rows, computing the same price calculations, and fetching the same product images that were served five seconds ago to a different user. The data has not changed, but your system re-derives it on every single request. Caching exists to eliminate this kind of waste.
At its core, caching means storing data in a location that is faster to access than the original source. Different operations have vastly different latency profiles: reading from L1 CPU cache takes about 0.5 nanoseconds, reading from memory takes around 100 nanoseconds, and a round trip within the same data center takes roughly 500 microseconds. A database query that crosses the network, hits disk, and returns results might take 5-50 milliseconds. Caching lets your system avoid the slow path by keeping frequently accessed data closer to where it is needed.
Caching Levels
Caching can be implemented at various points in a system, and understanding these levels helps you decide where to place caches for the most impact:
Client-level caching: Browsers cache static assets, API responses, and DNS lookups. When data is cached at the client, the request never even reaches your server, eliminating network latency entirely.
Server-level caching: The client still talks to the server, but the server does not always need to query the database. An in-memory cache on the server can return previously computed results in microseconds instead of waiting for a database round trip.
Intermediate caching: A dedicated cache layer sits between components, such as between application servers and the database. This is the most common pattern in distributed systems because it decouples the cache lifecycle from both the application and the data store.
Types of Caching: In-memory vs External
In-memory caching stores data within the same operating system process as the consuming application. Because reads never cross a process or network boundary, access latency is minimal, making this approach ideal for smaller caches of hot data.
External caching manages the cache in a separate operating system process, either locally or remotely. Solutions like Redis or Memcached offer more scalability and flexibility, making this approach suitable for larger cache sizes, distributed systems, or shared caching across multiple application instances.
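To make the in-memory option concrete, here is a minimal sketch of a process-local cache with per-entry expiry, built on a ConcurrentHashMap. The class and method names are illustrative, not from any library:

```java
import java.util.concurrent.ConcurrentHashMap;

// A minimal in-memory cache living inside the application process.
// Entries expire after a fixed TTL and are evicted lazily on read.
public class InMemoryCache<K, V> {

    private static final class Entry<V> {
        final V value;
        final long expiresAt;
        Entry(V value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> store = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public InMemoryCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    public void put(K key, V value) {
        store.put(key, new Entry<>(value, System.currentTimeMillis() + ttlMillis));
    }

    public V get(K key) {
        Entry<V> entry = store.get(key);
        if (entry == null) {
            return null; // cache miss
        }
        if (System.currentTimeMillis() > entry.expiresAt) {
            store.remove(key); // expired: evict lazily and treat as a miss
            return null;
        }
        return entry.value;
    }
}
```

A cache like this is fast precisely because everything stays in the process heap, but that is also its limitation: each application instance has its own copy, and the data disappears on restart, which is where external caches come in.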
Content Delivery Network
A Content Delivery Network (CDN) is essentially caching applied at the network edge. If your servers are located in Frankfurt but a user in Tokyo makes a request, every single response must cross the entire distance. A CDN solves this by distributing cached copies of your content to servers around the world, commonly called PoPs (Points of Presence).
CDNs work best for static assets (images, CSS, JavaScript) and cacheable API responses. When a user in Tokyo requests your product image, the nearest PoP serves it directly instead of forwarding the request to Frankfurt. This is the same caching principle as in the client-level and server-level scenarios above, just operated by a third-party service at global scale. Some popular CDNs include Cloudflare and Google Cloud CDN.
The key trade-off with CDNs is cache invalidation. When you deploy a new version of your CSS, stale copies cached across dozens of PoPs must be refreshed. Most CDNs handle this through a combination of TTLs and explicit purge APIs, which ties directly into the invalidation strategies discussed below.
When May Caching Be Helpful?
Speeding up network operations and avoiding redundancy: If the same database query runs thousands of times per hour and the underlying data changes once a day, caching the result eliminates nearly all of those database round trips.
Accelerating computationally expensive operations: When a server performs a heavy computation in response to a request (aggregating reports, computing recommendations), caching the result prevents the system from repeating the same work on every request.
Managing high-frequency operations: Even when individual requests are fast, sheer volume can overwhelm the data source. Caching distributes the load by serving repeated reads from a faster layer, reducing the overall stress on the database.
Cache Invalidation
Cache invalidation is a crucial aspect of caching, as it ensures that the data served from the cache remains accurate and up-to-date. In scenarios where data is stored in multiple locations, such as a database and a server-side cache, maintaining consistency between these two sources of truth is essential.
Consider an example where you store posts in a database and also cache them on the server-side. In this case, both the database and the cache contain the same information, creating two sources of truth. The challenge lies in keeping these sources in sync when the data changes.
When a post is updated or deleted, the cached version needs to be invalidated or refreshed to maintain consistency between the cache and the database. There are several cache invalidation strategies to tackle this issue:
- Write-through: Write data to both the cache and the primary storage simultaneously on every write operation. This approach ensures that the cache is always up-to-date, but it may lead to increased latency for write operations due to waiting for both cache and primary storage updates.
- Advantages: Fast retrieval, complete data consistency, robust to system disruptions.
- Disadvantages: Higher latency for write operations.
- Write-back: Write data to the cache first and asynchronously update the primary data storage later. This strategy allows for faster write operations by reducing wait times associated with primary storage updates. However, it may introduce the risk of data loss or inconsistency in case of a system failure before the data is synchronized with the primary storage.
- Advantages: Faster write operations, since requests do not wait for primary storage updates, and reduced load on primary storage.
- Disadvantages: Risk of data loss or inconsistency in case of a system failure before synchronization, potential stale data in primary storage, and added complexity for managing data synchronization.
- Write-around: Write data directly to the primary storage, bypassing the cache entirely. The corresponding cache entry (if one exists) is invalidated so that the next read triggers a fresh fetch from the database. This strategy is particularly useful for write-heavy workloads where most written data is not read back immediately. Think of a logging system: you write millions of log entries, but only a small fraction are ever queried. Caching every log entry on write would be wasteful and could evict more valuable data from the cache.
- Advantages: Prevents cache pollution from infrequently read data, simpler write path.
- Disadvantages: Higher read latency on the first access after a write (guaranteed cache miss).
- Cache-aside (Lazy Loading): The application itself manages the cache rather than relying on the cache layer to intercept reads or writes automatically. On a read, the application first checks the cache. If the data is present (cache hit), it is returned directly. If the data is absent (cache miss), the application reads from the primary storage, stores the result in the cache, and then returns it. On a write, the application updates the primary storage and either invalidates or updates the corresponding cache entry.
- Advantages: Only requested data is cached (no unnecessary cache population), straightforward to implement, and works well with any data store.
- Disadvantages: Initial requests always result in cache misses (cold start), and the application must handle the cache-management logic explicitly.
By implementing an appropriate cache invalidation strategy, you can maintain data consistency between the cache and the primary data source, ensuring that your system serves accurate and up-to-date information.
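The write-through strategy is the simplest of the four to see in code. The sketch below uses two in-process maps standing in for the cache and the database; all class and method names are illustrative:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Write-through sketch: every write goes to BOTH the primary store and the
// cache before the call returns. The two maps stand in for a real database
// and a real cache such as Redis.
public class WriteThroughStore {

    private final Map<String, String> primaryStore = new ConcurrentHashMap<>();
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    public void write(String key, String value) {
        primaryStore.put(key, value); // durable write first
        cache.put(key, value);        // then keep the cache in sync
        // The caller returns only after both writes complete, which is
        // exactly where write-through's extra write latency comes from.
    }

    public String read(String key) {
        // Reads can always be served from the cache, because write-through
        // guarantees it holds the latest committed value.
        return cache.get(key);
    }
}
```

With a real database and a network cache, the two `put` calls become a database transaction plus a cache `SET`, and the latency cost of waiting for both becomes visible.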
Cache-Aside in Practice
Cache-aside is probably the most common caching pattern in web applications, so it is worth seeing in code. Here is a typical implementation using Redis as the external cache:
public class ProductCacheService {

    private final JedisPool jedisPool;
    private final ProductRepository productRepository;
    private final ObjectMapper objectMapper;

    private static final int CACHE_TTL_SECONDS = 3600;

    public ProductCacheService(JedisPool jedisPool,
                               ProductRepository productRepository,
                               ObjectMapper objectMapper) {
        this.jedisPool = jedisPool;
        this.productRepository = productRepository;
        this.objectMapper = objectMapper;
    }

    public Product getProduct(String productId) {
        String cacheKey = "product:" + productId;
        try (Jedis jedis = jedisPool.getResource()) {
            // Step 1: Check the cache
            String cached = jedis.get(cacheKey);
            if (cached != null) {
                return objectMapper.readValue(cached, Product.class);
            }

            // Step 2: Cache miss, read from the database
            Product product = productRepository.findById(productId)
                    .orElseThrow(() -> new ProductNotFoundException(productId));

            // Step 3: Populate the cache for future reads
            jedis.setex(cacheKey, CACHE_TTL_SECONDS,
                    objectMapper.writeValueAsString(product));
            return product;
        } catch (JsonProcessingException e) {
            throw new CacheSerializationException("Failed to serialize product", e);
        }
    }

    public void updateProduct(Product product) {
        // Write to the primary storage first
        productRepository.save(product);

        // Invalidate the cache entry so the next read fetches fresh data
        String cacheKey = "product:" + product.getId();
        try (Jedis jedis = jedisPool.getResource()) {
            jedis.del(cacheKey);
        }
    }
}
The pattern is simple: read from cache first, fall back to the database on a miss, and populate the cache on the way back. On writes, the safest approach is to invalidate (delete) the cache entry rather than trying to update it, because deleting avoids race conditions where a concurrent read could re-cache stale data between the database write and the cache update.
Thundering Herd
One subtle problem with cache-aside is the thundering herd (also called cache stampede). Imagine a popular product page whose cache entry expires. In the few milliseconds before one request can repopulate the cache, hundreds of concurrent requests all experience a cache miss and all hit the database simultaneously. The database, which was comfortably handling the load thanks to caching, suddenly receives a massive spike.
This can cascade: the database slows down, requests time out, and users retry, making things worse. I have seen this happen in production when a TTL expired on a heavily accessed key during peak traffic.
There are several strategies to mitigate the thundering herd:
Locking (mutex-based population): When a cache miss occurs, only one request acquires a lock and fetches from the database. All other requests wait for the lock to be released (or return a slightly stale value). In Redis, you can implement this with SET key value NX EX timeout as a distributed lock.
Probabilistic early expiration: Instead of letting all copies expire at exactly the same moment, each request may recompute the cache slightly before it expires. The idea is that one request will refresh the cache proactively, so the key never actually expires under load. This is sometimes called “stale-while-revalidate.”
Background refresh: A separate background thread or scheduled job refreshes cache entries before they expire. The application always reads from the cache and never directly triggers a database query on a miss, because the cache is kept warm externally.
The right choice depends on your tolerance for stale data. Locking gives the freshest results but adds latency while requests wait. Probabilistic early expiration and background refresh trade a small window of staleness for much smoother load patterns.
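Within a single process, the locking strategy can be sketched with ConcurrentHashMap.computeIfAbsent, which already provides per-key mutual exclusion: concurrent callers for the same missing key block while one of them runs the expensive load. The class name is illustrative; a distributed version would replace the in-process lock with the Redis SET NX EX lock mentioned above:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Single-process sketch of mutex-based cache population. On a miss, at most
// one thread per key runs the loader (e.g., a database query); other threads
// asking for the same key wait for that result instead of also hitting the
// database, which is exactly the thundering-herd mitigation.
public class StampedeSafeCache<K, V> {

    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();

    public V get(K key, Function<K, V> loader) {
        // computeIfAbsent runs the loader at most once per absent key at a
        // time, blocking concurrent callers for that key until it finishes.
        return cache.computeIfAbsent(key, loader);
    }
}
```

Note that this sketch omits expiry for brevity; in practice you would combine the per-key lock with TTL-based eviction so entries can be refreshed.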
Cache Warming
Cache-aside has one inherent weakness: after a deployment, a restart, or a Redis failover, the cache starts empty. Every request results in a cache miss, and the database absorbs the full load until the cache fills up organically. For high-traffic systems, this cold start period can be just as dangerous as the thundering herd.
Cache warming is the practice of preloading the cache with data before traffic hits it. There are several approaches:
Startup preloading: When the application starts, it queries the database for the most frequently accessed records and populates the cache before accepting traffic. This works well when you know your hot keys in advance (e.g., the top 1,000 products, configuration entries, or feature flags).
Warm-up from access logs: Analyze recent access logs or query patterns to determine which keys were most frequently requested, and preload those. This is useful after planned maintenance windows where you can predict what users will request.
Shadow traffic: Route a copy of production read traffic to the new instance before it starts serving real responses. The instance populates its cache from real access patterns without any risk to users.
Replication from peer caches: If you run multiple cache nodes, a new node can copy data from an existing peer rather than rebuilding from the database. Redis Cluster handles this automatically during node failover.
The goal in all cases is the same: avoid the cold start penalty by ensuring that the cache already contains the data your system needs before real traffic arrives.
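Startup preloading is the easiest of these approaches to sketch: before the instance accepts traffic, it walks a known list of hot keys and populates the cache from the primary store. The names below (CacheWarmer, fetchFromDb) are illustrative:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Startup preloading sketch: fetch every known hot key from the primary
// store once, in bulk, so the cache is already warm when traffic arrives.
public class CacheWarmer {

    public static <K, V> Map<K, V> warm(List<K> hotKeys, Function<K, V> fetchFromDb) {
        Map<K, V> cache = new ConcurrentHashMap<>();
        for (K key : hotKeys) {
            cache.put(key, fetchFromDb.apply(key)); // one controlled pass over the DB
        }
        return cache;
    }
}
```

In a real deployment this loop would run before the health check reports ready, so the load balancer only routes traffic to the instance once its cache is populated.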
Time-to-Live (TTL)
Regardless of which caching strategy you choose, setting a Time-to-Live (TTL) on cached entries is an essential practice. A TTL defines how long a cached value remains valid before it is automatically expired and evicted.
Setting the right TTL requires balancing data freshness against performance. A short TTL (e.g., a few seconds) ensures that stale data is served only briefly, which is important for rapidly changing data like stock prices or session tokens. A longer TTL (e.g., minutes or hours) reduces the load on the primary data store and improves cache hit rates, which works well for data that changes infrequently, such as product catalog information or user profile images.
In practice, many systems use a combination of TTLs across different data types. TTLs also serve as a safety net: even if an explicit invalidation is missed due to a bug or a race condition, the stale entry will eventually expire on its own.
Cache Eviction Policy
A cache eviction policy determines how values are removed from a cache when it reaches capacity. Eviction is necessary to maintain cache efficiency, free up space for new data, and ensure that the most relevant information is stored. The most commonly used eviction policies are:
LRU (Least Recently Used): Evicts the entry that has not been accessed for the longest time. LRU works well for workloads where recently accessed data is likely to be accessed again soon. Memcached uses LRU by default, and Redis offers approximate LRU through its allkeys-lru and volatile-lru eviction policies.
LFU (Least Frequently Used): Evicts the entry with the fewest total accesses. LFU favors data that is accessed consistently over time, even if it was not accessed very recently. It works well when some items are “popular” and should remain cached regardless of short gaps between accesses. The downside is that entries with historically high access counts can become stale yet remain cached for a long time.
FIFO (First In, First Out): Evicts the oldest entry in the cache, regardless of how recently or frequently it was accessed. FIFO is the simplest policy to implement and has predictable behavior, but it does not consider access patterns at all. It can be a reasonable choice when all cached items have roughly equal likelihood of being accessed.
Each policy offers a different approach to prioritizing which data should be evicted. Choosing the right one depends on the access patterns of your application. In many cases, LRU provides a good default balance, but workloads with distinct hot and cold data may benefit from LFU, while simple time-bounded caches may work well with FIFO.
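To see what LRU eviction looks like in practice, here is a minimal sketch built on Java's LinkedHashMap in access-order mode: the map reorders entries on each read, and removeEldestEntry evicts the least recently used one once capacity is exceeded. This is an in-process illustration of the policy, not how Redis or Memcached implement it internally:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache: LinkedHashMap with accessOrder = true keeps entries
// ordered by most recent access, and removeEldestEntry drops the least
// recently used entry whenever the cache grows past its capacity.
public class LruCache<K, V> extends LinkedHashMap<K, V> {

    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true enables LRU ordering
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict once over capacity
    }
}
```

With a capacity of 2, inserting a third key evicts whichever of the first two was touched least recently, which is exactly the access-pattern sensitivity that distinguishes LRU from FIFO.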
Related posts: Spring Cache, Latency And Throughput