
the art of clean code

March 20, 2025 · Performance

At Monitic, we recently faced a challenge that many growing applications encounter: as our user base expanded, our database queries became increasingly slow, particularly for complex operations that required joining multiple tables or aggregating large datasets. To address this performance bottleneck, we implemented a caching strategy using Redis, which dramatically improved response times and reduced database load.

In this article, I'll share our journey implementing Redis caching, the challenges we faced, the patterns we used, and the lessons we learned along the way.

Why Redis?

Before diving into the implementation details, it's worth explaining why we chose Redis for our caching solution. We evaluated several options, including:

  • In-memory caching (application-level caching)
  • Memcached
  • Redis
  • Database query caching

We ultimately selected Redis for several compelling reasons:

  • Performance: Redis is extremely fast with sub-millisecond response times
  • Versatility: Redis supports various data structures (strings, hashes, lists, sets, sorted sets) which gives us flexibility in how we structure our cached data
  • Persistence: Redis can persist data to disk, which helps with recovery in case of restarts
  • Built-in features: Redis offers atomic operations, pub/sub messaging, Lua scripting, and transactions
  • Distributed caching: Redis supports cluster mode for horizontal scaling
  • TTL support: Redis allows setting expiration times for keys, making cache invalidation strategies easier to implement

Caching Patterns We Implemented

We implemented several caching patterns to address different use cases in our application:

1. Cache-Aside (Lazy Loading)

This is the most common caching pattern we used. The application first checks the cache for data; if it's a miss, it queries the database and updates the cache for future use.

func (s *Service) GetUser(ctx context.Context, id string) (*User, error) {
    // Try to get from cache first
    cacheKey := fmt.Sprintf("user:%s", id)
    cachedUser, err := s.redis.Get(ctx, cacheKey).Result()
    
    if err == nil {
        // Cache hit
        var user User
        if err := json.Unmarshal([]byte(cachedUser), &user); err == nil {
            return &user, nil
        }
    }
    
    // Cache miss, get from DB
    user, err := s.repo.GetUser(ctx, id)
    if err != nil {
        return nil, err
    }
    
    // Store in cache for future requests
    userJSON, _ := json.Marshal(user)
    s.redis.Set(ctx, cacheKey, userJSON, time.Minute*15)
    
    return user, nil
}

2. Write-Through

For frequently accessed data that changes often, we implemented a write-through cache, where the cache is updated whenever the database is updated.

func (s *Service) UpdateUser(ctx context.Context, user *User) error {
    // Update in DB first
    if err := s.repo.UpdateUser(ctx, user); err != nil {
        return err
    }
    
    // Then update the cache
    cacheKey := fmt.Sprintf("user:%s", user.ID)
    userJSON, _ := json.Marshal(user)
    return s.redis.Set(ctx, cacheKey, userJSON, time.Minute*15).Err()
}

3. Cache Prefetching

For predictable access patterns, we implemented cache prefetching to load related data into the cache before it's needed.

func (s *Service) GetUserWithPreferences(ctx context.Context, userID string) (*UserWithPreferences, error) {
    // Get the user
    user, err := s.GetUser(ctx, userID)
    if err != nil {
        return nil, err
    }
    
    // Prefetch user's friends in the background
    go func() {
        friendIDs, _ := s.repo.GetUserFriendIDs(context.Background(), userID)
        for _, friendID := range friendIDs {
            s.GetUser(context.Background(), friendID) // This will populate the cache
        }
    }()
    
    // Get user preferences
    prefs, err := s.GetUserPreferences(ctx, userID)
    if err != nil {
        return nil, err
    }
    
    return &UserWithPreferences{
        User: user,
        Preferences: prefs,
    }, nil
}

Tip: Background Prefetching

When implementing background prefetching, use a separate context so that cancellation of the original request doesn't abort the prefetching work. Also, consider using a worker pool to limit the number of concurrent background tasks.
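
As an illustration, here is a minimal sketch of bounding that concurrency with a buffered channel used as a semaphore. The prefetchUsers helper, the limit of 4 workers, and the 5-second timeout are illustrative, not our production values:

func (s *Service) prefetchUsers(userIDs []string) {
    sem := make(chan struct{}, 4) // at most 4 prefetches in flight
    var wg sync.WaitGroup
    
    for _, id := range userIDs {
        id := id
        wg.Add(1)
        sem <- struct{}{} // acquire a slot
        go func() {
            defer wg.Done()
            defer func() { <-sem }() // release the slot
            
            // Detached context with its own timeout, so the prefetch is not
            // tied to the lifetime of the request that triggered it.
            ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
            defer cancel()
            
            _, _ = s.GetUser(ctx, id) // populates the cache as a side effect
        }()
    }
    wg.Wait()
}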

4. Bulk Loading

For batch operations, we used Redis's pipelining feature to reduce network round trips:

func (s *Service) GetUsersByIDs(ctx context.Context, userIDs []string) (map[string]*User, error) {
    result := make(map[string]*User)
    pipeline := s.redis.Pipeline()
    
    // Prepare all get commands
    getCmds := make(map[string]*redis.StringCmd)
    for _, id := range userIDs {
        cacheKey := fmt.Sprintf("user:%s", id)
        getCmds[id] = pipeline.Get(ctx, cacheKey)
    }
    
    // Execute pipeline
    _, err := pipeline.Exec(ctx)
    if err != nil && err != redis.Nil {
        // Handle error but continue to get missing users from DB
    }
    
    // Collect cache hits and identify misses
    var missedIDs []string
    for id, cmd := range getCmds {
        val, err := cmd.Result()
        if err == nil {
            // Cache hit
            var user User
            if err := json.Unmarshal([]byte(val), &user); err == nil {
                result[id] = &user
                continue
            }
        }
        // Cache miss
        missedIDs = append(missedIDs, id)
    }
    
    // Get missed users from DB
    if len(missedIDs) > 0 {
        users, err := s.repo.GetUsersByIDs(ctx, missedIDs)
        if err != nil {
            return result, err
        }
        
        // Update cache in pipeline
        pipeline = s.redis.Pipeline()
        for id, user := range users {
            result[id] = user
            userJSON, _ := json.Marshal(user)
            cacheKey := fmt.Sprintf("user:%s", id)
            pipeline.Set(ctx, cacheKey, userJSON, time.Minute*15)
        }
        
        // Execute pipeline
        _, err = pipeline.Exec(ctx)
        if err != nil {
            // Log error but don't fail the request
        }
    }
    
    return result, nil
}

Cache Invalidation Strategies

Cache invalidation is one of the hardest problems in computer science, and we implemented several strategies to ensure our cache stays fresh:

1. Time-Based Expiration (TTL)

The simplest strategy is to set a Time-To-Live (TTL) for each cache entry. We used different TTLs depending on the type of data (see the sketch after the list):

  • User profiles: 15 minutes
  • Product catalog: 1 hour
  • Configuration settings: 5 minutes
  • Real-time metrics: 30 seconds
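
A minimal sketch of how these could be expressed as named constants instead of magic numbers scattered through the services (the identifiers are illustrative):

// TTLs per data type; the durations mirror the list above.
const (
    userProfileTTL     = 15 * time.Minute
    productCatalogTTL  = time.Hour
    configSettingsTTL  = 5 * time.Minute
    realtimeMetricsTTL = 30 * time.Second
)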

2. Event-Based Invalidation

For data that changes frequently, we implemented event-based invalidation using Redis pub/sub:

// When data is updated
func (s *Service) InvalidateUserCache(ctx context.Context, userID string) error {
    invalidationMessage := &CacheInvalidation{
        Type: "user",
        ID:   userID,
        Time: time.Now(),
    }
    
    messageJSON, _ := json.Marshal(invalidationMessage)
    return s.redis.Publish(ctx, "cache:invalidations", messageJSON).Err()
}

// In each service instance
func (s *Service) subscribeToInvalidations() {
    pubsub := s.redis.Subscribe(context.Background(), "cache:invalidations")
    defer pubsub.Close()
    
    channel := pubsub.Channel()
    for msg := range channel {
        var invalidation CacheInvalidation
        if err := json.Unmarshal([]byte(msg.Payload), &invalidation); err != nil {
            continue
        }
        
        if invalidation.Type == "user" {
            cacheKey := fmt.Sprintf("user:%s", invalidation.ID)
            s.redis.Del(context.Background(), cacheKey)
        }
    }
}

3. Version-Based Invalidation

For frequently accessed resources with complex invalidation logic, we implemented version-based caching:

func (s *Service) GetUserWithVersion(ctx context.Context, userID string) (*User, error) {
    // Get the current version
    versionKey := fmt.Sprintf("user:version:%s", userID)
    version, err := s.redis.Get(ctx, versionKey).Result()
    if err != nil && err != redis.Nil {
        // If we can't get the version, fall back to DB
        return s.repo.GetUser(ctx, userID)
    }
    
    // Try to get from cache with version
    cacheKey := fmt.Sprintf("user:%s:v:%s", userID, version)
    cachedUser, err := s.redis.Get(ctx, cacheKey).Result()
    if err == nil {
        // Cache hit
        var user User
        if err := json.Unmarshal([]byte(cachedUser), &user); err == nil {
            return &user, nil
        }
    }
    
    // Cache miss, get from DB
    user, err := s.repo.GetUser(ctx, userID)
    if err != nil {
        return nil, err
    }
    
    // Generate a new version if none exists yet
    if version == "" {
        version = uuid.NewString()
        s.redis.Set(ctx, versionKey, version, 0) // No expiration for version
        // Recompute the cache key so the entry is stored under the new version
        cacheKey = fmt.Sprintf("user:%s:v:%s", userID, version)
    }
    
    // Store in cache with version
    userJSON, _ := json.Marshal(user)
    s.redis.Set(ctx, cacheKey, userJSON, time.Hour*24)
    
    return user, nil
}

func (s *Service) InvalidateUserVersionCache(ctx context.Context, userID string) error {
    // Simply update the version, old cache entries will not be used
    versionKey := fmt.Sprintf("user:version:%s", userID)
    return s.redis.Set(ctx, versionKey, uuid.NewString(), 0).Err()
}

Warning: Cache Stampede

When a popular cache item expires, multiple requests can simultaneously hit the database, causing a "cache stampede." We addressed this with the "cache lock" pattern, where the first request to encounter a cache miss acquires a lock, regenerates the cache, and releases the lock, while other requests wait briefly before retrying the cache.
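
Here is a minimal sketch of that pattern using SetNX in go-redis (Redis's SET with NX). The lock key format, retry count, and delays are illustrative rather than our exact production values:

func (s *Service) getUserWithLock(ctx context.Context, id string) (*User, error) {
    cacheKey := fmt.Sprintf("user:%s", id)
    lockKey := "lock:" + cacheKey
    
    for attempt := 0; attempt < 3; attempt++ {
        // Check the cache first.
        if cached, err := s.redis.Get(ctx, cacheKey).Result(); err == nil {
            var user User
            if err := json.Unmarshal([]byte(cached), &user); err == nil {
                return &user, nil
            }
        }
        
        // Cache miss: try to become the single request that rebuilds the entry.
        locked, err := s.redis.SetNX(ctx, lockKey, "1", 10*time.Second).Result()
        if err == nil && locked {
            defer s.redis.Del(ctx, lockKey)
            
            user, err := s.repo.GetUser(ctx, id)
            if err != nil {
                return nil, err
            }
            
            userJSON, _ := json.Marshal(user)
            s.redis.Set(ctx, cacheKey, userJSON, time.Minute*15)
            return user, nil
        }
        
        // Someone else holds the lock; wait briefly, then retry the cache.
        time.Sleep(50 * time.Millisecond)
    }
    
    // Give up waiting and fall back to the database.
    return s.repo.GetUser(ctx, id)
}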

Performance Improvements

After implementing Redis caching, we saw dramatic performance improvements:

Operation                      Before Caching (ms)   After Caching (ms)   Improvement
User Profile Lookup            250                   5                    98%
Product Listing (50 items)     850                   35                   96%
Dashboard Metrics              1200                  80                   93%
Search Results                 900                   120                  87%

Not only did response times improve drastically, but our database load also decreased by approximately 70% during peak hours. This allowed us to scale our application to handle more users without upgrading our database infrastructure.

Monitoring and Maintenance

To ensure our caching system performs optimally, we implemented comprehensive monitoring:

  • Cache Hit Rate: We track the ratio of cache hits to total cache requests to ensure our caching strategy is effective (target: >90%); see the sketch after this list for the counters behind it
  • Cache Size: We monitor the memory usage of Redis to prevent evictions
  • Cache Latency: We measure response times for Redis operations
  • Connection Pool: We track the number of active and idle connections to Redis
  • Error Rates: We monitor Redis errors to detect connectivity issues
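
As a sketch of what feeds the hit-rate metric, assuming the Prometheus Go client is used (the metric name and label are hypothetical):

// Counter partitioned by outcome; the dashboard computes hit rate as
// hits / (hits + misses) from this series.
var cacheRequests = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "cache_requests_total",
        Help: "Cache lookups partitioned by outcome.",
    },
    []string{"outcome"}, // "hit" or "miss"
)

func init() {
    prometheus.MustRegister(cacheRequests)
}

func (s *Service) recordCacheOutcome(hit bool) {
    outcome := "miss"
    if hit {
        outcome = "hit"
    }
    cacheRequests.WithLabelValues(outcome).Inc()
}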

We built a custom dashboard to visualize these metrics, which helps us identify potential issues before they affect users.

Lessons Learned

Implementing Redis caching taught us several valuable lessons:

1. Start with a Clear Caching Strategy

Before implementing caching, it's essential to understand what to cache, for how long, and how to invalidate it. We initially made the mistake of caching too much data without a clear invalidation strategy, leading to stale data issues.

2. Measure Before and After

Collect performance metrics before implementing caching to establish a baseline and measure improvements. This helps justify the engineering effort and identify the most impactful areas for optimization.

3. Cache Serialization Matters

We initially used JSON for serializing objects to cache, but later switched to Protocol Buffers for frequently accessed data, which reduced serialization/deserialization time and payload size.
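
A rough sketch of one way to keep the choice of encoding pluggable, so JSON can be swapped for Protocol Buffers per data type (the interface and names here are illustrative):

// Serializer abstracts how cached values are encoded.
type Serializer interface {
    Marshal(v any) ([]byte, error)
    Unmarshal(data []byte, v any) error
}

// jsonSerializer is the default, reflection-based encoder.
type jsonSerializer struct{}

func (jsonSerializer) Marshal(v any) ([]byte, error)   { return json.Marshal(v) }
func (jsonSerializer) Unmarshal(d []byte, v any) error { return json.Unmarshal(d, v) }

// A Protocol Buffers-backed implementation satisfies the same interface using
// the generated marshaling code for each message type; that is where we moved
// the most frequently accessed cache entries.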

4. Don't Cache Everything

Caching has overhead, both in terms of development complexity and operational cost. For rarely accessed data or simple queries, direct database access might be more efficient.

5. Plan for Redis Outages

Redis is highly reliable, but like any system, it can fail. We implemented circuit breakers to gracefully fall back to direct database access when Redis is unavailable.

func (s *Service) GetUserWithCircuitBreaker(ctx context.Context, id string) (*User, error) {
    if !s.redisCB.IsAllowed() {
        // Circuit is open, skip cache and go directly to DB
        return s.repo.GetUser(ctx, id)
    }
    
    // Try to get from cache
    cacheKey := fmt.Sprintf("user:%s", id)
    cachedUser, err := s.redis.Get(ctx, cacheKey).Result()
    
    if err != nil && err != redis.Nil {
        // Redis error occurred
        s.redisCB.RecordFailure()
        // Fall back to DB
        return s.repo.GetUser(ctx, id)
    }
    
    // Record successful Redis operation
    s.redisCB.RecordSuccess()
    
    if err == nil {
        // Cache hit
        var user User
        if err := json.Unmarshal([]byte(cachedUser), &user); err == nil {
            return &user, nil
        }
    }
    
    // Cache miss or unmarshal error, get from DB
    user, err := s.repo.GetUser(ctx, id)
    if err != nil {
        return nil, err
    }
    
    // Try to store in cache, but don't fail if Redis is down
    userJSON, _ := json.Marshal(user)
    _ = s.redis.Set(ctx, cacheKey, userJSON, time.Minute*15).Err()
    
    return user, nil
}

Conclusion

Implementing Redis caching has been a game-changer for our application performance at Monitic. We've significantly reduced database load, improved response times, and enhanced the overall user experience.

While caching adds complexity to the system, the performance benefits are well worth it for high-traffic applications. The key is to approach caching strategically, measure its impact, and continuously refine your implementation based on real-world usage patterns.

If you're facing performance challenges in your application, I highly recommend exploring Redis as a caching solution. With its versatility, performance, and rich feature set, it can address a wide range of caching needs.