0014. Application Caching Strategy

Status: Accepted
Date: 2025-01-27
Context: Application-level caching to improve performance and cut load and cost across single-instance dev and multi-instance production.

Context

The Forge platform performs several expensive operations, either on every request or whenever frequently accessed data is read:

Cognito Token Validation - JWT parsing and signature validation via Cognito JWKS on every authenticated request
User Profile Lookups - PostgreSQL queries for candidate profiles on profile page loads and header username display
Document Retrieval - DynamoDB reads for parsed documents

There is currently no caching layer, which results in:

High database query volume (PostgreSQL and DynamoDB)
Repeated Cognito JWKS lookups and JWT parsing overhead
Increased latency for frequently accessed data
Higher operational costs (DynamoDB read capacity units, database connection overhead)

We must choose between:

No caching - Continue with direct database/external service calls
Local caching only - In-memory cache (Caffeine) for single-instance deployments
Distributed caching - Redis/ElastiCache for multi-instance deployments
Hybrid approach - Start with local caching, migrate to distributed for production

Additionally, we need to consider integration with planned distributed rate limiting infrastructure (Redis backend).

Decision

Implement a three-phase caching strategy using Quarkus Cache:

Phase 1: Local Caching (Caffeine) - Use Quarkus Cache with default Caffeine backend for development and single-instance deployments
Phase 2: Metrics and Monitoring - Add cache metrics (hit/miss rates) to Grafana dashboards before production migration
Phase 3: Redis Backend (Production) - Migrate to quarkus-cache-redis with AWS ElastiCache for distributed caching across multiple service instances

Primary Use Cases:

Cache Cognito token validation results (TTL = token expiration)
Cache candidate profile data (TTL = 10 minutes, invalidate on updates)
Cache parsed document results (TTL = 1 hour; parsed output does not change until the document is re-uploaded)

Cache Implementation:

Use Quarkus Cache annotations (@CacheResult, @CacheInvalidate)
Cache-aside pattern (invalidate on writes, populate on reads)
Fail-open strategy (cache failures don't break requests)
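
The cache-aside and fail-open behaviour listed above can be sketched in plain Java. This is a minimal illustration, not the Quarkus Cache implementation: SimpleCache, MapCache and CacheAside are hypothetical names, and the loader stands in for a PostgreSQL or DynamoDB read.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical cache abstraction; a real backend may throw on any call. */
interface SimpleCache<K, V> {
    Optional<V> get(K key);
    void put(K key, V value);
    void invalidate(K key);
}

/** In-memory stand-in used only for this sketch. */
final class MapCache<K, V> implements SimpleCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    public Optional<V> get(K key) { return Optional.ofNullable(entries.get(key)); }
    public void put(K key, V value) { entries.put(key, value); }
    public void invalidate(K key) { entries.remove(key); }
}

/** Cache-aside access that treats every cache error as a miss (fail open). */
final class CacheAside<K, V> {
    private final SimpleCache<K, V> cache;
    private final Function<K, V> loader; // assumed to read the source of truth

    CacheAside(SimpleCache<K, V> cache, Function<K, V> loader) {
        this.cache = cache;
        this.loader = loader;
    }

    /** Reads populate the cache. */
    V read(K key) {
        try {
            Optional<V> hit = cache.get(key);
            if (hit.isPresent()) {
                return hit.get();
            }
        } catch (RuntimeException e) {
            // Fail open: fall through to the source of truth.
        }
        V value = loader.apply(key);
        if (value != null) { // null results are not cached
            try {
                cache.put(key, value);
            } catch (RuntimeException e) {
                // Fail open: the request still succeeds without the cache.
            }
        }
        return value;
    }

    /** Writes go to the source of truth first, then invalidate the entry. */
    void write(K key, Runnable storeWrite) {
        storeWrite.run();
        try {
            cache.invalidate(key);
        } catch (RuntimeException e) {
            // Fail open: a stale entry may survive until its TTL elapses.
        }
    }
}
```

One consequence worth noting: when invalidation itself fails, the sketch keeps the request alive but leaves the old entry in place, so the TTL becomes the upper bound on staleness.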

Infrastructure:

Phase 1: Local Caffeine cache (in-memory, per-instance)
Phase 3: Shared ElastiCache Redis cluster for caching + distributed rate limiting

Rationale

Why Quarkus Cache

Framework Integration - Native Quarkus support with CDI annotations
Multiple Backends - Supports Caffeine (local) and Redis (distributed) via configuration
Simple API - @CacheResult and @CacheInvalidate annotations reduce boilerplate
Metrics - Automatic Micrometer integration for cache metrics
Production Ready - Battle-tested in Quarkus ecosystem

Why Three-Phase Approach

Incremental Risk - Start simple (local cache), add metrics, then migrate to distributed
Development First - Local caching works immediately without infrastructure dependencies
Metrics Before Production - Understand cache performance before distributed migration
Production Hardening - Redis backend provides consistency across instances

Why Cache-Aside Pattern

Simplicity - Clear separation: reads populate cache, writes invalidate cache
Consistency - Database remains source of truth
Flexibility - Easy to add/remove caching without changing business logic

Why Shared Redis Infrastructure

Cost Optimization - Single ElastiCache cluster serves caching + rate limiting
Operational Simplicity - One infrastructure component to manage
Synergy - The same cluster can back the planned distributed rate limiting (tracked on the private product roadmap in the main repository)

Why These Use Cases

Token Validation - Highest frequency (every authenticated request), high cryptographic overhead
Candidate Profiles - Frequently accessed, low change rate, database query overhead
Parsed Documents - Stable between re-uploads (safe to cache), DynamoDB read cost reduction

Architecture Overview

Phase 1: Local Caching (Caffeine)

┌─────────────────┐
│  Service        │
│  Instance       │
│                 │
│  ┌───────────┐  │
│  │ Caffeine  │  │
│  │ Cache     │  │
│  └───────────┘  │
│       │         │
└───────┼─────────┘
        │
        ├──> PostgreSQL (candidate profiles)
        ├──> DynamoDB (parsed documents)
        └──> Cognito (token validation)

Phase 3: Distributed Caching (Redis)

┌─────────────────┐     ┌─────────────────┐
│  Service        │     │  Service        │
│  Instance 1     │     │  Instance 2     │
│                 │     │                 │
│  ┌───────────┐  │     │  ┌───────────┐  │
│  │ Quarkus   │  │     │  │ Quarkus   │  │
│  │ Cache     │  │     │  │ Cache     │  │
│  └─────┬─────┘  │     │  └─────┬─────┘  │
└────────┼────────┘     └────────┼────────┘
         │                       │
         └────────────┬──────────┘
                      │
              ┌───────▼───────┐
              │ ElastiCache   │
              │ Redis Cluster │
              │               │
              │ cache:*       │
              │ ratelimit:*   │
              └───────┬───────┘
                      │
        ┌─────────────┼─────────────┐
        │             │             │
        ▼             ▼             ▼
   PostgreSQL    DynamoDB       Cognito

Implementation Details

Cache Configuration

Token Validation Cache:

Cache name: token-validation
Key: Composite key: hashed token reference plus expiry (so entries line up with JWT validity windows; never use the raw bearer token as all or part of the key).
TTL: Remaining token lifetime, so an entry expires when its token does
Max size: 10,000 entries
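
One possible way to derive the composite token key and its TTL is sketched below. The choice of SHA-256, the "hash:expiry" layout, and the TokenCacheKeys class are assumptions made for illustration; the decision above only requires that the raw bearer token never appears in the key.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper for the token-validation cache. */
final class TokenCacheKeys {
    private TokenCacheKeys() {}

    /** Builds "<sha256-hex>:<expiry-epoch-seconds>" from the token and its exp claim. */
    static String keyFor(String rawToken, Instant expiresAt) {
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("SHA-256")
                    .digest(rawToken.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            hex.append(Character.forDigit((b >> 4) & 0xF, 16));
            hex.append(Character.forDigit(b & 0xF, 16));
        }
        return hex + ":" + expiresAt.getEpochSecond();
    }

    /** Remaining token lifetime to use as the entry TTL; zero once expired. */
    static Duration ttlFor(Instant expiresAt, Instant now) {
        Duration remaining = Duration.between(now, expiresAt);
        return remaining.isNegative() ? Duration.ZERO : remaining;
    }
}
```

A caller would compute the key once per request, for example `TokenCacheKeys.keyFor(bearerToken, claims.expiresAt())`, where `claims.expiresAt()` is a placeholder for however the exp claim is read.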

Candidate Profile Cache:

Cache name: candidate-profiles
Key: Composite key: profile namespace plus stable candidate identifier
TTL: 10 minutes
Max size: 5,000 entries
Invalidation: On profile registration/updates
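
To make the three settings above concrete (TTL, max size, explicit invalidation), here is a small stdlib-only sketch of the semantics. It is not a replacement for Caffeine, which provides this behaviour in Phase 1; BoundedTtlCache and its injectable clock are hypothetical and exist only to show what each setting does.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Size-bounded cache with expire-after-write semantics (illustration only). */
final class BoundedTtlCache<K, V> {
    /** A cached value together with its absolute expiry time. */
    private static final class Timed<T> {
        final T value;
        final long expiresAtMillis;
        Timed(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final long ttlMillis;
    private final LongSupplier clockMillis;
    private final Map<K, Timed<V>> entries;

    BoundedTtlCache(final int maxSize, Duration ttl, LongSupplier clockMillis) {
        this.ttlMillis = ttl.toMillis();
        this.clockMillis = clockMillis;
        // Access-ordered map: the least recently used entry is evicted first.
        this.entries = new LinkedHashMap<K, Timed<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Timed<V>> eldest) {
                return size() > maxSize;
            }
        };
    }

    /** Returns the cached value, or null if absent or expired. */
    synchronized V get(K key) {
        Timed<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (clockMillis.getAsLong() >= entry.expiresAtMillis) {
            entries.remove(key);
            return null;
        }
        return entry.value;
    }

    synchronized void put(K key, V value) {
        entries.put(key, new Timed<V>(value, clockMillis.getAsLong() + ttlMillis));
    }

    /** Called from the write path, e.g. after a profile update. */
    synchronized void invalidate(K key) {
        entries.remove(key);
    }

    synchronized int size() {
        return entries.size();
    }
}
```

With the values from this section, the candidate profile cache would correspond to `new BoundedTtlCache<>(5_000, Duration.ofMinutes(10), System::currentTimeMillis)`.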

Document Cache:

Cache names: parsed-resumes, parsed-jobspecs
Key: Composite key: document-type namespace plus candidate identifier or job-spec transaction identifier, depending on which artifact is cached
TTL: 1 hour
Max size: 10,000 entries
Invalidation: On document re-upload

Error Handling

Fail-Open Strategy - Cache failures don't break requests
Negative Caching - Don't cache null/empty results (except token validation)
Exception Handling - Quarkus Cache handles exceptions gracefully and falls back to the underlying operation

Security Considerations

Token Caching - Cache keys use token hashes (not full tokens)
PII in Cache - Ensure Redis encryption at rest and in transit
Key Namespacing - Use prefixes (cache:token:, cache:candidate:) to prevent collisions
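
The namespacing rule can be kept in one place with a small key builder. The sketch below is illustrative: the CacheKeys class is hypothetical, and rejecting ':' inside identifiers is an added assumption that keeps a crafted identifier from spilling into another namespace.

```java
/** Hypothetical builder for namespaced cache keys. */
final class CacheKeys {
    static final String TOKEN_PREFIX = "cache:token:";
    static final String CANDIDATE_PREFIX = "cache:candidate:";

    private CacheKeys() {}

    /** Key for a token entry: prefix, token hash, then expiry in epoch seconds. */
    static String token(String tokenHash, long expiresAtEpochSeconds) {
        return TOKEN_PREFIX + requireSegment(tokenHash) + ":" + expiresAtEpochSeconds;
    }

    /** Key for a candidate profile entry. */
    static String candidate(String candidateId) {
        return CANDIDATE_PREFIX + requireSegment(candidateId);
    }

    /** Rejects empty segments and segments containing the separator. */
    private static String requireSegment(String segment) {
        if (segment == null || segment.isEmpty() || segment.indexOf(':') >= 0) {
            throw new IllegalArgumentException("invalid key segment: " + segment);
        }
        return segment;
    }
}
```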

Consequences

Positive

Performance - Reduced latency for cached operations (expected 20-50ms improvement)
Cost Reduction - Expected 20-30% reduction in DynamoDB read capacity and 30-40% reduction in PostgreSQL queries
Scalability - Redis backend enables horizontal scaling with shared cache
Operational Efficiency - Shared Redis infrastructure for caching + rate limiting
Developer Experience - Simple annotations, no boilerplate cache management code

Negative

Complexity - Additional infrastructure component (Redis) in production
Cache Invalidation - Must ensure cache invalidation on writes (cache-aside pattern)
Memory Usage - Local cache (Caffeine) consumes JVM heap memory
Cache Stampede - Risk of a thundering herd if many entries expire at the same moment (mitigated by adding random variance to TTLs)
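
The TTL variance mentioned as a stampede mitigation can be sketched as a jitter function. TtlJitter and the spread parameter are hypothetical; how a per-entry TTL is actually applied depends on the cache backend and is not shown here.

```java
import java.time.Duration;
import java.util.Random;

/** Hypothetical helper that spreads expiry times around a base TTL. */
final class TtlJitter {
    private TtlJitter() {}

    /**
     * Scales the base TTL by a random factor in [1 - spread, 1 + spread),
     * so entries written together do not all expire together.
     */
    static Duration jittered(Duration base, double spread, Random random) {
        if (spread < 0.0 || spread >= 1.0) {
            throw new IllegalArgumentException("spread must be in [0, 1)");
        }
        double factor = 1.0 + (random.nextDouble() * 2.0 - 1.0) * spread;
        return Duration.ofMillis(Math.round(base.toMillis() * factor));
    }
}
```

For a 10 minute base TTL and a spread of 0.1, each entry would live between 9 and 11 minutes.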

Risks and Mitigations

Cache Inconsistency - Mitigated by cache-aside pattern (database is source of truth)
Cache Failures - Mitigated by fail-open strategy (requests continue without cache)
Memory Pressure - Mitigated by size limits and TTL-based eviction
Redis Availability - Mitigated by ElastiCache high availability configuration

Success Metrics

Cache hit rate > 80% for token validation
Cache hit rate > 60% for candidate profiles
Cache hit rate > 70% for documents
P95 latency reduction of 20-50ms for cached operations
20-30% reduction in DynamoDB read capacity units
30-40% reduction in PostgreSQL query volume
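
The hit-rate targets above reduce to a simple ratio of hits to total lookups. As a minimal sketch, assuming hit and miss counters are available from the cache metrics, the check could look like this (HitRate is a hypothetical helper, not part of Micrometer or Quarkus):

```java
/** Hypothetical helper for evaluating cache hit-rate targets. */
final class HitRate {
    private HitRate() {}

    /** Hit rate = hits / (hits + misses); defined as 0 when there were no lookups. */
    static double of(long hits, long misses) {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }

    /** True when the observed hit rate is strictly above the target (e.g. 0.80). */
    static boolean meetsTarget(long hits, long misses, double target) {
        return of(hits, misses) > target;
    }
}
```

For example, 850 hits and 150 misses give a hit rate of 0.85, which would meet the token validation target of 80%.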