ECS Performance Testing Results (5× Run Analysis)
Overview
This document captures and analyses 5 repeated load test runs executed against the AWS ECS deployment running GraalVM native images.
Test Conditions (constant across all runs)
- 1 ECS task per service
- CPU: 256 (0.25 vCPU)
- Memory: 512 MiB
- Runtime: GraalVM native images (Quarkus)
- Load generator: EC2 instance in same VPC
- External dependency: AWS Cognito (us-west-2)
- Data stores: LocalStack (DynamoDB, Postgres, S3)
- Test profile:
  - 50 VUs warmup (3 min)
  - 50 VUs load (3 min)
Key change vs earlier local tests
- Transition from localhost + LocalStack only → ECS + EC2 + mixed AWS dependency model
- Native image execution enabled
- More realistic network topology and latency profile
Executive Summary
Across 5 identical runs, the system demonstrates:
- Highly stable throughput (~55–56 RPS)
- Very consistent latency distribution across runs
- Zero systemic failure rate (100% success in 5/5 runs)
- Predictable external dependency latency (Cognito dominates auth variability)
- Stable document upload performance (~45–47 ms median)
Core takeaway
The system exhibits a stable, repeatable performance envelope under steady-state load at 50 concurrent users, with no signs of internal saturation at this level.
Aggregate Performance Summary (5 runs)
Throughput
- RPS range: 55.96 – 56.27 req/s
- Run-to-run spread: under 1% (0.31 req/s between the slowest and fastest run)
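As a quick check on the "<1%" figure, the spread can be computed directly from the reported RPS range. This is a minimal sketch; the helper name `relative_spread` is illustrative, and it measures the range relative to its midpoint rather than a statistical variance, since only the minimum and maximum are reported here.

```python
def relative_spread(low: float, high: float) -> float:
    """Return the width of a range as a fraction of its midpoint."""
    midpoint = (low + high) / 2
    return (high - low) / midpoint


# RPS range reported above for the 5 runs.
spread = relative_spread(55.96, 56.27)
print(f"RPS spread: {spread:.2%}")  # prints "RPS spread: 0.55%"
```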
Latency (HTTP overall)
- Median: ~6–12 ms
- Average: ~64–71 ms
- p95: ~350–405 ms
- p99: ~920 ms – 940 ms
- Max spikes: up to ~2–2.8 s (attributed to the external dependency or rarely exercised code paths)
Error Rate
- 0% failures in 5/5 runs
Quantitative consolidation (mean RPS, percentile bands, operating point) lives in ../phase_2/ECS_NATIVE_BASELINE_MIX_50VU_LOAD3M_ENVELOPE.md.
Endpoint-Level Behaviour
Actor GET
- Avg: ~7–9 ms
- p99: ~30–70 ms
- Max: occasional spikes up to ~700–800 ms
Interpretation:
- Extremely healthy
- Likely fully in-memory / cached path
Auth Login (Cognito-backed)
- Avg: ~407–437 ms
- Median: ~327–330 ms
- p95: ~1.1–1.4 s
Interpretation:
- Dominant latency source
- Highly consistent across runs
- External dependency bound (not ECS-bound)
Auth Register (Cognito-backed)
- Avg: ~870–883 ms
- Median: ~850–860 ms
- p95: ~990 ms – 1.02 s
Interpretation:
- Stable but expensive operation
- Likely constrained by Cognito user pool operations
Document Upload
- Avg: ~49–53 ms
- Median: ~45–47 ms
- p95: ~74–83 ms
- p99: ~108–137 ms
Interpretation:
- Very stable after tuning
- No longer a bottleneck
- Well-behaved under concurrency
Run-to-Run Stability Analysis
Throughput stability
All runs:
- ~55–56 RPS
- Negligible variance
✔ Indicates stable CPU scheduling and no ECS contention at this load
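One way to sanity-check the throughput figure is Little's Law. It is sketched here on the assumption that the test is closed-loop, meaning each VU waits for a response before issuing its next request. The function name is illustrative and the pacing of the test script is not described in this document, so treat the result as a rough cross-check only.

```python
def mean_cycle_time(concurrent_vus: int, throughput_rps: float) -> float:
    """Little's Law for a closed system: N = X * R, so R = N / X.

    Returns the mean time (seconds) one VU spends per request cycle,
    including any think time or pacing in the script.
    """
    return concurrent_vus / throughput_rps


# 50 VUs and the RPS range reported in this document.
for rps in (55.96, 56.27):
    cycle = mean_cycle_time(50, rps)
    print(f"{rps:.2f} req/s -> {cycle:.3f} s per request cycle per VU")
```

Both values come out at roughly 0.89 s per cycle, well above the ~64–71 ms average HTTP latency. If the closed-loop assumption holds, this suggests that throughput at 50 VUs is set largely by script pacing rather than by service capacity, which is consistent with the absence of saturation at this load.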
Latency stability
- Aggregate HTTP p95 across runs falls roughly in ~350–405 ms (mix-sensitive; not endpoint-level p95)
- p99 consistently ~920–940 ms
✔ Stable tail behaviour across runs
Error behaviour
- No systemic errors observed
✔ No infrastructure instability indicated
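The stability criteria above could be turned into an automated check for future runs. The following is a minimal sketch, not the project's actual tooling: the band values are copied from the aggregate summary in this document (which are themselves approximate), while the names `Band`, `ENVELOPE`, `out_of_band` and the example run are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


# Bands taken from the aggregate summary of the 5 runs.
ENVELOPE = {
    "rps": Band(55.96, 56.27),
    "p95_ms": Band(350.0, 405.0),
    "p99_ms": Band(920.0, 940.0),
    "error_rate": Band(0.0, 0.0),
}


def out_of_band(run: dict[str, float]) -> list[str]:
    """Return the names of metrics that fall outside the envelope."""
    return [name for name, band in ENVELOPE.items()
            if not band.contains(run[name])]


# Hypothetical run summary, invented for illustration only.
example_run = {"rps": 56.10, "p95_ms": 380.0, "p99_ms": 931.0, "error_rate": 0.0}
print(out_of_band(example_run))  # prints []
```

In practice the bands would probably need some tolerance around the observed ranges, since five runs give only a narrow sample of normal variation.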
Comparison to Local Environment (previous tests)
Throughput
- Local: ~51–54 RPS
- ECS native: ~55–56 RPS
→ Slight improvement, most likely due to:
- more consistent scheduling
- reduced local contention noise
Latency
- Local avg HTTP: ~95–120 ms
- ECS avg HTTP: ~64–71 ms
→ ECS native performs better overall, despite network overhead
Key insight: Native images and ECS steady-state execution more than offset the network overhead that ECS adds relative to localhost testing.
Tail latency
- Local p99: ~900 ms – 1.1 s
- ECS p99: ~920 ms – 940 ms
→ Essentially unchanged
Interpretation: Tail latency is dominated by external dependency (Cognito), not compute layer.
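The comparison above can be quantified with a short calculation. This sketch uses the midpoint of each reported range as a stand-in for the true mean, which is an assumption, since per-run values are not listed here. All helper names are illustrative.

```python
def midpoint(low: float, high: float) -> float:
    return (low + high) / 2


def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


def is_within(inner: tuple[float, float], outer: tuple[float, float]) -> bool:
    """True if the `inner` range lies entirely inside the `outer` range."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


# Ranges as reported in this section (local vs ECS native).
rps_change = percent_change(midpoint(51, 54), midpoint(55, 56))
avg_latency_change = percent_change(midpoint(95, 120), midpoint(64, 71))
p99_unchanged = is_within(inner=(920, 940), outer=(900, 1100))

print(f"Throughput:  {rps_change:+.1f}%")          # +5.7%
print(f"Avg latency: {avg_latency_change:+.1f}%")  # -37.2%
print(f"ECS p99 inside local p99 range: {p99_unchanged}")  # True
```

The ECS p99 range sits entirely inside the local p99 range, which is what the "essentially unchanged" reading rests on.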
Key Technical Insights
1. System is not CPU bound at 50 VUs
- Very low internal latencies
- No throughput degradation across runs
2. External dependency dominates tail latency
- Auth flows define p95/p99 behaviour
- ECS does not contribute materially to tail spikes
3. Document pipeline is stable post-tuning
- Circuit breaker tuning successful
- No sustained error propagation
4. System exhibits strong repeatability
- Across 5 runs, variance is minimal
- Indicates a stable production-like baseline
Final Conclusion
At 50 concurrent users:
The system demonstrates a stable, repeatable performance envelope with ~56 RPS throughput, a median HTTP latency of roughly 6–12 ms, and predictable tail latency primarily driven by AWS Cognito.
What can be confidently stated
- ECS results are cleaner than local results: lower average HTTP latency (~64–71 ms vs ~95–120 ms) and a narrower p99 range
- ECS + GraalVM native deployment is stable under steady load
- No evidence of internal bottlenecks at current configuration
- The baseline is stable enough for horizontal scaling tests to proceed
- Performance characteristics are reproducible across runs
Recommended Next Step
Proceed to scaling analysis:
- 100 VUs
- 200 VUs
- 300+ VUs
Goal:
Identify saturation point and confirm horizontal scaling behaviour
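One possible way to define the saturation point for the scaling runs is the first VU level at which throughput per VU drops noticeably below the 50-VU baseline. The sketch below assumes the same script pacing at every VU level. The function name, the 80% threshold, and every data point other than the 50-VU baseline are hypothetical placeholders, not measured results.

```python
from typing import Optional


def find_saturation(points: list[tuple[int, float]],
                    min_efficiency: float = 0.8) -> Optional[int]:
    """Return the first VU level where per-VU throughput falls below
    `min_efficiency` times the baseline per-VU throughput, or None.

    `points` is a list of (vus, rps) pairs sorted by VU count; the first
    entry is treated as the baseline.
    """
    base_vus, base_rps = points[0]
    baseline_per_vu = base_rps / base_vus
    for vus, rps in points[1:]:
        if rps / vus < min_efficiency * baseline_per_vu:
            return vus
    return None


# (50, 56.1) approximates the measured baseline in this document.
# The remaining points are HYPOTHETICAL and exist only to exercise the function.
points = [(50, 56.1), (100, 111.0), (200, 205.0), (300, 230.0)]
print(find_saturation(points))  # prints 300
```

A throughput-only criterion like this would likely be paired with latency and error-rate thresholds, since a service can hold throughput while its tail latency degrades.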