Systems Correctness Practices at Amazon Web Services
Jun 13, 2025Systems Correctness Practices at Amazon Web Services. Article on the portifolio of formal methods used across AWS.
Our experience at AWS with TLA+ revealed two significant advantages of applying formal methods in practice. First, we could identify and eliminate subtle bugs early in development—bugs that would have eluded traditional approaches such as testing. Second, we gained the deep understanding and confidence needed to implement aggressive performance optimizations while maintaining systems correctness.
Here’s a list of techniques they use:
- P programming language to model and specify distributed systems. It was used, for example, on migrating Simple Storage Service (S3) from eventual to strong read-after-write consistency.
- Dafne programming language to prove that the Cedar authorization policy language implementation satisfies a variety of security properties
- A tool called Kani was used by the Firecracker team to prove key properties of security boundaries
- Fault Injection Service that injects simulated faults, from API errors to I/O pauses and failed instances
- Also property-based testing, deterministic simulation, and continuous fuzzing or random test-input generation