Amazon issues apology and explanation for EC2 cloud failure—more food for thought for the EDA cloud
In a long letter, Amazon has issued a complex explanation of the events behind the disastrous, multi-day failure of one zone in its EC2 cloud-based services offering. Essentially, the outage was triggered by human error: during a configuration change, traffic was directed to the wrong backup system. That error was then amplified into a cascade failure by certain design aspects of the EC2 system, which were stressed far beyond their design parameters when the traffic was rerouted.
Amazon’s long and detailed analysis points out the complexities of cloud service development. These issues are especially critical for EDA vendors and users, and cloud-based EDA providers must take such risks into account when developing their services.
More EDA360 Insider blog entries on the Amazon EC2 crash: