Amazon S3-izure cause: Half the web vanished because an AWS bod fat-fingered a command • The Register

https://www.theregister.co.uk/2017/03/02/aws_s3_crash_result_of_fatfingered_command/

5 Comments

  1. Tomi Engdahl says:

    Nat Levy / GeekWire:
    Amazon says AWS outage was caused by human error during routine server maintenance, will make changes to prevent future problems. Amazon has released an explanation of the events that caused Tuesday’s big outage of its Simple Storage Service, also known as S3, which crippled significant portions of the web for several hours.

    Amazon explains big AWS outage, says employee error took servers offline, promises changes
    http://www.geekwire.com/2017/amazon-explains-massive-aws-outage-says-employee-error-took-servers-offline-promises-changes/

  2. Tomi Engdahl says:

    Summary of the Amazon S3 Service Disruption in the Northern Virginia (US-EAST-1) Region
    https://aws.amazon.com/message/41926/

    The Amazon Simple Storage Service (S3) team was debugging an issue causing the S3 billing system to progress more slowly than expected.

    At 9:37AM PST, an authorized S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process. Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended.

    Removing a significant portion of the capacity caused each of the affected subsystems to require a full restart. While these subsystems were being restarted, S3 was unable to service requests. Other AWS services in the US-EAST-1 Region that rely on S3 for storage, including the S3 console, Amazon Elastic Compute Cloud (EC2) new instance launches, Amazon Elastic Block Store (EBS) volumes (when data was needed from an S3 snapshot), and AWS Lambda, were also impacted while the S3 APIs were unavailable.
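The failure described above (one mistyped input removing far more servers than intended) is the kind of mistake a guard in the removal tool can catch. AWS's summary says the tool was changed to remove capacity more slowly and to refuse removals that would take a subsystem below its minimum required capacity. The following is only a rough sketch of that idea, not AWS's actual tooling: the names `Subsystem`, `plan_removal`, `max_batch` and `CapacityGuardError`, and all the numbers, are hypothetical.

```python
from dataclasses import dataclass


class CapacityGuardError(Exception):
    """Raised when a removal request would breach a safety limit."""


@dataclass(frozen=True)
class Subsystem:
    name: str            # hypothetical subsystem label
    active_servers: int  # servers currently in service
    min_required: int    # assumed floor needed to keep serving requests


def plan_removal(subsystem: Subsystem, requested: int, max_batch: int = 5) -> int:
    """Return the number of servers approved for removal, or raise."""
    if requested <= 0:
        raise ValueError("requested must be a positive integer")

    # Guard 1: cap the batch size so capacity can only be removed slowly.
    if requested > max_batch:
        raise CapacityGuardError(
            f"{requested} servers requested, but at most {max_batch} "
            f"may be removed from {subsystem.name} per operation"
        )

    # Guard 2: never drop below the subsystem's minimum capacity.
    remaining = subsystem.active_servers - requested
    if remaining < subsystem.min_required:
        raise CapacityGuardError(
            f"removing {requested} servers would leave {subsystem.name} with "
            f"{remaining}, below its minimum of {subsystem.min_required}"
        )

    return requested
```

With a hypothetical `Subsystem("billing-index", active_servers=100, min_required=90)`, a request for 3 servers is approved, while a mistyped request for 50 is rejected before anything is taken offline. The two checks are kept separate so the error message tells the operator which limit was hit.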

  3. Tomi Engdahl says:

    Internet Security? Not Even Close
    http://semiengineering.com/internet-security-not-even-close/

    The number of threats continues to expand. It’s time the tech industry began embracing solutions.

    This week’s outage at Amazon Web Services is yet another reminder that Internet security is still not quite there.

    Amazon isn’t a second-tier cloud services provider. It’s one of the biggest cloud companies on the planet. If Amazon can’t get it right, it’s hard to imagine anyone can. The company’s Simple Storage Service, aka S3, was the target, and it took about five hours before this online storage was up and running.

    Compared with other outages, damage was minimal. It doesn’t appear that private data was hacked, which is good because one of Amazon’s S3 customers is the U.S. Securities and Exchange Commission. As a point of reference, Yahoo suffered from three successive attacks that gave hackers access to data from at least 1.5 billion accounts. And Target’s 2013 breach compromised the data of 40 million customers.

  4. Tomi Engdahl says:

    Amazon S3: Out Like a Light; On Like a Bathtub
    http://hackaday.com/2017/03/03/amazon-s3-out-like-a-light-on-like-a-bathtub/

    You no doubt heard about the Amazon S3 outage that happened earlier this week. It was reported far and wide by media outlets who normally don’t delve into details of the technology supporting our connected world. It is an interesting thing to think that most people have heard about The Cloud but never AWS and certainly not S3.

    Turns out the S3 team was working on a problem with some part of the billing system and to do so, needed to take a few servers down. An incorrect command used when taking those machines down ended up affecting a larger block than expected. So they went out like a light switch — but turning that switch back on wasn’t nearly as easy.

    The servers that went down run various commands in the S3 API. With the explosive growth of the Simple Storage Service, this “reboot” hadn’t been tried in several years and took far longer than expected.

    Summary of the Amazon S3 Service Disruption in the Northern Virginia (US-EAST-1) Region
    https://aws.amazon.com/message/41926/

  5. Tomi Engdahl says:

    The AWS Outage and the Need for “Offline” IoT
    https://medium.com/@TheNeura/the-aws-outage-and-the-need-for-offline-iot-4358521f7344#.3mf3c34xz

    Thoughts on mobile offline machine learning in the wake of a “cloud” debacle
