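A Collection of Posts on the Amazon Cloud Outage
Recently, the biggest story on the Internet has probably been the Amazon AWS outage, which was still not fully resolved after several days. The whole Internet is talking about it, the Internet is very unhappy, and the consequences could be serious. Perhaps because the incident had no impact on China, there are few Chinese-language articles about it; for one, see Hexun's 《伤不起!亚马逊史前最大宕机事件的启示》 ("Can't Afford It! Lessons from Amazon's Largest Outage Ever").
Someone abroad has collected the posts related to this incident. They are quite good posts and articles, and the ones on lessons learned are especially instructive, so I am passing them along. The original source of this list is here.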
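Contents
Individual Companies' Experiences, Good and Bad
Amazon Web Services Discussion Forum
Summaries
Position: It's the Users' Fault
Position: It's Amazon's Fault
Lessons and Takeaways
Vendors Are Angry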
Individual Companies' Experiences, Good and Bad
How Heroku Survived the Amazon Outage on the Heroku status page
How SimpleGeo Stayed Up During the AWS Downtime by Mike Malone
How SmugMug survived the Amazonpocalypse by Don MacAskill (Hacker News discussion)
How Bizo survived the Great AWS Outage of 2011 relatively unscathed… by Someone at Bizo
Joe Stump’s explanation of how SimpleGeo survived
How Netflix Survived the Outage
Why Twilio Wasn’t Affected by Today’s AWS Issues on Twilio Engineering’s Blog (Hacker News thread)
On reddit’s outage
What caused the Quora problems/outage in April 2011?
Recovering from Amazon cloud outage by Drew Engelson of PBS.
PBS was affected for a while primarily because we do use EBS-backed RDS databases. Despite being spread across multiple availability-zones, we weren’t easily able to launch new resources ANYWHERE in the East region since everyone else was trying to do the same. I ended up pushing the RDS stuff out West for the time being. From Comment
Amazon Web Services Discussion Forum
Experienced users shared many valuable first-hand accounts of the outage here.
Amazon Web Services Discussion Forum
Cost-effective backup plan from now on?
Life of our patients is at stake – I am desperately asking you to contact
Why did the EBS, RDS, Cloudformation, Cloudwatch and Beanstalk all fail?
Moved all resources off of AWS
Any success stories?
Is the mass exodus from East going to cause demand problems in the West?
Finally back online after about 71 hours
Amazon EC2 features vs windows azure
Aren’t Availability Zones supposed to be “insulated from failures”?
What a lot of people aren’t realizing about the downtime:
ELB CNAME
Availability Zones were used in a misleading manner
Tip: How to recover your instance
Crying in Forum Gets Results, Silver-level AWS Premium Support Doesn’t
Well-worth reading: “design for failure” cloud deployment strategy
New best practice
Don’t bother with Premium Support
Best practices for multi-region redundancy
“Postmortum“
Learning from this case
Amazon, still no instructions what to do?
Anyone else prepared for an all-nighter?
Is Jeff Bezos going to give a public statement?
Rackspace, GoGrid, StormonDemand and Others
Jeff Barr, Werner Vogels and other AWS persons – where have you been???
After you guys fix EBS do I have do anything on my side?
Need Help!!! Lives of people and billions in revenue are at risk now!!!
I’ve Got A Suspicion
Farewell EC2, Farewell
There were also many many instances of support and help in the log.
Summaries
Amazon EC2 outage: summary and lessons learned by RightScale
AWS outage timeline & downtimes by recovery strategy by Eric Kidd
The Aftermath of Amazon’s Cloud Outage by Rich Miller
Position: It's the Users' Fault
So Your AWS-based Application is Down? Don’t Blame Amazon by The Storage Architect
The Cloud is not a Silver Bullet by Joe Stump (Hacker News thread)
The AWS Outage: The Cloud’s Shining Moment by George Reese (Hacker News discussion)
Failing to Plan is Planning to Fail by Ted Theodoropoulos
Get a life and build redundancy/resiliency in your apps on the Cloud Computing group
Position: It's Amazon's Fault
Stop Blaming the Customers – the Fault is on Amazon Web Services by Klint Finley
AWS is down: Why the sky is falling by Justin Santa Barbara (Hacker News thread)
Amazon Web Services are down – Huge Hacker News thread
Lessons and Takeaways
People Using Amazon Cloud: Get Some Cheap Insurance At Least by Bob Warfield
Basic scalability principles to avert downtime by Ronald Bradford
Amazon crash reveals ‘cloud’ computing actually based on data centers by Kevin Fogarty
Seven lessons to learn from Amazon’s outage by Phil Wainewright
The Cloud and Outages : Five Key Lessons by Patrick Baillie (Cloud Computing Group discussion)
Some thoughts on outages by Till Klampaeckel
Amazon.com’s real problem isn’t the outage, it’s the communication by Keith Smith
How to work around Amazon EC2 outages by James Cohen (Hacker News thread)
Today’s EC2 / EBS Outage: Lessons learned on Agile Sysadmin
Amazon EC2 has gone down -what would a prefered hosting platform be? on Focus
Single Points of Failure by Mat
Coping with Cloud Downtime with Puppet
Amazon Outage Concerns Are Overblown by Tim Crawford
Where There Are Clouds, It Sometimes Rains by Clay Loveless
Availability, redundancy, failover and data backups at LearnBoost by Guillermo Rauch
Cloud hosting vs colocation by Chris Chandler (Hacker News thread)
Amazon’s EC2 & EBS outage by Arnon Rotem-Gal-Oz
Vendors Are Angry
Amazon Outage Proves Value of Riak’s Vision by Basho
Magical Block Store: When Abstractions Fail Us by Mark Joyent (Hacker News discussion)
On Cascading Failures and Amazon’s Elastic Block Store by Jason
An unofficial EC2 outage postmortem – the sky is not falling from CloudHarmony
Reposted from CoolShell (酷壳), unabridged and unmodified, in memory of Chen Hao (左耳朵耗子).