Archive for the ‘checklist’ tag
Checklist for Migration of Web Application from Traditional Hosting to Cloud
In 2010, Cloud Computing is likely to see increasing adoption. Migrating Web applications from one data center to another is a complex project. To assist you in migrating Web applications from your hosting facilities to cloud hosting solutions like Amazon EC2, Microsoft Azure or RackSpace’s Cloud offerings, I’ve published a set of checklists for migrating Web applications to the Cloud.
These are not meant to be comprehensive step-by-step, ordered project plans with task dependencies. These are checklists in the style of those used in other industries like Aviation and Surgery where complex projects need to be performed. Their goal is get the known tasks covered so that you can spend your energies on any unexpected ones. To learn more about the practice of using checklists in complex projects, I recommend the book Checklist Manifesto by Atul Gawande.
The checklists are published on the RevolutionCloud book Web site at:
www.revolutioncloud.com/2010/01/checklists-migration/
and on the Checklists Wiki Web site at:
Opinion on the Amazon S3 Outage; Checklist for Dealing with Outages
My journalist colleagues at Wired.com published some of my comments related to Amazon S3.1 Wired also posted another article titled Customers Shrug Off S3 Service Failure. I agree with the views of many of the customers expressed in the article. Don MacAskill, CEO of the popular photo hosting site Smugmug, wrote an understanding post about it.
My entire career working for media companies, I’ve held firm the belief that the uptime, reliability, performance, scalability, performance and security of commercial Web sites is of paramount importance. When sites that I’ve been responsible for have had issues, my colleagues and I have given our personal time and energy to resolution. With my teams, I spend considerable time on proactive measures. I’ve had the honor of working closely with and learning from some who do an excellent job running technology operations.
Experience has taught that things can and sometimes do go wrong. Sometimes calculated risks don’t pan out. Sometimes mistakes cause problems. We are human. We should strive for perfection; we can get close to it, but not fully attain it. We should be prepared for such scenarios. When they happen, we should work diligently and expeditiously on resolution and have frequent and honest communications with stakeholders and customers. Such communications during the incident should include:
Update 2010-Jan-24: This checklist is now maintained on the Checklists Wiki Web site at:
www.listswiki.com/wiki/IT_Incident_Reporting
During-Incident Communication Checklist
- Current status
- What is the full impact?
- Estimated time to resolution
- Any recommended workarounds until resolution, if practical
- Assurance that it is being worked on
- It often helps to mention who all are working on it and what they are doing
The post-incident communications to stakeholders and customers should include:
Post-Incident Communication Checklist
- Summary
- What happened, how and why it happened?
- How it was resolved
- If the resolution is temporary or long-term
- Next steps
- Plan for eliminating or minimizing this and similar incidents from happening again
- Thank all those who helped resolve and the customers for their understanding
- Mention the monetary credits you plan to give as per the Service Level Agreement (SLA)
- Specify any additional ‘make goods’ or returns you plan to make to the customers above and beyond the credits as per SLA, if appropriate.
Stakeholders and customers here refer to internal customers of the technology operations team (e.g. the concerned folks in editorial, marketing, sales, finance, legal and other departments). External communications to the public Internet should be handled in consultation with legal and public relations.
S3’s outage (or any outage) isn’t to be taken lightly, but I have faith Amazon and their customers will learn from it.
Disclaimers:
- As explained in the terms of use of this site, any opinions expressed on my personal Web site do not reflect those of any employer, past or present. My Web site and I in my personal life neither represent nor speak for any corporation.
- I have no affiliation, financial or otherwise with Amazon.com. I happen to be a user of their products and services, some of which I like and some that I don’t.
- Personal Web sites like this are exempt from the performance requirements of corporate Web sites :-) My personal Web site is for expressing, learning and R&D. It also happens to be hosted on Amazon EC2 and S3.
- Silicon Alley Insider and ValleyWag have amusing spins on it. :-) [↩]
- There may be extreme instances, especially when criminal activity or malicious wrongdoing was the cause where it would be appropriate to blame someone. [↩]
- It is ok to mention service providers, or describing external events for explaining what happened, but don’t do it in a “it was their fault, not ours” tone. The technology leader should factually describe what happened and take responsibility. [↩]
Business Travel Checklist
Update 2010-Jan-24: This checklist is now maintained on the Checklists Wiki Web site at:
www.listswiki.com/wiki/Business_Travel_Checklist
Document Revision 2.2 2001/07/20