One of the most important lessons reinforced for me during the Aug 28th hacking attacks on Melbourne IT, the domain registrar for the Web sites of The New York Times, Twitter and many others, was the importance of human relationships, personal networking and real-time communications during an emergency.
In this article, I’ll cover some of the human collaboration and technical lessons that were reinforced. Minimizing the chances of outages and of getting hacked is beyond the scope of this article. This post is about resilience (i.e. dealing with and recovering from an incident), not prevention.
Note: To learn about the incident itself, see the section ‘Hacking Incident: Information & Misinformation‘ below.
The Power of Human Collaboration
I witnessed the power of multi-participant video conferencing, which is commonplace these days thanks to Google Hangouts. Back in 2009, I wrote an article about the benefits of using real-time textual group chat during incident management and emergency situations. All the lessons in that article were reinforced yet again during this incident. I suggest reading it to review its real-time communications recommendations.
Matthew Prince, CEO of CloudFlare, summed it up well at the end of his blog post:
We spend our time building technical networks, but it’s comforting to know that the human network is effective as well.
My colleague Jon Oden Tweeted:
Well that was a fun day… lots of fantastic people in the tech community is a silver lining
I felt so honored, humbled, and happy when my friends and their friends at some of the best-known Internet companies went above and beyond to help. They spent most of their day in a Google Hangout video conference, helping restore public access to The New York Times’ Web site and other sites. They did this because they are good, helpful people and tech experts. They took time out of their own jobs because they care about a common good cause and about fighting malevolence.
It was inspiring to watch so many brilliant tech people and leaders from multiple companies, many of whom were meeting for the first time on the video call, collaborate so well and overcome the problems together. The combination of multi-participant video conferencing and text chat in Google Hangouts made it feel like we were all working in the same physical room.
I’d like to thank John Roberts, Matthew Prince and the engineers at CloudFlare; David Ulevitch and several engineers at OpenDNS; Mandar Gokhale, John W. McCarthy and others at GoDaddy; the technical infrastructure engineering team at Google; Bob Lord at Twitter; Sara Boddy and team at Demand Media; and many others who helped yesterday. You all showed amazing teamwork on the video conference.
If you run a high-profile Web site, it is critically important that your Disaster Recovery & Business Continuity Plan covers an emergency in which your primary systems are unavailable despite all the safeguards and backups you have in place. The domain registration hijack was one such example. A domain name can have only one registrar holding its registration information at any given time. There is no concept of a backup or failover registrar for a domain. To deal with this single point of failure, you need a backup Web presence on a separate domain.
Backup Web site on Separate Domain
You should maintain a backup Web site that:
- Has a different domain name. For example, if your site is at example.com, your backup domain could be example.net.
- Is registered with a different domain name registrar than your primary one. For example, if your primary registrar is MarkMonitor for example.com, then use Network Solutions for the backup domain example.net.
- Uses DNS service hosted somewhere else. For example, if you run and host your own DNS servers for example.com, use an outsourced DNS hosting service like CloudFlare for example.net.
- Uses a different Content Delivery Network (CDN). For example, if you use Akamai for example.com, use CloudFlare for example.net. You must have a CDN on your backup Web site so that it can handle your traffic.
- Is hosted somewhere other than where your primary site is hosted and is implemented using a different (much simpler) technology platform that is unlikely to share the same vulnerabilities.
- Can feed your mobile apps. Your mobile applications should be designed to be aware of this backup Web site and should be able to switch to it (automatically or via manual intervention) for retrieving content.
- Does not share administrative access, logins and passwords with the primary site.
- Is preferably managed by a separate team. This has two benefits: 1. If the primary team itself is compromised (sysadmin accounts hacked or a rogue employee), the backup site is not affected. 2. The separate team can work on activating the backup site while the primary team focuses on restoring service of the primary site.
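The checklist above boils down to one rule: the backup site should share no provider, platform or team with the primary. As a rough sketch (the `SiteSetup` record and the `shared_dependencies` helper are hypothetical names invented for this illustration, and the hosting and team values are made up), you could write the two setups down and let a few lines of Python flag any overlap:

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class SiteSetup:
    """Hypothetical inventory record for one Web presence."""
    domain: str
    registrar: str
    dns_provider: str
    cdn: str
    hosting: str
    admin_team: str


def shared_dependencies(primary: SiteSetup, backup: SiteSetup) -> List[str]:
    """Return the names of the fields where both sites use the same value."""
    return [
        f.name
        for f in fields(SiteSetup)
        if getattr(primary, f.name).strip().lower()
        == getattr(backup, f.name).strip().lower()
    ]


# Example values follow the bullets above; hosting and team names are made up.
primary = SiteSetup("example.com", "MarkMonitor", "self-hosted",
                    "Akamai", "own data center", "web-ops")
backup = SiteSetup("example.net", "Network Solutions", "CloudFlare",
                   "CloudFlare", "third-party host", "backup-site team")

overlap = shared_dependencies(primary, backup)
print(overlap or "no shared dependencies")
```

An empty result only means the inventory shows no overlap. It does not verify the real-world setup, and it cannot catch indirect overlaps such as two providers that depend on the same upstream company.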
What you use your backup Web site for is up to you. If your primary Web site is a news and media Web site, you could use the backup Web site to publish content during an emergency impacting the primary Web site. If it is impractical for the backup Web site to provide similar (or a subset of) functionality, you could use it for providing status updates and communicating what is going on.
When the backup domain is not needed (which will hopefully be the case 99.9% or more of the time), it could simply be used for providing systems status, explaining it is in place for emergencies and linking to the primary Web site.
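To illustrate the earlier point about mobile apps being aware of the backup site, here is a minimal sketch of a client that tries the primary domain first and then the backup. The function names, the `/latest.json` path and the example URLs are assumptions made for this illustration, not part of any real app.

```python
import urllib.request
from typing import Callable, List, Sequence, Tuple


def http_get(url: str, timeout: float = 5.0) -> bytes:
    """Plain HTTP GET using only the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_fallback(
    path: str,
    base_urls: Sequence[str],
    fetch: Callable[[str], bytes] = http_get,
) -> Tuple[str, bytes]:
    """Try each base URL in order; return (url, body) from the first success."""
    errors: List[str] = []
    for base in base_urls:
        url = base.rstrip("/") + "/" + path.lstrip("/")
        try:
            return url, fetch(url)
        except Exception as exc:  # any failure moves us on to the next site
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all sites failed: " + "; ".join(errors))


# Usage (hypothetical URLs):
# url, body = fetch_with_fallback(
#     "/latest.json",
#     ["https://www.example.com", "https://www.example.net"],
# )
```

Note that this only switches when a request fails outright. During a domain hijack the primary name may still answer, just from someone else's server, so a manual override (for example, an app setting that reorders `base_urls`) is worth having as well.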
Access to a Reliable Public DNS
For end users (i.e. people on the public Internet visiting Web sites), I highly recommend considering using OpenDNS (instructions here) and/or Google Public DNS (instructions here) either as primary or as backup DNS providers.
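If you are curious what a particular public resolver returns for a name, the sketch below sends a single DNS A-record query to a resolver of your choice using only Python's standard library. The function names are my own for this illustration. The resolver addresses in the usage comment are the published ones for OpenDNS (208.67.222.222) and Google Public DNS (8.8.8.8). It is deliberately bare-bones: UDP only, no retries, and it does not check that the response ID matches the query.

```python
import socket
import struct
from typing import List


def build_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a DNS query for the A record of hostname (recursion desired)."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def skip_name(message: bytes, offset: int) -> int:
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        length = message[offset]
        if length == 0:
            return offset + 1
        if length & 0xC0 == 0xC0:  # compression pointer is two bytes long
            return offset + 2
        offset += 1 + length


def parse_a_records(message: bytes) -> List[str]:
    """Extract the IPv4 addresses from the answer section of a response."""
    _id, _flags, qdcount, ancount, _ns, _ar = struct.unpack("!HHHHHH", message[:12])
    offset = 12
    for _ in range(qdcount):
        offset = skip_name(message, offset) + 4  # skip QTYPE and QCLASS
    addresses = []
    for _ in range(ancount):
        offset = skip_name(message, offset)
        rtype, rclass, _ttl, rdlength = struct.unpack(
            "!HHIH", message[offset:offset + 10])
        offset += 10
        if rtype == 1 and rclass == 1 and rdlength == 4:
            addresses.append(socket.inet_ntoa(message[offset:offset + 4]))
        offset += rdlength
    return addresses


def resolve_via(resolver_ip: str, hostname: str, timeout: float = 3.0) -> List[str]:
    """Ask one specific resolver for the A records of hostname."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(hostname), (resolver_ip, 53))
        response, _ = sock.recvfrom(4096)
    return parse_a_records(response)


# Usage: compare what two public resolvers say about the same name.
# print(resolve_via("208.67.222.222", "www.example.com"))  # OpenDNS
# print(resolve_via("8.8.8.8", "www.example.com"))         # Google Public DNS
```

Different resolvers can legitimately return different addresses for sites served through a CDN, so a mismatch on its own is not proof of a hijack.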
There are also other lessons, not specific to this incident, both process-related and technical, that I’ll write about in a separate article.
Hacking Incident: Information & Misinformation
Wired Magazine’s article titled ‘Syrian Electronic Army’ Takes Down The New York Times correctly explained that:
There’s no evidence that the Times’ internal systems were compromised. Instead, the attackers got control of the NYTimes.com domain name this afternoon through the paper’s domain name registrar, Melbourne IT…
Melbourne IT is the company that manages the domain name registration for The New York Times, Twitter and many other well-known sites. On Aug 28, it was Melbourne IT’s computer systems that were hacked, which enabled the perpetrators to hijack the domain names of The NY Times, Twitter and other companies.
Surprisingly, the article on Ars Technica (an otherwise well-respected technology publication) was inaccurate and misleading. It incorrectly stated: “The Times DNS records have been altered, and now point to an Australian hosting company, Melbourne IT.” That would lead readers to incorrectly believe that the hackers redirected nytimes.com to DNS servers or fake Web sites hosted at Melbourne IT.
I’m surprised that the Ars Technica staff did not do their research before writing that post. Melbourne IT is (and has been for years) the official domain name registrar of The New York Times. In addition to being a domain name registrar, Melbourne IT also happens to be a hosting company, but that had nothing to do with the incident. As far as I know, none of the sites impacted that day used Melbourne IT for anything other than domain registration.
What Ars Technica should have said (like their sister publication Wired did) is that the perpetrators hijacked nytimes.com and other Web sites by hacking Melbourne IT, the company that holds their domain name registration.
The Ars article is also misleading in its claim that the nytimes.com DNS records were altered. It was the domain registration records at Melbourne IT that were altered, which then pointed the domain to a whole different set of DNS servers outside of The NY Times’ control.
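The distinction between DNS records and registration records can be shown with a toy model. This is a deliberately simplified sketch, not how real DNS software works: the registration record is just a mapping from a domain to its delegated name servers, and each name server holds its own zone data. All names and addresses below are hypothetical.

```python
from typing import Dict, List, Optional

Registry = Dict[str, List[str]]          # domain -> delegated name server names
Nameservers = Dict[str, Dict[str, str]]  # name server -> {hostname: IP address}


def resolve(hostname: str, domain: str, registry: Registry,
            nameservers: Nameservers) -> Optional[str]:
    """Follow the registration's delegation, then ask those name servers."""
    for ns_name in registry.get(domain, []):
        zone = nameservers.get(ns_name, {})
        if hostname in zone:
            return zone[hostname]
    return None


# Hypothetical data: the owner's DNS server and one run by the attackers.
nameservers: Nameservers = {
    "ns1.owner-dns.example": {"www.example.com": "192.0.2.10"},
    "ns1.attacker-dns.example": {"www.example.com": "203.0.113.66"},
}
registry: Registry = {"example.com": ["ns1.owner-dns.example"]}

before = resolve("www.example.com", "example.com", registry, nameservers)

# The hijack: only the registration record at the registrar is changed.
registry["example.com"] = ["ns1.attacker-dns.example"]
after = resolve("www.example.com", "example.com", registry, nameservers)

print(before, after)  # prints "192.0.2.10 203.0.113.66"
```

The owner's own zone data is never touched in this model, yet visitors end up at the attackers' address, because the delegation that tells the world which servers to ask has changed.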
The Ars Technica article also pointed readers to NY Times’ URL by one of its IP addresses, which was also a mistake. If the Ars Technica folks had tested it themselves, they’d have realized that pointing people to the URL by IP was not the recommended way to access nytimes.com during the incident. Clicking on links on the IP page leads back to www.nytimes.com by name. They should have instead pointed readers to the alternate news.nytco.com URL recommended by The New York Times’ staff, which is what the Wired article did. They could have also suggested other good solutions like switching to using OpenDNS. In fact, I’d have expected a technical publication of Ars Technica’s good reputation to have published a step-by-step guide on switching to OpenDNS, which they wrote about back in 2006.