Hosting Large-Scale Web Sites: Contract Review Guide for the CTO

If you host and operate large-scale Web sites, or negotiate contract agreements with vendors that provide such services, you need to understand what should be included in a Web hosting infrastructure. This knowledge will help you in three areas:

  1. Providing reliability, scalability & good performance
  2. Minimizing risks via security, privacy, regulatory compliance and reduction of vulnerability to potential lawsuits
  3. Reducing and controlling costs

This guide is meant to help you review upcoming contracts as well as existing services.

Likely audience for this article: Managers, directors and vice presidents of technology, operations or finance at organizations operating large-scale Web sites; Executives supervising technology: CTO, CIO, CFO, COO.

Seven Aspects of Large-Scale Web Hosting

Large-scale Web hosting infrastructure and services can be organized into the following seven areas:

  1. Servers & Environments
  2. Network & Other Appliances
  3. Managed Hosting Services
  4. Third-party Provided Services
  5. Program Management Office, PMO
  6. Account Management
  7. Infrastructure & Facilities

Checklist for Review

You can use the following checklist to review your hosting services or a vendor’s proposal.

What to look for

When you review each item below, consider:

  • Is this item included in the vendor’s proposal or in the services we are currently receiving? If it is not included, what are the good reasons it isn’t included?
  • Is this needed for my organization’s current business requirements? Can we do without it? Is it a must have or nice to have for present and reasonable future needs?
  • What are the alternatives?
  • What is the unit price of this item? How does the price scale up as needs grow? How does the price scale down when need for this item decreases?
  • What level of fault-tolerance does this item need? i.e. redundancy, standby backups, time to recover

Some of the above review questions may apply only to things and not apply to services and processes.

Servers

Servers may be physical hardware servers and/or virtual servers managed using software such as VMWare, Parallels Virtuozzo or Xen. The services listed below can each run on separate servers or multiple services can run on a server. It is generally better to have servers running only one (or minimum number) of the major services listed below. That reduces complexity and saves expensive staff time saved maintaining, troubleshooting and recovering. Virtualization makes it economical to have multiple virtual servers on the shared physical hardware economize costs.

The following is a list of commonly found services at large-scale Web sites that require servers.

  • Web
  • Application
    • Content Management software. This is the software that the Editorial and Production teams use to submit, edit, package and manage articles, photos and other Web site content
    • Dynamic Content Assembly. Typically done using Portal Server software, either third-party supplied or in-house developed
    • Data Processing. E.g. workflow engines, jobs/tasks processing servers
    • Middleware
    • Other applications. These are applications that happen to be separate from the main content management system. They could be separate for any number of reasons. E.g. blogs, forums
  • Database

Server Environments

An environment is a self-sufficient set of servers assigned to serve a purpose as described below. Large-scale Web sites typically utilize multiple environments.

  • Production
    • This serves the Web sites to the customers and public.
    • Typically has 99.9% or higher uptime guarantee in the Service Level Agreement
      Please refer to the accompanying table titled Understanding SLA Uptime Guarantee Percentages to compare different time windows when the SLA Uptime measurement gets reset. I recommend that you ensure that the reset window you get is the same duration as your billing cycle (usually monthly) or shorter. This will help avoid having long downtimes without penalty.
  • Staging
    • This is the environment where content packages are developed, integrated and previewed by Editorial, Design and Production teams before they are published to the end-users. For example, when working on a major site redesign or relaunch for several months. Since the tech teams are often making changes to the Development Integration and QA environments, they are not suitable for content integration work by the Editorial and Design teams. Staging is used in large-scale Web sites where mutiple Editors, Designers and Production staff are collaboratively creating content packages and new sections. In smaller Web sites or in cases where just one or two Editors are working on a piece of content like an individual article, previewing is done in the Production environment itself with access controls.
  • Quality Assurance (QA)
    • The QA engineers perform Functional Testing and Load Testing here. Doing functional testing while a load test is running is sometimes a good idea as it simulates usage closer to live production.
  • Development Integration
    • Software product code developed by different engineers is integrated here. There could be continuous integration or nightly builds.
    • This is where developers ensure that their code works with other developers’ code (does not break the build, and does not conflict resulting in undesired functionality)
    • Programmers should ensure that the product works here before handing it off to the QA engineers for testing

In a virtualized system the environments may not be physically separate and may regularly grow and shrink at different times. For example when hosted at a cloud computing provider, the QA environment may scale up during load testing and shut down completely during the hours the QA team is not working.

Network & Other Appliances

These are devices to which various servers are directly or indirectly connected.

Managed Hosting Services

  • Systems Administration
    • This typically includes all the management of the physical hardware up to and including the operating system and popular applications that complement the operating system.
  • Database Administration Services
  • Applications Management Services
    • This typically includes all the administration of the applications that run on top of the operating system.
  • Systems Monitoring, Alerting & Reporting
  • Web Support Help Desk, 24×7

Third-party Services

Program Management Office, PMO

  • Project Management
    • PM people, organization, processes
    • Collaborative project management tools, e.g. JIRA, RallyDev, Mingle
    • Shared documentation management tools, e.g. Wiki
  • Change Management Processes & Tools
    • Documentation system
    • Tools for source control, build & deployment
  • RASIC Matrix Describing Roles & Responsibilities
  • Escalation Flowcharts
  • Crisis Management & Emergency Procedures

Account Management

  • Customer service
  • Relationship management
  • Master Services Agreement, MSA
  • Statements of Work, SOW
  • Service Level Agreement, SLA
    • What to look for in the SLA is the subject of a separate article in this series.
  • Billing & Service Level Agreements
    • Monthly bills provided by telecommunications (telco) and hosting companies tend be extremely complex and lengthy. As a result, they are difficult and time-consuming to review.
    • Always factor in one-time setup fees and any implementation fees paid to the vendor and/or their partners in the total cost of the contract. Don’t look only at the recurring charges. A simple way to do this is this:
      contract cost = implementation fees + (estimated recurring fees x number of recurrences committed to)
      e.g. contract cost for 1 year = setup fees + (estimated monthly charges x 12)
      For most hosting / telco contracts I recommend this simple calculation over more sophisticated methods that factor in time value of money because the recurring fees are estimates anyway.
    • Make sure that 1-year contract is really a 1-year contract and not effectively a 13-month, 15-month or even longer contract by ensuring the following:
    • The contract’s start date is the first date for which the recurring billing begins. This is useful in determining the default end date of the contract. For example:
      If you agree to a 1-year contract with monthly billing when the first monthly bill will be for services provided April, 1, 2010 through April 20, 2010, then the default termination date for the contract is March 31, 2011. If the service provider estimates 3 months for implementation that ends on June 30, 2010 and they charge you the monthly services for April, May and June, don’t let the vendor tell you the contract start date is July 1. If you paid the monthly fees for services provided on April 1, then the start date is April 1.
    • If the vendor charges you fractional monthly fees for the implementation period and/or charges you one-time set up fees, then you should negotiate and agree on a contract end date that is fair to both parties. Use this guideline: The contract commitment should aim towards a certain money target (revenue for the vendor). If the implementation fees are equivalent to say, 3 months of recurring billing, you might agree that end date is after 9 months of the first recurring billing cycle.

Tips for Reviewing Technology Vendor Contracts and Service Level Agreements (SLA)

  • Don’t let the vendor use a lower monthly rate for calculating SLA credits.An example: The vendor’s contract section X.YZ1 states that the customer’s service credits will be calculated against a monthly rate of $6,000.00 per month. However, the vendor’s estimated total charges seem to be at least $10,000 per month. Don’t let the vendor calculate service credits based on a lower monthly bill than the actual monthly bill.
  • Don’t get locked into a deal where you could be stuck with overages every month.An example: The vendor’s contract section X.YZ2 locks the customer into the vendor’s service for two years for a total of between $80K/month to $100K/month if the customer remains at under 100 million page views per month. If the customer’s page views go over 100 million in any month, then there will be additional overage charges. There is no out clause nor a pre-determined next rate tier in the customer’s favor in the contract. If customer s traffic rises to regularly being over 100 million page views per month, the customer will be trapped in a contract with recurring overage charges. Make sure that if you have overages in the future, you can move into the next tier, preferably at a better rate.
  • Beware of vaguely defined scheduled maintenance and make sure scheduled maintenance needs customer’s prior approval.An example: The SLA section X.YZ3 states that the vendor can schedule maintenance downtime with 48 hours notice. They can give the customer notice by one of many means. There is no requirement for the customer to review or even acknowledge receipt. This is slanted too much in vendor s favor. The customer should have some ability to reschedule scheduled maintenance or ask for it to be shorter in duration if it interferes with the customer’s business.
  • Make sure that service credits can also be redeemed as cash.An example: The SLA section X.YZ4 states that service credits are not cash. Such credits will only be applied to future service billings. This is usually fine, except if it happens in the last month of the contract or if there is not enough future usage to use up the credits. In such instances, service credits should be payable as cash.
  • If the vendor will charge you for overages, the vendor needs to be responsible for service at the overage usage levels too.An example: The SLA section X.YZ5 states that response time service credits will not apply if monthly page-views exceed 120 million. This is not fair to the customer. The vendor is fine with charging the customer overage fees, but not being responsible for level of service at those levels. If the vendor charges overage fees, it should bind them to providing full service at the exceeded usage as well.

Infrastructure & Facilities

This item, infrastructure & facilities, is beyond the scope of this article. It includes the buildings, electric power, generators, climate control, physical security and related staffing.

This article is part of a series titled “Guide for the CTO: A compilation of articles on how to lead and manage technologies, projects and people”.

Understanding SLA Uptime Percentages

The table below helps illustrate why you should ensure that the “SLA Uptime Measurement Meter Reset Window” is the same duration as your billing cycle or shorter.

Availability % Downtime per year Downtime per month (30 days) Downtime per week Downtime per day
90 “one nine” 36.5 days 72 hours 16.8 hours 2.4000 hours
95 18.25 days 36 hours 8.4 hours 1.2000 hours
97 10.96 days 21.6 hours 5.04 hours 43.2000 minutes
98 7.3 days 14.4 hours 3.36 hours 28.8000 minutes
99 “two nines” 3.65 days 7.2 hours 1.68 hours 14.4000 minutes
99.5 1.83 days 3.6 hours 50.4 minutes 7.2000 minutes
99.8 17.52 hours 86.23 minutes 20.16 minutes 2.8800 minutes
99.9 “three nines” 8.76 hours 43.8 minutes 10.1 minutes 1.4400 minutes
99.95 4.38 hours 21.56 minutes 5.04 minutes 43.2000 seconds
99.99 “four nines” 52.56 minutes 4.32 minutes 1.01 minutes 8.6400 seconds
99.999 “five nines” 5.26 minutes 25.9 seconds 6.05 seconds 0.8640 seconds
99.9999 “six nines” 31.5 seconds 2.59 seconds 0.605 seconds 0.0864 seconds
99.99999 “seven nines” 3.15 seconds 0.259 seconds 0.0605 seconds 0.0086 seconds

Sources: Wikipedia, my calculations

Save Money On Hosting & CDN By Optimizing Your Architecture & Applications

If you manage technology for a company that has a large Web presence, it is likely that a large percentage of your total technology costs is spent on the Web hosting environment, including the Content Delivery Network (CDN, e.g. Akamai, LimeLight, CDNetworks, Cotendo). In this article, we discuss some ways to manage these costs.

Before we discuss how to optimize your architecture and applications to have economical and the optimally low hosting expenses, let us develop a model for comprehensively understanding a site’s Web hosting costs.

Step 1. Develop a model for allocating technology operations & infrastructure costs to each Web site/brand

Let us assume for this example that your company operates some medium to large Web sites and spends $100K/month on fully managed1 origin2 Web hosting and another $50K/month on CDN. That means your company spends $1.8MM/year on Web sites hosting.

It is important to add origin Web hosting and CDN costs to know your true Web hosting costs, especially if you operate multiple Web brands and need to allocate Web hosting costs back to each. For example, let us assume you have two Web sites: brandA.com, a dynamic ecommerce site costing $10K/month on origin hosting plus $2K/month on CDN; and brandB.com, serving a lot of videos and photos costing $5K/month on origin hosting plus $19K/month on CDN. In this example, brandA.com actually costs $12K/month, which is half the hosting cost of brandB.com, $24K/month. Without adding the CDN costs, you may mistakenly assume the opposite that brandA.com costs twice as much to host as brandB.com. Origin hosting and CDN are two sides of the same coin. We recommend that you manage them both together from both technology/architecture and budget perspectives.

Then you add the costs of third-party vendor provided parts of the site rented in the software-as-service model. Next, add licensed software costs used at your hosting location. Let us assume that brandA.com also has:

  • some blogs hosted at wordpress.com for $400/month
  • Google Analytics for $0/month
  • Other licensed platform/application software running on your servers billed separately from the managed hosting. Let us assume brandA.com’s share of that is $1,000/month.

So your Web hosting and infrastructure costs for brandA.com would be $13,400/month. That’s $160,800/year.

Assuming that many of your Web sites share infrastructure and systems management & support staff at your Web hosting provider, you may not have a precise allocation of costs to each brand. That’s ok: It doesn’t need to be perfect nor a staff-time consuming calculation every month. Work with your hosting provider and implement a formula/algorithm that provides a reasonably good breakup and needs to be changed only when there is a major infrastructure change.

Side Note: In order to stay competitive, adapt to changes in the market and meet changing customer sites, brandA.com also needs to do product and software development on a regular basis. However, that’s beyond the scope of this discussion. Managing ongoing product and software development costs for brandA.com could be the subject of another article.

Step 2. Regularly review the tech operations costs for each brand and make changes to control costs

Every month, review your tech operations costs for your business as a whole and for each brand. Make changes in technology and process as needed to manage your expenses. If you don’t review the expenses on a monthly basis, you run the risk of small increases happening in various places every month that add up to a lot.

Without active management done on a monthly basis, brandA.com could creep up from $13,400 to $16,000 the next month and $20,000 the month after. That $1.8MM you were expecting to spend on hosting for the year could turn out to be $2.4MM.

So what does such active management include?

Monitor and manage your bandwidth charges. This is one to keen an eye on. If you bandwidth charges go over your fixed commit, your expenses can quickly blow over budget. If you find bandwidth use increasing, investigate the cause and make course corrections. In some cases, this may simply be due to expected increase in traffic, but in other cases it could be avoided. A related article about taking advantage of browser caching to lower costs provides some tips.

Request your engineers to monitor and manage your servers resource usage (CPU, memory) so that the need for adding hardware can be avoided as much as possible. Enable and ensure regular communications between your technology operations team and your software development team so that software developers are alerted of any application behavior that is consuming more than expected server resources. Give the software developers time to resolve such issues when found.

Review the invoice details to make sure you understand and are in agreement with the invoice. A Web hosting bill can be very detailed and complex to understand. Do not hesitate to ask the hosting provider to explain and justify anything that you don’t understand. Don’t just assume the bills are always correct. They could (and occasionally will) be mistakes in the bills. Be sure to dispute these with the vendor in a respectful and friendly way.

These are just some examples. Please feel welcome to make more suggestions via comments on this post.

The time (and thus money invested) in controlling tech operations cost will be well worth the savings / avoidance of huge cost increases.

Keep abreast of evolving technologies and cost saving methods. Periodically review these with your vendor(s).

Cloud computing is exciting as a technology, and it is equally exciting as a pricing model.

If you find market conditions have changed drastically, request your vendor to consider lowering rates/prices even if you are locked into a contract. You don’t lose anything by asking and the vendor’s response will be an indicator of their customer service and long term business interest with you.

  1. Fully managed Web hosting includes network & hardware infrastructure, 24×7 staff and real estate []
  2. The origin part of your Web hosting environments includes the network and server infrastructure at your hosting facility location(s) where your Web applications and installed and running. It could be in-house data centers or at providers such as RackSpace, IPSoft or Savvis []