If you host and operate large-scale Web sites, or negotiate contract agreements with vendors that provide such services, you need to understand what should be included in a Web hosting infrastructure. This knowledge will help you in three areas:
- Providing reliability, scalability & good performance
- Minimizing risks via security, privacy, regulatory compliance and reduction of vulnerability to potential lawsuits
- Reducing and controlling costs
This guide is meant to help you review upcoming contracts as well as existing services.
Likely audience for this article: Managers, directors and vice presidents of technology, operations or finance at organizations operating large-scale Web sites; Executives supervising technology: CTO, CIO, CFO, COO.
Seven Aspects of Large-Scale Web Hosting
Large-scale Web hosting infrastructure and services can be organized into the following seven areas:
- Servers & Environments
- Network & Other Appliances
- Managed Hosting Services
- Third-party Provided Services
- Program Management Office, PMO
- Account Management
- Infrastructure & Facilities
Checklist for Review
You can use the following checklist to review your hosting services or a vendor’s proposal.
What to look for
When you review each item below, consider:
- Is this item included in the vendor’s proposal or in the services we are currently receiving? If not, is there a good reason for the omission?
- Is this needed for my organization’s current business requirements? Can we do without it? Is it a must have or nice to have for present and reasonable future needs?
- What are the alternatives?
- What is the unit price of this item? How does the price scale up as needs grow? How does the price scale down when need for this item decreases?
- What level of fault tolerance does this item need (e.g. redundancy, standby backups, time to recover)?
Some of these review questions apply only to physical items and not to services and processes.
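The question of how a unit price scales up and down can be made concrete with a small model. The following is a minimal sketch, assuming a tiered price list and an optional minimum commitment; the tier boundaries, prices and the names `PriceTier` and `monthly_cost` are hypothetical and are not taken from any vendor's proposal.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PriceTier:
    up_to_units: Optional[int]  # upper bound of the tier; None means no upper bound
    unit_price: float           # price per unit within this tier


def monthly_cost(units: int, tiers: List[PriceTier], minimum_commit_units: int = 0) -> float:
    """Cost of `units` under tiered pricing, never billing below the commitment."""
    billable = max(units, minimum_commit_units)
    cost = 0.0
    previous_limit = 0
    for tier in tiers:
        limit = tier.up_to_units if tier.up_to_units is not None else billable
        units_in_tier = max(0, min(billable, limit) - previous_limit)
        cost += units_in_tier * tier.unit_price
        previous_limit = limit
        if billable <= limit:
            break
    return cost


# Hypothetical price list: first 10 units at 100, next 40 at 80, the rest at 60.
EXAMPLE_TIERS = [PriceTier(10, 100.0), PriceTier(50, 80.0), PriceTier(None, 60.0)]

if __name__ == "__main__":
    for units in (5, 20, 60):
        print(units, monthly_cost(units, EXAMPLE_TIERS))
    # With a commitment of 10 units, using only 5 is still billed as 10.
    print(monthly_cost(5, EXAMPLE_TIERS, minimum_commit_units=10))
```

Running the same function with growing and shrinking unit counts shows whether a proposal rewards growth with lower marginal prices and how much a minimum commitment costs when need decreases.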
Servers
Servers may be physical hardware, virtual servers managed using software such as VMware, Parallels Virtuozzo or Xen, or a mix of both. Each service listed below can run on its own server, or several services can share one server. It is generally better for each server to run only one (or as few as possible) of the major services listed below. That reduces complexity and saves expensive staff time spent on maintenance, troubleshooting and recovery. Virtualization makes this economical, because multiple virtual servers can share the same physical hardware.
The following is a list of commonly found services at large-scale Web sites that require servers.
- Web
- HTTP(S) Content Delivery. E.g. Apache HTTP Server software
- Streaming Content Delivery
- Cache. E.g. to run products like Squid Cache, memcached
- Application
- Content Management software. This is the software that the Editorial and Production teams use to submit, edit, package and manage articles, photos and other Web site content
- Dynamic Content Assembly. Typically done using Portal Server software, either third-party supplied or in-house developed
- Data Processing. E.g. workflow engines, jobs/tasks processing servers
- Middleware
- Other applications. These are applications that happen to be separate from the main content management system. They could be separate for any number of reasons. E.g. blogs, forums
- Database
- Relational Databases. E.g. Oracle, MySQL, PostgreSQL
- Non-Relational Data Stores. E.g. Key-value, NoSQL stores
Server Environments
An environment is a self-sufficient set of servers assigned to serve a purpose as described below. Large-scale Web sites typically utilize multiple environments.
- Production
- This serves the Web sites to the customers and public.
- Typically has 99.9% or higher uptime guarantee in the Service Level Agreement
- Staging
- This is the environment where content packages are developed, integrated and previewed by Editorial, Design and Production teams before they are published to the end-users, for example during a major site redesign or relaunch that takes several months. Since the tech teams are often making changes to the Development Integration and QA environments, those environments are not suitable for content integration work by the Editorial and Design teams. Staging is used in large-scale Web sites where multiple Editors, Designers and Production staff are collaboratively creating content packages and new sections. In smaller Web sites, or in cases where just one or two Editors are working on a piece of content like an individual article, previewing is done in the Production environment itself with access controls.
- Quality Assurance (QA)
- The QA engineers perform Functional Testing and Load Testing here. Doing functional testing while a load test is running is sometimes a good idea as it simulates usage closer to live production.
- Development Integration
- Software product code developed by different engineers is integrated here. There could be continuous integration or nightly builds.
- This is where developers ensure that their code works with other developers’ code (it does not break the build and does not conflict in ways that produce undesired functionality)
- Programmers should ensure that the product works here before handing it off to the QA engineers for testing
In a virtualized system, the environments may not be physically separate and may grow and shrink at different times. For example, when hosted at a cloud computing provider, the QA environment may scale up during load testing and shut down completely during the hours the QA team is not working.
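To see what an uptime guarantee such as the 99.9% mentioned for Production means in practice, it helps to convert the percentage into an allowed-downtime budget. The following is a minimal sketch; the function name and the 30-day month are assumptions, and an actual SLA may define its measurement period and exclusions (such as scheduled maintenance) differently.

```python
def allowed_downtime_minutes(uptime_percent: float, days: int = 30) -> float:
    """Minutes of downtime permitted in a period of `days` at the given uptime."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


if __name__ == "__main__":
    # Compare a few commonly quoted uptime levels over an assumed 30-day month.
    for level in (99.0, 99.9, 99.99):
        print(f"{level}% uptime allows about {allowed_downtime_minutes(level):.1f} minutes of downtime")
```

Over a 30-day month, 99.9% works out to roughly 43 minutes of permitted downtime, which is a more tangible number to weigh against business requirements than the percentage alone.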
Network & Other Appliances
These are devices to which various servers are directly or indirectly connected.
- Routers
- Load Balancers
- Firewalls
- Shared Storage (Storage Area Network, SAN; Network Attached Storage, NAS)
- Backup & Restore systems
- Bandwidth (at origin hosting)
- Bandwidth is not a physical thing, but like electricity, fuel or cell-phone minutes, it is metered and paid for monthly, so bandwidth usage and charges need to be carefully managed.
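How metered bandwidth turns into a monthly charge depends on the contract. The sketch below assumes one widely used scheme, 95th-percentile ("burstable") billing against a committed rate; this article does not say which scheme a given vendor uses, and the function names, sample values and prices are all hypothetical.

```python
from typing import Sequence


def percentile_95_mbps(samples_mbps: Sequence[float]) -> float:
    """Highest sample remaining after the top 5% of samples are discarded."""
    if not samples_mbps:
        raise ValueError("at least one usage sample is required")
    ordered = sorted(samples_mbps)
    # Integer arithmetic for ceil(0.95 * n) avoids floating-point rounding surprises.
    index = (95 * len(ordered) + 99) // 100 - 1
    return ordered[index]


def monthly_bandwidth_charge(
    samples_mbps: Sequence[float],
    committed_mbps: float,
    commit_fee: float,
    overage_price_per_mbps: float,
) -> float:
    """Flat fee for the committed rate plus a per-Mbps charge for usage above it."""
    billable = percentile_95_mbps(samples_mbps)
    overage = max(0.0, billable - committed_mbps)
    return commit_fee + overage * overage_price_per_mbps


if __name__ == "__main__":
    # Hypothetical month of usage samples ranging from 1 to 100 Mbps.
    samples = list(range(1, 101))
    print(percentile_95_mbps(samples))
    print(monthly_bandwidth_charge(samples, committed_mbps=50, commit_fee=1000.0,
                                   overage_price_per_mbps=10.0))
```

Under this scheme short traffic spikes fall into the discarded top 5% and do not raise the bill, while sustained growth does, so it is worth checking usage reports against the committed rate every month.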
Managed Hosting Services
- Systems Administration
- This typically includes all the management of the physical hardware up to and including the operating system and popular applications that complement the operating system.
- Database Administration Services
- Applications Management Services
- This typically includes all the administration of the applications that run on top of the operating system.
- Systems Monitoring, Alerting & Reporting
- Web Support Help Desk, 24×7
Third-party Provided Services
- Content Delivery Network, CDN (e.g. Akamai, Limelight, CDNetworks)
- CDN Network Storage
- CDN Bandwidth Rates for HTTP and Streaming
- External Monitoring, Alerting & Reporting (e.g. Gomez, Keynote)
Program Management Office, PMO
- Project Management
- Change Management Processes & Tools
- Documentation system
- Tools for source control, build & deployment
- RASIC Matrix Describing Roles & Responsibilities
- Escalation Flowcharts
- Crisis Management & Emergency Procedures
Account Management
- Customer service
- Relationship management
- Master Services Agreement, MSA
- Statements of Work, SOW
- Service Level Agreement, SLA
- What to look for in the SLA is the subject of a separate article in this series.
- Billing
- Monthly bills provided by telecommunications (telco) and hosting companies tend to be extremely complex and lengthy. As a result, they are difficult and time-consuming to review.
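One way to make a long bill easier to review is to total its line items by category and look only at the categories that changed noticeably since the previous month. The following is a minimal sketch, assuming the bill has already been exported as (category, amount) pairs; the category names, amounts, function names and the 10% threshold are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def totals_by_category(line_items: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum bill line items into one total per category."""
    totals: Dict[str, float] = defaultdict(float)
    for category, amount in line_items:
        totals[category] += amount
    return dict(totals)


def flag_changes(
    previous: Dict[str, float],
    current: Dict[str, float],
    threshold_percent: float = 10.0,
) -> Dict[str, Tuple[float, float]]:
    """Return categories that are new, or whose total moved by more than the threshold."""
    flagged = {}
    for category in sorted(set(previous) | set(current)):
        before = previous.get(category, 0.0)
        after = current.get(category, 0.0)
        if before == 0.0:
            if after != 0.0:
                flagged[category] = (before, after)  # new charge this month
            continue
        change_percent = abs(after - before) / before * 100
        if change_percent > threshold_percent:
            flagged[category] = (before, after)
    return flagged


if __name__ == "__main__":
    last_month = totals_by_category([("bandwidth", 1000.0), ("servers", 5000.0)])
    this_month = totals_by_category(
        [("bandwidth", 700.0), ("bandwidth", 600.0), ("servers", 5100.0), ("storage", 200.0)]
    )
    print(flag_changes(last_month, this_month))
```

A reviewer can then spend time on the flagged categories instead of reading every line, while still keeping the full bill available for spot checks.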
Infrastructure & Facilities
This item, infrastructure & facilities, is beyond the scope of this article. It includes the buildings, electric power, generators, climate control, physical security and related staffing.
This article is part of a series titled “Guide for the CTO: A compilation of articles on how to lead and manage technologies, projects and people”.