Checklist for Migration of Web Application from Traditional Hosting to Cloud

In 2010, Cloud Computing is likely to see increasing adoption. Migrating Web applications from one data center to another is a complex project. To assist you in moving Web applications from your own hosting facilities to cloud hosting solutions like Amazon EC2, Microsoft Azure or RackSpace’s Cloud offerings, I’ve published the following set of checklists.

These are not meant to be comprehensive, step-by-step, ordered project plans with task dependencies. These are checklists in the style of those used in other industries, like aviation and surgery, where complex procedures need to be performed reliably. Their goal is to get the known tasks covered so that you can spend your energies on any unexpected ones. To learn more about the practice of using checklists in complex projects, I recommend the book The Checklist Manifesto by Atul Gawande.

Your project manager should adapt them for your project. If you are not familiar with some of the technical terms below, don’t worry: Your engineers will understand them.

Pre-Cutover Migration Checklist

The pre-cutover checklist should not contain any tasks that “set the ship sailing”, i.e. you should be able to complete the pre-cutover tasks, pausing and adjusting where needed, without worrying that there is no turning back.

  • Set up communications and collaboration
    • Introduce migration team members to each other by name and role
    • Set up email lists and/or blog for communications
    • Ensure that appropriate business stakeholders, customers and technical partners and vendors are in the communications. (E.g. CDN, third-party ASP)
  • Communicate via email and/or blog
    • Migration plan and schedule
    • Any special instructions, FYI, especially any disruptions like publishing freezes
    • Who to contact if they find issues
    • Why this migration is being done
  • Design maintenance message pages, if required
  • Set up transition DNS entries
  • Set up any redirects, if needed
  • Make CDN configuration changes, if needed
  • Check that monitoring is in place and update if needed
    • Internal systems monitoring
    • External (e.g. Keynote, Gomez)
  • Create data/content migration plan/checklist
    • Databases
    • Content in file systems
    • Multimedia (photos, videos)
    • Data that may not transfer over and needs to be rebuilt at new environment (e.g. Search-engine indexes, database indexes, database statistics)
  • Export and import initial content into new environment
  • Install base software and platforms at new environment
  • Install your Web applications at new environment
  • Compare configurations at old environments with configurations at new environments
  • Do QA testing of Web applications at new environment using transition DNS names
  • Review rollback plan to check that it will actually work if needed.
    • Test parts of it, where practical
  • Lower production DNS TTL for switchover
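
To verify the lowered TTL ahead of the cutover window, a quick check with dig can help. This is a minimal sketch; www.example.com and ns1.example.com are placeholders for your production hostname and authoritative name server.

# Show the answer currently being handed out for the production hostname;
# the second field of the answer line is the remaining TTL in seconds.
dig +noall +answer www.example.com
#
# Query the authoritative name server directly to confirm the new,
# lower TTL (e.g. 300 seconds) is in place before the cutover.
dig +noall +answer www.example.com @ns1.example.com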

During-Cutover Migration Checklist

  • Communicate that migration cutover is starting
  • Data/content migration
    • Import/refresh delta content
    • Rebuild any data required at new environment (e.g. Search-engine indexes, database indexes, database statistics)
  • Activate Web applications at new environment
  • Do QA testing of Web applications at new environment
  • Communicate
    • Communicate any publishing freezes and other disruptions
    • Activate maintenance message pages if applicable
  • Switch DNS to point Web application to new hosting environment
  • Communicate
    • Disable maintenance message pages if applicable
    • When publishing freezes and any disruptions are over
    • Communicate that the Web application is ready for QA testing in production.
  • Flush CDN content cache, if needed
  • Do QA testing of the Web application in production
    • From the private network
    • From the public Internet
  • Communicate
    • The QA testing at the new hosting location’s production environment has passed
    • Any changes for accessing tools at the new hosting location
  • Confirm that DNS changes have propagated to the Internet
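
One way to confirm propagation is to compare answers from a few public resolvers against the authoritative name server. A minimal sketch, with www.example.com and ns1.example.com as placeholders:

# Ask well-known public resolvers for the record and compare the answers
# with the IP address of the new hosting environment.
dig +short www.example.com @208.67.222.222   # OpenDNS
dig +short www.example.com @4.2.2.2          # Level 3 public resolver
#
# Query the authoritative name server directly for the expected answer.
dig +short www.example.com @ns1.example.com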

Post-Cutover Migration Checklist

  • Cleanup
    • Remove any temporary redirects that are no longer needed
    • Remove temporary DNS entries that are no longer needed
    • Revert any CDN configuration changes that are no longer needed
    • Flush CDN content cache, if needed
  • Check that incoming traffic to old hosting environment has faded away down to zero
  • Check that traffic numbers at new hosting location don’t show any significant change from old hosting location
    • Soon after launch
    • A few days after launch
  • Check monitoring
    • Internal systems monitoring
    • External (e.g. Keynote, Gomez)
  • Increase DNS TTL settings back to normal
  • Archive all required data from old environment into economical long-term storage (e.g. tape)
  • Decommission old hosting environment
  • Communicate
    • Project completion status
    • Any remaining items and next steps
    • Any changes to support at new hosting environment

The checklists are also published on the RevolutionCloud book Web site at www.revolutioncloud.com/2010/01/checklists-migration/ and on the Checklists Wiki Web site at www.checklistnow.org/wiki/IT_Web_Application_Migration

Save Money On Hosting & CDN By Optimizing Your Architecture & Applications

If you manage technology for a company that has a large Web presence, it is likely that a large percentage of your total technology costs is spent on the Web hosting environment, including the Content Delivery Network (CDN, e.g. Akamai, LimeLight, CDNetworks, Cotendo). In this article, we discuss some ways to manage these costs.

Before we discuss how to optimize your architecture and applications to keep hosting expenses optimally low, let us develop a model for comprehensively understanding a site’s Web hosting costs.

Step 1. Develop a model for allocating technology operations & infrastructure costs to each Web site/brand

Let us assume for this example that your company operates some medium to large Web sites and spends $100K/month on fully managed1 origin2 Web hosting and another $50K/month on CDN. That means your company spends $1.8MM/year on Web site hosting.

It is important to add origin Web hosting and CDN costs to know your true Web hosting costs, especially if you operate multiple Web brands and need to allocate Web hosting costs back to each. For example, let us assume you have two Web sites: brandA.com, a dynamic ecommerce site costing $10K/month on origin hosting plus $2K/month on CDN; and brandB.com, serving a lot of videos and photos, costing $5K/month on origin hosting plus $19K/month on CDN. In this example, brandA.com actually costs $12K/month, which is half the hosting cost of brandB.com at $24K/month. Without adding the CDN costs, you may mistakenly assume the opposite: that brandA.com costs twice as much to host as brandB.com. Origin hosting and CDN are two sides of the same coin. We recommend that you manage them both together from both technology/architecture and budget perspectives.

Then you add the costs of the parts of the site provided by third-party vendors in the software-as-a-service model. Next, add the costs of licensed software used at your hosting location. Let us assume that brandA.com also has:

  • some blogs hosted at wordpress.com for $400/month
  • Google Analytics for $0/month
  • Other licensed platform/application software running on your servers billed separately from the managed hosting. Let us assume brandA.com’s share of that is $1,000/month.

So your Web hosting and infrastructure costs for brandA.com would be $13,400/month. That’s $160,800/year.
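
As a quick illustration of the arithmetic above, here is a minimal shell sketch; the dollar figures are simply the example numbers used in this article.

#!/bin/bash
# Monthly cost components for brandA.com from the example above (in dollars)
origin_hosting=10000    # share of fully managed origin hosting
cdn=2000                # share of CDN
saas_blogs=400          # blogs hosted at wordpress.com
analytics=0             # Google Analytics
licensed_software=1000  # brandA.com share of licensed software

monthly=$((origin_hosting + cdn + saas_blogs + analytics + licensed_software))
yearly=$((monthly * 12))

echo "brandA.com monthly hosting & infrastructure: \$${monthly}"  # 13400
echo "brandA.com yearly hosting & infrastructure:  \$${yearly}"   # 160800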

Assuming that many of your Web sites share infrastructure and systems management & support staff at your Web hosting provider, you may not have a precise allocation of costs to each brand. That’s ok: It doesn’t need to be perfect, nor should it be a time-consuming calculation every month. Work with your hosting provider and implement a formula/algorithm that provides a reasonably good breakdown and needs to be changed only when there is a major infrastructure change.

Side Note: In order to stay competitive, adapt to changes in the market and meet changing customer needs, brandA.com also needs to do product and software development on a regular basis. However, that’s beyond the scope of this discussion. Managing ongoing product and software development costs for brandA.com could be the subject of another article.

Step 2. Regularly review the tech operations costs for each brand and make changes to control costs

Every month, review your tech operations costs for your business as a whole and for each brand. Make changes in technology and process as needed to manage your expenses. If you don’t review the expenses on a monthly basis, you run the risk of small increases happening in various places every month that add up to a lot.

Without active management done on a monthly basis, brandA.com could creep up from $13,400 to $16,000 the next month and $20,000 the month after. That $1.8MM you were expecting to spend on hosting for the year could turn out to be $2.4MM.

So what does such active management include?

Monitor and manage your bandwidth charges. This is one to keep an eye on. If your bandwidth charges go over your fixed commit, your expenses can quickly go over budget. If you find bandwidth use increasing, investigate the cause and make course corrections. In some cases, this may simply be due to an expected increase in traffic, but in other cases it could be avoided. A related article about taking advantage of browser caching to lower costs provides some tips.

Request your engineers to monitor and manage your servers’ resource usage (CPU, memory) so that the need for adding hardware can be avoided as much as possible. Enable and ensure regular communication between your technology operations team and your software development team so that software developers are alerted to any application behavior that is consuming more server resources than expected. Give the software developers time to resolve such issues when found.

Review the invoice details to make sure you understand and are in agreement with the invoice. A Web hosting bill can be very detailed and complex to understand. Do not hesitate to ask the hosting provider to explain and justify anything that you don’t understand. Don’t just assume the bills are always correct. There could (and occasionally will) be mistakes in the bills. Be sure to dispute these with the vendor in a respectful and friendly way.

These are just some examples. Please feel welcome to make more suggestions via comments on this post.

The time (and thus money) invested in controlling tech operations costs will be well worth the savings and the avoidance of huge cost increases.

Keep abreast of evolving technologies and cost saving methods. Periodically review these with your vendor(s).

Cloud computing is exciting as a technology, and it is equally exciting as a pricing model.

If you find market conditions have changed drastically, request your vendor to consider lowering rates/prices even if you are locked into a contract. You don’t lose anything by asking and the vendor’s response will be an indicator of their customer service and long term business interest with you.

  1. Fully managed Web hosting includes network & hardware infrastructure, 24×7 staff and real estate []
  2. The origin part of your Web hosting environment includes the network and server infrastructure at your hosting facility location(s) where your Web applications are installed and running. It could be in-house data centers or at providers such as RackSpace, IPSoft or Savvis []

Save Your Company Money In Monthly Bills Using Browser Caching

Companies that operate heavily trafficked Web sites can save thousands of dollars every month by maximizing their use of browser-side caching.

Large Web sites pay for bandwidth at their Web hosting data center and also at their content delivery network (CDN, e.g. Akamai, LimeLight, CDNetworks). Bandwidth costs add up to huge monthly bills. On small-business or personal Web sites, where bandwidth use stays within the plan’s included allowance, this is not an issue, but on large Web sites, this is important to address and monitor.

Companies operating large Web sites often have complex situations like the following:

  • A lack of a comprehensive, deep understanding of all technology cost drivers and their impacts on each other. For example, a programmer may think they are saving the company money by architecting an application in a way that it requires minimal hardware servers, but not realize that the same design actually results in even higher costs elsewhere, like CDN bills.
  • Busy development teams working on multiple projects on tight timelines. This results in compromises between product features/timelines and technical/architectural best practices/standards.
  • Web content management and presentation platform(s) that have evolved over the years
  • Staff churn over the years and an uneven distribution of technical knowledge and best practices about the Web site(s)
  • The continued following of some obsolete “best practices” and standards that were established long ago when they were beneficial, but are now detrimental.

Tech teams at complex Web sites would likely find, upon investigation, that their Web sites suffer from problems they either didn’t know about or didn’t realize the full extent of.

One such problem is that certain static objects on the company’s Web pages that should be cached by the end users’ Web browsers are either not cached by the browsers at all or not cached enough. Some objects are at least cached by the CDN used by the company, but some perfectly cacheable objects are served all the way from the origin servers for every request! This is an unnecessarily costly situation that can be avoided.

In addition to wasteful bandwidth charges resulting in high monthly bills, there are also other disadvantages caused by cacheable objects being unnecessarily served from origin servers:

  • They slow down your Web pages. Instead of the browser being able to use local copies of these objects, it has to fetch them all the way from your origin servers.
  • Unnecessary load on origin Web servers and network equipment at the Web hosting facility. This can be an especially severe problem when a Web site experiences a sudden many-fold increase in traffic caused by a prominent incoming link on the home page of a high-traffic site like Yahoo, MSN or Google.
  • Additional log storage on servers and other devices at the origin Web hosting location(s).
  • Unneeded work that the origin servers, network equipment, CDN and the Internet in the middle, all the way up to the client browsers, have to do to transfer these objects from origin to the end user’s browser. Be environmentally friendly and avoid all this costly waste.

The increase in bandwidth, load on servers and networking equipment and log file storage space increases caused by a few objects on Web pages being served by origin servers for every request may mistakenly seem like an insignificant problem, but little drops of water make the mighty oceans. Some calculations will show that for large Web sites, the cost of this can add up to tens of thousands of dollars a month in bandwidth costs alone.

How should companies operating large Web sites solve this problem?

For technology managers:

  • Make it a best practice to maximize the use of browser-side caching on your Web pages. Discuss this topic with the entire Web technology team. Awareness among the information workers is important so that they can keep this in mind for future work and also address what’s already in place. Show the engineers some sample calculations to illustrate how much money is wasted in avoidable bandwidth costs (one such calculation is sketched after this list): that will prove this is not an insignificant issue.
  • If this problem is widespread in your Web site(s), make the initial cleanup a formal project. Analyze how much money you’d save and other problems you’d solve by fixing this and present it to the finance and business management. Once you show the cost savings, especially in this economy, this project will not be hard to justify.
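
Here is a back-of-the-envelope sketch of such a sample calculation. The object size, request volume and bandwidth rate below are made-up illustrative assumptions, not figures from any particular contract; substitute your own numbers.

#!/bin/bash
# Rough monthly cost of serving one cacheable object from origin every time.
object_kb=50                  # size of the object in KB (assumed)
requests_per_month=20000000   # requests that fetch the object (assumed)
cost_per_gb_cents=30          # blended bandwidth cost in cents per GB (assumed)

gb_per_month=$((object_kb * requests_per_month / 1024 / 1024))
cost_dollars=$((gb_per_month * cost_per_gb_cents / 100))

echo "Extra transfer for this one object: ${gb_per_month} GB/month"
echo "Avoidable bandwidth cost: ~\$${cost_dollars}/month"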

For engineers:

  • Read the article about optimizing caching at Google Code for technical details on how to leverage browser and proxy caching. It explains the use of HTTP headers like Cache-Control, Expires, Last-Modified, and Etag. (A quick way to check which of these headers your objects currently return is sketched after this list.)
  • Review any objects that are served by origin servers every time for legacy reasons that may now be obsolete.
  • Combine the JavaScript files commonly used by your Web pages into one unified, shared file, which is more likely to be cached. Do the same with external CSS style sheets.
  • Study a good book on Web site optimization like Even Faster Web Sites: Performance Best Practices for Web Developers. Share these recommendations and hold a discussion with your tech and production colleagues.
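
As mentioned above, a quick way to see what caching headers a static object currently returns is to inspect the response headers. A minimal sketch using curl; the URL is a placeholder for one of your own static objects.

# Fetch only the response headers for a static object and show the
# caching-related ones.
curl -sI http://www.example.com/images/logo.png \
  | grep -i -E '^(cache-control|expires|last-modified|etag):'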

Using Amazon Elastic Block Store (EBS) with an EC2 Instance


One of the differences between Amazon EC2 server instances and normal servers is that the server’s local disk storage state (i.e. changes to data) on EC2 instances does not persist across instance shutdowns and power-ons. This was mentioned in my earlier post about hosting my Web site on Amazon EC2 and S3.

Therefore, it is a good idea to store your home directory, Web document root and databases on an Amazon EBS volume, where the data does persist like in a normal networked hard drive. Another benefit of using an Amazon EBS volume as a data disk is that it separates your operating system image from your data. This way, when you upgrade from a server instance with less computing power to one with more computing power, you can reattach your data drive to it for use there.

You can create an EBS volume and attach it to your EC2 server instance using a procedure similar to the following.

First, create an EBS volume.

You can use Elasticfox Firefox Extension for Amazon EC2 to:

  • create an EBS volume
  • attach it to your EC2 instance
  • alias it to a device; in this example, we use /dev/sdh

Then format and mount the “disk” on your EC2 instance and move your folders onto it using a procedure similar to the following commands, issued from a bash shell.

# Initialize (format) the EBS drive to prepare it for use
# Note: replace /dev/sdh below with the device you used for this EBS drive
sudo mkfs.ext3 /dev/sdh
#
# Create the mount point where the EBS drive will be mounted
sudo mkdir /mnt/rj-09031301
# Side note: I use a naming convention of rj-YYMMDDNN to assign unique names
# to my disk drives, where YYMMDD is the date the drive was put into service
# and NN is the serial number of the disk created that day.
#
# Mount the EBS drive
sudo mount -t ext3 /dev/sdh /mnt/rj-09031301
#
# Temporarily stop the Apache Web server
sudo /etc/init.d/apache2 stop

#
# Move the current /home folder to a temporary backup
# This temporary backup folder can be deleted later
sudo mv /home /home.backup

#
# Copy the existing home directories onto the EBS volume
sudo mkdir /mnt/rj-09031301/home
sudo cp -a /home.backup/. /mnt/rj-09031301/home/
#
# Symbolically link the home folder on the EBS disk as the /home folder
sudo ln -s /mnt/rj-09031301/home /home
#
# Start the Apache Web server
sudo /etc/init.d/apache2 start

Limitations:

One current limitation of EBS volumes is that a particular EBS disk can only be attached to one server instance at a given time. Hopefully, in a near-future version upgrade of EC2 and EBS, Amazon will enable an EBS volume to be attached to multiple concurrent server instances. That will enable EBS to be used similarly to how SAN or NAS storage is used in a traditional (pre-cloud-computing era) server environment. That will enable scaling Web (and other) applications without having to copy and synchronize data across multiple EBS volumes. Until Amazon adds that feature, you will need to maintain one EBS disk per server and keep their data in sync. One method of making the initial clones is to use the feature that creates a snapshot of an EBS volume onto S3.
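
For example, using Amazon’s EC2 API command-line tools, the initial clone could be made roughly as follows. This is a sketch; the volume, snapshot and instance IDs are placeholders, and the new volume must be created in the Availability Zone of the instance it will attach to.

# Snapshot the source EBS volume to S3
ec2-create-snapshot vol-11111111
#
# Once the snapshot is completed, create a new volume from it
ec2-create-volume --snapshot snap-22222222 -z us-east-1a
#
# Attach the new volume to the second server instance as /dev/sdh
ec2-attach-volume vol-33333333 -i i-44444444 -d /dev/sdh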

Related article on Amazon’s site:

I now use a device called Drobo for storing data at my home network (Product Review)

I now use a device called Drobo (2nd generation), manufactured by Data Robotics, Inc. as the primary data storage and backup medium at my home location. I have attached it via a USB 2.0 cable to my Apple Airport Extreme wireless network router. The Airport Extreme enables me to share USB 2.0 based storage devices on my home network so they can be simultaneously used by multiple computers. This system of making the same hard disk(s) available to multiple computers in a network is called Network Attached Storage (NAS).

The Drobo replaced my USB 2.0 external hard disk drive manufactured by Western Digital (WD) that was earlier attached to my Airport Extreme. The Drobo has significant advantages over the previous WD drive:

  • Data protection in the event of one hard drive failure as a result of wear and tear due to use over time
  • Ability to increase the storage size of the device as the volume of my data grows (more photographs, music, videos, etc.)

A data protection strategy should include both local fault tolerance and remote storage in an offsite location. For off site storage, I keep copies of my data at online locations like Amazon S3, Smugmug, Google Docs, IMAP mail servers and Apple’s Mobile Me service. Since the volume of my data is in terabytes (~ 15 years of emails, photographs, music, videos), recovering large amounts of data from online locations is reserved for extreme situations when local storage is destroyed or corrupted. The Drobo uses technology to protect data in cases of failure (via normal wear and tear) of one of the hard disks inside the Drobo. The benefit I get is similar to the benefit provided by a set of technologies called RAID.

Unlike most RAID devices for home/small-business use, the Drobo allows me to mix and match hard drives of varying capacities. It has 4 bays to insert hard drives. For example, I can have two 1 TB drives today. Next month, I can add another 1.5 TB drive to the 3rd slot. A few months later, I can add a 2 TB drive to the 4th slot. Then when I need more space next year, I can replace one of the 1 TB drives with a 2 TB drive. As I make these changes, the Drobo will automatically recalculate the optimal distribution of my data across all these drives to maximize its storage space and provide data protection. Adding a new drive or replacing a drive with another is done without downtime. The Drobo stays up and running during disk changes and the data on it remains usable by my computers, even while I’m replacing a drive or when it is redistributing data on the new set of drives after a drive is inserted.

Benefits


  • Save money by buying 1 TB drives today and 2 TB drives when they are cheaper in the future
  • Save money by buying just hard disks for adding storage instead of buying a drive plus an enclosure and power supply adapter for each drive. This is also energy efficient (fewer power adapters) and good for the Environment (fewer drive enclosures made of plastic and power adapter units purchased)
  • Save time by letting the Drobo take care of data protection at the local level. Also save time that would have been spent recovering data from remote locations in the event of a local drive failure.
  • Peace of mind having good data protection at home.

Moving data from the WD external drive to Drobo

I transferred the files by having both drives directly connected to my Apple Macbook Pro: The Drobo to the Firewire 800 port and the WD to the USB 2.0 port.

My MacBook Air’s Time Machine backups used to be stored on the USB external drive. Since that drive was attached to an Airport Extreme, the Time Machine backups were stored in a special virtual storage location called a sparsebundle. I copied the sparsebundle from the WD drive to the Drobo, and now my Time Machine backups (including all the Time Machine history of my MacBook Air) live on the Drobo. Because the backups were in a sparsebundle, transferring them was a simple copy operation using the Mac OS Finder.

[rating:5/5]

Future of Content Management for News Media for Web sites

Content on Web sites should be managed using systems that were designed from the ground up for the Web. Traditional content management systems with a legacy of features and workflows used for paper-based print products like newspapers and magazines are unsuitable for Web sites. The future of news media content management for Web sites is in:

  • simple & quick workflows
  • blogs & wikis as the main content types for text
  • social networking & community publishing

simple & quick workflows

Complex editorial workflows make sense for print products (on paper), where once the edition is done, the content and presentation state is “locked” and sent to the presses. Working with Web content writers and editors over the past decade, I have learned that simple, quick workflows are preferable for Web sites. Many Web site producers who hail from print backgrounds now share the same conclusion that complex content management is a hindrance to successful Web site production.

The concept of an edition of the entire product is not necessary for a content Web site. The atomic unit that can be managed and published together can be a package of articles and multimedia or even just one article. A Web site is a living, dynamic, ever changing collection of content where individual items can be updated whenever required or desired or even automatically based on usage.

To be competitive, content needs to be updated and published quickly. Corrections can be made anytime. Thus for Web sites, the editing and approval process should be streamlined and quick all the way from authoring to posting on the site.

A new concept

The ability of online word processors like Google Docs or WriteWith to enable multiple people to edit a document simultaneously and collaboratively is a different paradigm from traditional check-in/check-out access control.

blogs & wikis as the main content types for text

Content management systems (CMS) that offer the simplicity of blogs and are extensible via plug-ins to add functionality, like WordPress or MovableType, make good foundations for a news media Web site.

For revisions, editing history and access control, wiki software works well. WikiPedia and WikiNews, which are powered by the MediaWiki software, are two good examples.

The concept of content management systems that combine the agility of blogs and editorial control of wikis is interesting to follow. The term bliki seems to be the leading classification of such products.

In many newsrooms, writers are increasingly using blog posts to publish news articles instead of their enterprise-class content management systems. When asked why, they reply that it is simpler and quicker and that they don’t need the overhead of things like complex approvals, advanced version tracking and access controls.

social networking & community publishing

Managing content using a blog or wiki is a social networking and community publishing activity. On the readership side, successful social news sites like Digg and Reddit have accelerated the evolution of journalism and readership habits towards the social/community model. The distinction between authors and readers itself is blurring with wikis and comments on blogs.

Social networking features are being added to a variety of Web sites. Going forward, expect to see social networking and community features in content management systems.

Conclusion

Media companies should move to using CMS products that prefer simplicity over complex editorial workflows which were a legacy of writing and editing for print products. A news item, story or blog post should be the same content type. It is likely that blogging products that have proven so successful in empowering talented individuals in competing with large companies will evolve into content management systems with the addition of wiki functionality.

Social Graphs API: WordPress Plugin: Blogroll Links

If you already know what the Social Graph API and XFN are, you can skip the background information and go directly to the Blogroll Links plugin for WordPress that is designed to work with these.

Update: 2010-Feb-20: Version 2 of the Blogroll Links plugin for WordPress uses the Shortcode API and so introduces a new code-tag format. The new plugin still supports the old (now deprecated) code-tag format for backwards compatibility. See below for examples.

Social Graph API

Google recently announced the Social Graph API.1 From Google’s Code site:

With so many websites to join, users must decide where to invest significant time in adding their same connections over and over. For developers, this means it is difficult to build successful web applications that hinge upon a critical mass of users for content and interaction. With the Social Graph API, developers can now utilize public connections their users have already created in other web services. It makes information about public connections between people easily available and useful.

We (Google) currently index the public Web for XHTML Friends Network (XFN), Friend of a Friend (FOAF) markup and other publicly declared connections. By supporting open Web standards for describing connections between people, web sites can add to the social infrastructure of the web.

The Google Code site also has a video introduction to the open social graph:

The Google Code site has some interesting example applications. To see the power of the open social graph, follow these links:

All I did was enter my home page http://www.rajiv.com/ into these applications and got the results linked to above.

XHTML Friends Network, a component of open social networks

XFN (XHTML Friends Network) is a simple way to represent human relationships using hyperlinks. In recent years, blogs and blogrolls have become the fastest growing area of the Web. XFN enables web authors to indicate their relationship(s) to the people in their blogrolls simply by adding a ‘rel‘ attribute to their <a href> tags, e.g.:

<a href="http://www.rajiv.com/" rel="friend met">Home Page: Rajiv Pant</a>

The above link means that the page at http://www.rajiv.com/ belongs to a friend of the person who owns the page this link is placed on. The met value specifies that the two friends have met in real life. The link above would not be placed on a page owned by Rajiv Pant. It would be placed by a friend on their page, for example, on http://www.paradox1x.org/

Here is another example:

<a href="http://photos.rajiv.com/" rel="me">Photo Albums: Rajiv Pant</a>

This link states that the page at the URL http://photos.rajiv.com/ belongs to the same person who owns the page this link is placed on. For example, the above link would be placed on http://www.rajiv.com/, telling the Web that the URLs http://photos.rajiv.com/ and http://www.rajiv.com/ belong to the same person.

To find out how to write and use XFN, or to write a program to generate or spider it, visit the XFN Web site.

Blogroll Links Plugin for WordPress

For people who maintain their Web site or blog using the WordPress blog content management system, I created an open source plugin called blogroll-links that uses WordPress’ built-in Blogroll feature2 and presents links to friends’ home pages and to your own pages on social networking sites, using XFN in the links.

Features of this plugin

  • It can show the links by category in blog posts and WordPress Pages.
  • It uses WordPress’ standard built-in Blogroll links database. There is no hassle of another list of links to maintain.
  • It can be used to show only the links assigned to a particular category, by stating the category slug as defined in that category’s setting in WordPress.
  • It honors the Show/Hidden setting as defined for each link in WordPress.
  • It displays the link in the same window or new window, as specified for each link in WordPress.

See this plugin in action

  • http://www.rajiv.com/friends/
    • The two lists, first one of links to my own pages on various social networking sites and the second one of links to some of my friends’ pages are generated by this plugin. Yes, those social networks’ logo pictures are also taken by the plugin from the WordPress standard Blogroll links. Code:
    • <h3>My Pages on Social Networking Sites</h3>
      [blogroll-links categoryslug="rajiv-web" sortby="link_name" sortorder="desc"]
      <h3>Web Sites of Some People I Know</h3>
      [blogroll-links categoryslug="people" sortby="link_name" sortorder="desc"]
  • http://www.rajiv.com/charity/
    • This list of charitable organizations with brief descriptions is generated by the plugin. Code:
    • [blogroll-links categoryslug="charity"]
  • http://www.rajiv.com/blog/2004/08/02/search-engines/
    • This list of search engines is maintained as Blogroll links in WordPress. Code:
    • [blogroll-links categoryslug="search-engines"]
  • http://www.rajiv.com/
    • The featured links shown under the “What’s featured here?” section shows the links I’ve categorized as featured in WordPress’ Blogroll links. Code:
    • <a title="featured" name="featured"></a>
      <h2>What's featured here?</h2>
      [blogroll-links categoryslug="featured" sortby="link_name" sortorder="desc"]

Download & install plugin

  1. WikiPedia article explaining what an API, or application programming interface is. []
  2. It does not make you maintain yet another list of links []

This Web Site is Now Hosted on Amazon EC2 & S3

This web site, www.rajiv.com is now hosted on Amazon.com’s Elastic Compute Cloud (EC2) and Simple Storage Service (S3) services. They are part of Amazon Web Services offerings. If you are a technologist, I recommend EC2 and S3. To learn more about them, you can follow the links in this article.

Benefits of hosting a Web site on EC2 & S3

  • The hosting management is self-service. Anytime you want, you can provision additional servers yourself, immediately. Unlike with most traditional hosting companies, there is no need to contact their staff and wait for them to set up your server. On EC2, once you have signed up for an account and set up one server, you can provision (or decommission) additional servers within minutes. Even the initial setup is self-service.
  • EC2 enables you to increase or decrease capacity within minutes. You can commission one or hundreds of server instances simultaneously. Because this is all controlled with web service APIs, your application can automatically scale itself up and down depending on its needs (see the sketch after this list). Billing is metered by the hour. This flexibility of EC2 can benefit many use cases:
    • If your web sites get seasonal traffic (e.g. a fashion site during shows) or can temporarily get much higher traffic for a period of time (e.g. a news site), EC2’s business model of pay for what you use by the hour, is cost-effective and convenient.
    • If yours is the R&D or Skunkworks group at a large or medium size organization or a startup company with limited financial resources, renting servers from EC2 can have many benefits. You don’t have to make a capital investment to get a server farm up and running, nor make long-term financial commitments to rent infrastructure. You can even turn off servers when not in use, greatly saving costs.
  • It allows me to use the modern Ubuntu1 GNU/Linux operating system, Server Edition. Among Ubuntu’s many benefits are its user friendliness and ease of use. Software installations and upgrades are a breeze. That means less time is required to maintain the system, while retaining the flexibility and power that being a systems administrator gives.
  • EC2 has a lower total cost of ownership for me than most hosting providers’ virtual hosting or dedicated server plans. Shared (non virtual server) hosting is still cheaper, but no longer meets my sites’ requirements.2
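
As mentioned above, capacity can be commissioned and decommissioned from the command line. A minimal sketch using Amazon’s EC2 API command-line tools; the AMI ID, key pair name and instance IDs are placeholders.

# Launch two additional m1.small instances from an existing AMI
ec2-run-instances ami-12345678 -n 2 -t m1.small -k my-keypair
#
# Later, when the extra capacity is no longer needed, terminate them
ec2-terminate-instances i-1234abcd i-5678efgh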

Potential drawbacks/caution with EC2 & S3

  • While S3 is persistent storage, EC2 virtual server instances’ storage does not persist across server shutdowns. So if your web site is running a database and storing files on an EC2 instance, you should implement scheduled, automated scripts that regularly back up your database and your files to S3 or other storage (a minimal backup sketch follows this list).
    • Consistent with what I read in some comments online, my EC2 virtual server instance did not lose its file-system state or settings when I rebooted it. So rebooting seems to be safe.3
    • This potential drawback is arguably a good thing in some ways. It compels you to implement a good backup and recovery system.
    • This also means that after installing all the software on your running Amazon Machine Image (AMI), you should save it by creating a new AMI image of it as explained in the Creating an Image section of the EC2 Getting Started Guide.
      • This is an issue since you may want to do this every time after you update your software, especially with security patches. Until Amazon implements persistent storage for EC2 instances, you could do this monthly. You can script this to be partly or fully automated. Since Amazon’s EC2 instances are quite reliable, this is not a major concern.
  • An EC2 instance’s IP address and public DNS name persists only while that instance is running. This can be worked around as described under the tech specs section below.
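
Here is a minimal sketch of such a scheduled backup, assuming a MySQL database and the s3cmd tool already configured with your S3 credentials; the database name, paths, credentials and bucket name are placeholders. It could be run nightly from cron.

#!/bin/bash
# Nightly backup of the database and web content to S3 (placeholder names).
DATE=$(date +%Y%m%d)
#
# Dump and compress the MySQL database
mysqldump --user=backupuser --password=secret mydatabase | gzip > /tmp/mydatabase-$DATE.sql.gz
#
# Archive the web document root
tar czf /tmp/wwwroot-$DATE.tar.gz /var/www
#
# Copy both archives to an S3 bucket using s3cmd
s3cmd put /tmp/mydatabase-$DATE.sql.gz s3://my-backup-bucket/db/
s3cmd put /tmp/wwwroot-$DATE.tar.gz s3://my-backup-bucket/www/
#
# Clean up the local temporary files
rm -f /tmp/mydatabase-$DATE.sql.gz /tmp/wwwroot-$DATE.tar.gz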

Some articles about Amazon’s hosting infrastructure services:

Tech specs of my site:

  1. www.ubuntu.com []
  2. I plan to split rajiv.com into separate sites: The India Comedy site will move to comedy.rajiv.com and the SPV Alumni site will move to spv.rajiv.com. The latter two are community sites and will benefit from a community CMS like Drupal. []
  3. However, please be aware of a known issue that on some occasions caused instance termination on reboots. []
  4. I created my AMI virtual machine by building on top of a public Ubuntu AMI by Eric Hammond. []

The Amazon Kindle is a Practical & Excellent Reading Device (Product Review)

The Amazon Kindle is an excellent reading device. It is a good example of a product that serves its purpose well. Like a paper book, you can use it in bed, bathroom, bus and boondocks.

The E Ink display presents an experience quite close to that of reading a book on paper: It is easy on the eyes and can be read indoors or outdoors in sunlight. It is lightweight, about the size of a paperback book and simple to use. It has excellent battery life, so you can enjoy reading without worrying about recharging it often.

In some reviews people have criticized the Kindle as a Web browser and email client. As a satisfied user of the Kindle, I respectfully disagree with those criticisms: It is not meant to be a general purpose notebook computer or tablet PC nor a web browsing or email client. It is for reading books and documents, allowing you to focus on the content while providing you an experience equal to or better than reading on printed paper.

In fact, if Amazon had made web browsing too easy with it, it might end up being counter to the purpose of the Kindle, which is to read books: to learn or for the enjoyment of being engrossed in reading. Many of us prefer our book readers to not offer distractions like web browsing or email while reading. While I love the Web and its hypertext links, there are times I just want to focus and read a book or a document.

The Kindle enables you to be more environment-friendly by saving trees. Many books, newspapers, magazines and blogs are already available on it. You can also transfer your own documents to it for convenient reading without printouts. (You email your document to a special automated email address for conversion. Amazon gives you two options to have it on your Kindle: One for a small fee where your document is wirelessly sent back to your Kindle, the other for free where they email it back to you and you need to copy it over from a computer to your Kindle using USB. You can also download free software to convert documents on your own computer.)

You save paper, yet still can carry your document to read in a convenient, lightweight, portable and easy to read medium. As a bonus, your document is searchable and your bookmarks, clippings, highlighting and notes can be transferred to your computer.

You don’t need a computer at all to take advantage of all the main features of a Kindle, but a computer does allow you to get even more value from your Kindle: You can use it instead of printouts and you can copy audio books and music to your Kindle for listening via its speakers or a headset.

As an educational tool, the Kindle comes with another useful and time-saving feature. You can ask a question using the Kindle which is answered by an Amazon-affiliated human researcher at no additional charge. The Kindle does not distract your reading with Web browsing or email, and it gives you a way to save some time which you would otherwise have spent researching via Web searches — and we know how that can be: Sometimes you go to the Web to look up something and end up wasting time on other things. With the Amazon Kindle’s research service, called NowNow, you send your question via the Kindle and a human expert does the research for you and sends you the answers they find.

The Kindle can be charged using the A00 Tip with an iGo Adapter, which is great because you can carry it on hiking trips and to places where an electrical outlet is not conveniently available, and charge it from two double-A batteries using the iGo powerXtender.

Want to take a break from Web surfing that encourages the attention span of a goldfish? Try the Kindle and enjoy being focused and engrossed in a book. You can learn about all its features and benefits, watch videos and read its reviews at the Kindle page on Amazon.com.1

[rating:4/5]

  1. Note & Disclosure: the links to the Kindle pages in this article tell Amazon that I referred them. If you happen to buy it in that session, I will get a commission but they will not charge you any extra. I like the Kindle and thus wrote this favorable, and in my opinion, fair review. It expresses my opinions and shares my experience with the Kindle. The purpose of this review is not at all to profit from selling Kindles. []

Why I’m not a fan of fingerprint scanners for computer security

These days many notebook computers and portable devices like USB drives are featuring fingerprint scanners which they advertise as biometric security.

I’ve never been a fan of biometric security of this type. I will explain why using different scenarios:

Likely scenario: Stolen or lost laptop

If your laptop is lost or stolen, it is bound to have nice samples of all your fingerprints all over its nice smooth body. Picking up samples of your fingerprints from your laptop surface is much easier than cracking your password. A few internet searches or a visit to a detective/spy shop will provide the person in possession of your notebook computer or other lost gadget all they need to make copies of your fingerprints and create a mold that they can use to authenticate as you.

If your laptop had been secured with a password and encryption, they’d likely reformat your hard drive and your losses would be limited to your hardware. If a fingerprint scan was all that was required to gain access to your account, then your data, your privacy, not to mention your peace of mind for years to come will likely be stolen too.

Another scenario: Forced access to your computer

Let us consider another likely scenario without going into the cinema-like gruesome situation of a villain cutting off your thumb to forcibly access your computer. Say you are sleeping in a shared college dorm. Your roommate or a friend can bring your laptop near you and easily swipe your finger on it to gain access to all your files. You don’t even have to be unconscious. A person or gang stronger than you can easily overpower you without hurting you physically and swipe your finger on your computer to gain access.

Security Related Cartoon from XKCD

You see? This type of fingerprint-scanning biometric security alone, replacing passwords (instead of being used in combination with them), is a lot less secure than one would think. Such advertising of biometric security might seem impressive, but it leads to a false sense of security. In this day of digital privacy concerns and identity theft, relying on such insecure authentication alone is not a good idea.

As an aside, here is an interesting article on how fingerprint scanners work at HowStuffWorks.com.