Being able to assign labels to content to organize information for searching is superior to placing content in folders for manual browsing. The folder concept may be suitable to physical documents on paper, but does not lend itself well to digital information. The labels concept combined with an effective search capability is a faster way to organize content and find information.

Organizing content is a means to the end goal of finding information. Since organizing content is not a goal in itself, it should require as little work as possible to meet that goal.

The folder concept has many limitations:

  • A particular item of content can only belong to one folder. Placing it in two folders requires either:
    • Making duplicates. This is problematic to maintain.
    • Using links. This is problematic too. With ‘soft links’, the content resides in only one folder, and if that folder is deleted, the content is deleted too. With ‘hard links’, it is hard to know how many folders contain the content, and unlinking the last one may unintentionally erase it.
  • Similarly, folders can only be contained within one folder.
  • To organize content well in folders requires deep levels of sub-folders. These can be a challenge to browse.
  • All content must be placed in a folder for it to be well organized in this scheme. Doing this manually is a burden. Rules that automatically place some of the content in folders relieve the burden to a certain extent. However, if a rule turns out to be flawed and has mixed content items into the wrong folder, finding those items and moving them to the right folder can be an even bigger burden.
  • Folders are static. Search results are dynamic. As the computing power available to ordinary users grows, dynamic search makes more sense than static folders, which put some of the work on the user rather than the computer.

It should not be mandatory to apply all appropriate labels to all content. If the automated content categorization in use employs techniques like artificial intelligence and pattern recognition and can determine that this article is about personal information management or content management, then applying that particular label by hand should not be mandatory.

As the number of labels grows, the labels should not be organized in a taxonomy tree with a folders/sub-folders structure. Such a tree structure has the problems of folders associated with it. The labels should be associated with each other in complex relationships as ‘concepts’ in a language.

For example, applying the label “computing” to a piece of content should make it appear in search results for “technology”. Applying the label “personal information management” should make it appear in search results for the concept “email”. Note that in a traditional taxonomy tree, “computing” could be a child of “technology”, but “personal information management” could be a parent of “email”.
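The relationships in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `RELATED` map, the function names and the content id "this-article" are all hypothetical, and a real concept network would be far richer than a one-way lookup table.

```python
from collections import defaultdict

# Hypothetical concept graph: each label maps to the concepts whose
# searches should also surface content carrying that label.
RELATED = {
    "computing": {"technology"},
    "personal information management": {"email"},
}


def expand(label, related=RELATED):
    """Return the label plus every concept reachable from it."""
    seen, stack = set(), [label]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(related.get(current, ()))
    return seen


def build_index(labelled_content, related=RELATED):
    """Map each concept to the set of content ids that should match it."""
    index = defaultdict(set)
    for content_id, labels in labelled_content.items():
        for label in labels:
            for concept in expand(label, related):
                index[concept].add(content_id)
    return index


def search(index, concept):
    """Return the matching content ids in a stable (sorted) order."""
    return sorted(index.get(concept, ()))


# Assumed sample data: one document with two labels applied by hand.
index = build_index({
    "this-article": {"computing", "personal information management"},
})
```

Here a search for “technology” or “email” finds the document even though neither label was applied to it, and neither relationship had to be forced into a parent/child position in a tree.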

Web page URLs as they are commonly used, especially on static HTML sites, are based on the concept of folders, which makes this a challenge. However, URLs do not have to look like folder paths. For example, all the news articles on a site could have URLs like “phillynews.com/ra23px4” instead of something like “phillynews.com/sports/ice_hockey/flyers/04-08-27-victory.htm” or “phillynews.com/inquirer/2004/08/27/sports/flyers-victory.htm”. In this fictitious example, “ra23px4” is an automatically generated, short, easy-to-type id pointing to the article, like the shortcuts generated by services like tinyurl.com and metamark.net.
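One way such ids could be generated is sketched below in Python. The `ShortLinks` class, its method names and the seven-character length are assumptions made for illustration; this is not how tinyurl.com or metamark.net actually work, and a real site would keep the table in a database rather than in memory.

```python
import secrets
import string

# Lowercase letters and digits keep the id easy to type.
ALPHABET = string.ascii_lowercase + string.digits


class ShortLinks:
    """In-memory table mapping short random ids to articles (hypothetical)."""

    def __init__(self, id_length=7):
        self.id_length = id_length
        self._by_id = {}

    def add(self, article):
        """Store the article under a fresh id and return that id."""
        while True:
            short_id = "".join(
                secrets.choice(ALPHABET) for _ in range(self.id_length)
            )
            # Retry on the rare collision so an existing id is never reused.
            if short_id not in self._by_id:
                self._by_id[short_id] = article
                return short_id

    def resolve(self, short_id):
        """Return the article for an id, or None if the id is unknown."""
        return self._by_id.get(short_id)
```

Because the id says nothing about section, date or file type, the article can be relabeled or moved to another platform without its URL changing.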

Let us consider the organization of email, which seems to be headed in this direction. Some examples in the email space are Google’s GMail, Microsoft’s LookOut Search Plugin for Outlook, and Nelson Email Organizer (NEO).

Some possible labels for this document: “personal information management”, “content management”, “computing”, “technology”.

Changing the URLs of pages containing narrative content like articles has several disadvantages, especially for a content site:

  1. Readers’ bookmarks to the site’s pages break
  2. Links archived in electronic media (e.g. emails, documents) and print media (e.g. books, magazines, newspapers) to evergreen content [1] like articles or news stories break
  3. Incoming links from other sites break
  4. Search engines drop the ranking of the pages
  5. It becomes harder for readers of the site to find content
  6. The site loses credibility with the readers
  7. The points above result in a significant loss of traffic to the pages, which in turn results in a loss of revenue

The idea of permanent links to content is gaining renewed popularity with blogs. Almost every blog entry has a ‘permanent link to this item’ link.

Years ago, when I decided to move my web site from an html+cgi platform to a better dynamic web site platform, I selected Microsoft’s Active Server Pages (.asp). I was disappointed that all my content page URLs were going to have to change from the .html extension to .asp, but I reasoned it would be a one-time change. Going with Microsoft’s new standard seemed a safe bet, so I did :-(

A few years later, when the .NET platform came along, I was even more disappointed to learn that I’d have to change my content page URL extensions to .aspx. I figured that after the criticism MS received for the change from .asp to .aspx, MS would settle on .aspx for good. So this time, going with the new MS standard was surely a safe bet, and I began to slowly change my pages’ extensions again :-(

Now MS has come up with yet another extension for file names in URLs, .mspx, which is beginning to show up on some content pages at microsoft.com. Perhaps it is a sign to switch to a web application platform with stable URL filename extensions, like PHP or JSP. (The PHP developers listened to the user community when they tried to introduce the new .php3 filename extension and remained with .php.)

Yes, there are ways to preserve URL filename extensions while changing the underlying technology, but none of them is a good solution:

  • URL Rewriting. There are some URL rewriting engines on the IIS platform, but none is well-supported, strongly established in the market, or feature-rich like mod_rewrite on the Apache platform.
  • Redirects. The way to do this correctly is via server configuration. On IIS sites at hosting providers, that is often not an option.
  • Mapping the old extension to the new technology. Since .asp, .aspx and .mspx pages are incompatible, it is impossible to slowly migrate the pages, a few at a time. This also results in an unsupported usage of the platform. Most hosting providers will not do this.
  • Staying with a deprecated technology (keeping my pages .asp). This is not an option either, since that technology platform is on its way out and new features are not being added to it. Also, as a technologist, I don’t want my site’s pages to display an obsolete technology.

The fact that microsoft.com’s own pages have been changing extensions from .asp to .aspx to .mspx is a sign that, because these technologies were not designed to be backward compatible, sites will have to change their pages’ extensions.

Ideally, content publishers and readers should not have to deal with these issues. Perhaps I should use a URL rewriter and completely do away with URL filename extensions on my site. Then I could have some pages as .asp, some as .aspx, some as .php and show readers only a uniform .htm extension (or no extension at all). Maybe I will move to PHP and do this as Michael Radwin at Yahoo suggests in his blog.
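The mapping a URL rewriter would perform can be sketched as follows. This is Python used purely to illustrate the logic, not a mod_rewrite or IIS configuration; the `BACKENDS` table, the paths in it and the function names are all made up for the example.

```python
import posixpath

# Hypothetical table: the stable public path readers see, mapped to
# whichever internal file currently serves it.
BACKENDS = {
    "/articles/labels": "/legacy/labels.asp",
    "/articles/search": "/app/search.aspx",
    "/articles/urls": "/new/urls.php",
}

# Extensions that may appear in old bookmarks and incoming links.
KNOWN_EXTENSIONS = {".htm", ".html", ".asp", ".aspx", ".mspx", ".php"}


def public_path(requested):
    """Strip a known technology extension, leaving the stable path."""
    root, ext = posixpath.splitext(requested)
    return root if ext.lower() in KNOWN_EXTENSIONS else requested


def resolve(requested):
    """Return the internal file for a requested URL path, or None."""
    return BACKENDS.get(public_path(requested))
```

With a table like this, `/articles/labels`, `/articles/labels.htm` and an old `/articles/labels.asp` bookmark all reach the same page, and moving that page to another technology means changing one table entry rather than every published link.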

  1. evergreen content: pages expected to serve their purpose for a long time.

BeanShell is a fully Java-compatible scripting language, capable of interpreting ordinary Java source files. You can also use it to work with Java interactively, much like a Unix shell or Perl, and try out Java’s object features, APIs, GUI widgets and other libraries hands-on.

BeanShell is free and also ships bundled with popular applications such as BEA Weblogic, Forte for Java and the NetBeans IDE.

Can’t find what you are looking for using Google? There are other search engines too. For specific searches, some of these may have their own unique advantages over Google. Google is still great too, but isn’t the only option around anymore.

Update: 2008-Feb-02: The above list is now managed as WordPress blogroll links using a plugin called Blogroll Links. So as the Web search engine landscape changes, I can keep the above list current.

Search, when effectively integrated with content, creates a combination that is greater than the sum of the two separately.

Let us consider an example.

A printed phone book has been available to people for decades. The information in it was accessible primarily for one intended purpose: search by name for phone number and home address. Accessing it differently (e.g. search by phone number for home address and name) was practically impossible for most people, even though the information was all there in the phone book. When the same phone book, with the exact same content, is made searchable via a computer, it raises privacy issues. When the same computer-assisted search is made easily available to millions over the Internet, it raises serious privacy concerns. Notice that the content didn’t change, but adding searchability transformed it into something more powerful.
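The phone book example can be made concrete with a small Python sketch. The entries below are invented, and `index_by` is a hypothetical helper; the point is only that the same records, indexed on a different field, answer a question the printed book could not.

```python
from collections import defaultdict

# Invented entries standing in for a printed phone book.
PHONE_BOOK = [
    {"name": "Ada Example", "phone": "555-0100", "address": "1 Main St"},
    {"name": "Bob Sample", "phone": "555-0101", "address": "2 Oak Ave"},
]


def index_by(entries, field):
    """Build a lookup table from the chosen field to matching entries."""
    index = defaultdict(list)
    for entry in entries:
        index[entry[field]].append(entry)
    return index


# The printed book supports only this lookup (it is sorted by name).
by_name = index_by(PHONE_BOOK, "name")

# The same content, indexed differently, supports reverse lookup.
by_phone = index_by(PHONE_BOOK, "phone")
```

Nothing was added to the data, yet `by_phone` turns a phone number into a name and home address, which is exactly the capability that raises the privacy concern.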

Search technology is a powerful enabler.

I do not view content and search technology as two separate entities that can be put together to provide better information. Many web sites treat them that way, and that is one of the key reasons why their site search is ineffective.

Search is most effective when it is intimately integrated with the content.

Content should not be considered merely a block of text or data. It should be considered an object: a combination of data and functionality, in the sense the term has in object-oriented computing. Searchability should not be external to the content; the content itself should be search-enabled. Besides text and data, the content object should include both headers and in-line meta-data that make it easier to search. A better form of content is one that has search-enablement built into it or integrated with it. This search-enablement could be programmable code, rule sets, meta-data or a combination.

Let us discuss an example to illustrate this.

A news media site has many types of content in it. Let us consider two of them: news articles and movie listings. When an external search engine such as Google brings back results from such a site, it does not effectively differentiate between these two types of content. For an external search engine, they are just web pages.

It would be better if the search engine employed on the site had an understanding of the types of content and searched it differently. Some sites such as the new Yahoo Search and C|Net do a fairly good job at this when they bring back search results from different types of content repositories.

When someone searches for “digital cameras”, the following types of content are relevant to them: product information, product reviews, and product storefronts. This is because someone looking to buy a digital camera would like to know more about digital cameras, read reviews of different models, and find a place to buy one. A search engine that treats all these different types of content the same, as web pages, isn’t very effective. An effective search solution groups these different types of search results for better access.
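Grouping could look something like the following Python sketch. The result list is made up, and it assumes each result already carries a content type, which is precisely the knowledge an external search engine lacks.

```python
from collections import defaultdict


def group_results(results):
    """Group (content_type, title) pairs by type, keeping rank order."""
    groups = defaultdict(list)
    for content_type, title in results:
        groups[content_type].append(title)
    return dict(groups)


# Made-up ranked results for the query "digital cameras".
results = [
    ("product review", "Hypothetical review of Camera A"),
    ("product information", "How digital cameras work"),
    ("storefront", "Buy Camera A"),
    ("product review", "Hypothetical review of Camera B"),
]
grouped = group_results(results)
```

The reader then sees reviews, product information and storefronts as separate groups instead of one undifferentiated list of web pages.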

Search would be even more useful if the content itself (being object oriented) knew how to interact with the search engine.

This could be achieved using adapters for each type of content. The search engine would talk to the adapter, which would be intimately integrated with the content. This would result in the content (via the adapter) responding differently to different types and combinations of search queries. It would result in a very useful and powerful search for the users.

At a news media site, examples of adapters would include: news article adapter, movie listing adapter, classified ad adapter, etc. These adapters would be implemented using an object-oriented programming language such as Java or C#. Searchability would be just one aspect of these adapters’ functionality. They would provide a wrapper around the content and offer functionality like accessing and editing the content. An example technical design of these adapters is the subject of my article for a technical audience.
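As a rough illustration of the adapter idea (not the technical design mentioned above), here is a minimal sketch. It is written in Python for brevity rather than Java or C#, and the class names, the `matches` method and the field names of the sample items are all assumptions.

```python
from abc import ABC, abstractmethod


class ContentAdapter(ABC):
    """Wraps one content item and answers search queries on its behalf."""

    content_type = "content"

    def __init__(self, item):
        self.item = item

    @abstractmethod
    def matches(self, query):
        """Return True if this item is relevant to the query."""


class NewsArticleAdapter(ContentAdapter):
    content_type = "news article"

    def matches(self, query):
        # Articles are searched on headline and body text.
        q = query.lower()
        return q in self.item["headline"].lower() or q in self.item["body"].lower()


class MovieListingAdapter(ContentAdapter):
    content_type = "movie listing"

    def matches(self, query):
        # Listings are searched on movie title and theater name.
        q = query.lower()
        return q in self.item["title"].lower() or q in self.item["theater"].lower()


def search(adapters, query):
    """The engine talks only to adapters, never to the raw content."""
    return [(a.content_type, a.item) for a in adapters if a.matches(query)]


# Assumed sample content for illustration.
adapters = [
    NewsArticleAdapter({"headline": "Flyers win", "body": "The Flyers won last night."}),
    MovieListingAdapter({"title": "Hockey Night", "theater": "Center City Cinema"}),
]
```

Each adapter decides for itself what a match means for its type of content, so adding classified ads would mean adding one more adapter class rather than changing the search engine.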

Your feedback on this article will be greatly appreciated. Please share it with me via the contact page.

November 24th was the day I got the freedom to switch my phone service from AT&T Wireless back to Verizon Wireless and this time without changing my number. (I had been a happy Verizon customer in the past and had gone to AT&T for their lower rates.)

Verizon’s mobile phone network in the U.S. has the best coverage. Where their digital network does not reach, their analog network does. Verizon’s customer service is excellent. I’ve always reached a customer service agent within 2 minutes and they have been extremely helpful. AT&T GSM, on the other hand, has poor coverage and unreliable voicemail. Sometimes, my phone has run out of battery waiting for an AT&T Wireless customer service agent. Even when I do reach them, they have rarely been helpful.

When I requested to switch, AT&T would not release my number to Verizon for two days citing “technical problems”. I’m expecting AT&T to relinquish my number this Friday (after 5 days of several calls to both companies). In the meantime, Verizon has been most helpful and I’m able to use my Verizon phone to make outgoing calls.

The CDMA technology that Verizon uses is superior to GSM and TDMA and has many advantages over them. One of the unmentioned benefits is that if you lose your phone, you can easily switch your number to another phone by calling customer service. There is no SIM card to worry about, which is great for travelers and outdoor enthusiasts. If you like to do outdoor things and need mobile phone coverage in remote places, get a Verizon phone with analog roaming support.

(In case you live in another country and were wondering about the headline, click here.)