Integrating Legacy Technologies With Web Systems at Newspapers

The topic of integrating print technology systems with web technology systems often comes up in the newspaper, magazine and book publishing industries.

There is a key difference between Content Companies (e.g. newspapers, magazines) and Other Companies (e.g. pharmaceuticals). With the World Wide Web and information technology (IT) becoming part of everyday life, every company is becoming a content company in certain ways.

For other companies, such as those in pharmaceuticals, aeronautics and construction, the pre-digital products are neither going away nor changing as drastically because of the World Wide Web and IT as the products of content companies like newspapers and magazines.

For those other companies, it makes sense to integrate web systems such as content management with their core products, because those core products are not fading away as a result of the Web and IT.

However, in print media companies like newspapers, whose legacy has been printing systems, the product in its printed form is fading away as a direct result of the Web and IT. So for them it may make sense not to spend too much effort on integrating legacy print systems with Web systems. Instead, it may be a better strategy to spend more resources on enhancing and upgrading the Web systems and digital media products. So for newspapers today, the 1990s holy grail of having one seamless print+web content management system may be less relevant in 2007. It may actually make better business sense to keep the print publishing system and Web CMS separate, focus more on Web and digital media, and allow the printed-on-paper versions of their products to gradually retire over the next two decades.

Project Management: Time to Market, People & Teamwork

Starting early, not driving recklessly fast

People who have worked with me are familiar with my trait of challenging the team to bring products and solutions to market as soon as possible. I’m a strong proponent of quickness to market and love to deliver sooner than the initially projected timeline. In this article, however, I present a different viewpoint for balance.

In product development, the question often comes up: How can we be quicker and faster to market with our products? We should ask instead: How can we be earlier to market with our products than our competitors? We should also ask: Is it more important to be early, or to deliver good quality and innovation?

For the medium- and long-term good of your organization, and in the best interest of your customers, it is more important to deliver a high-quality, innovative product than to deliver it quickly.

In most cases, successful companies are not the ones who are fast or early to deliver products, but those that deliver better products.

Take Google for example. They were a couple of years late to the Web search engine market and were reinventing a product that had already been established by others. Many thought the search engine market was already saturated. Remember some of the early ones like Infoseek and Lycos? Where are they now? Consider Microsoft and Apple: most of their products are not early, but they often succeed. The iPod came years after the early portable digital audio players. MySpace.com came up to dominate online social networking a couple of years after Friendster, Tribe and Orkut were already established.

Even when analyzing products whose success was due to their being early to market, we find that early does not imply fast. These projects often started early and were executed at a comfortable, smooth pace.

As the saying goes, when you ask for quick and dirty, you get both. The benefits of speed to market are for the short term. In some cases, it does make sense to go for quick, short-term solutions. In all cases, however, one must give serious thought to whether that’s the correct path to choose considering the medium and long term goals.

People & Teamwork

In projects, working fast is often a recipe for failure, especially after starting late. The overwhelming majority of projects are not like 100 meter races, where speed results in victory. They are like football games, where factors like teamwork have much greater influence on winning.

The greatest factor affecting the success of projects is not speed, not technology, not even process or planning. It is people. Invest your time, energy and resources on your people and they will make your projects succeed more than anything else.

Whether you are a leader, manager or information worker, I recommend learning more about the people factor and practicing better people related activities at work. Here is a quote I like from a book: “People under time pressure don’t work better; they just work faster. In order to work faster, they may have to sacrifice the quality of the product and their own job satisfaction.” — Peopleware: Productive Projects and Teams, 2nd Edition, Tom DeMarco and Timothy Lister

Keep in mind this order of descending significance of factors in projects’ success:

  1. People & teamwork
  2. Priorities
  3. Planning
  4. Process & operations
  5. Products & technologies
  6. Pace & acceleration

Technology, Innovation and Business Decisions

Nowadays, it is becoming fashionable to belittle technology. I hear people say things like “technologies should not drive business decisions” and “define your requirements without worrying about technology and ask technology [people] to deliver them”. This may sound logical at first, but is it? Consider this imaginary conversation between Bill, a technologist, and Plato, a Platonist businessperson.

Bill: How do you commute to work?
Plato: I drive.
Bill: Why don’t you fly or teleport?
Plato: What do you mean?
Bill: Why do you use a car that drives on the road, why not a personal flying machine or a teleporter?
Plato: Because they aren’t invented yet.

Defining business needs without consideration of technology is impractical. It is being quixotic and ignoring the current reality. Leave the dreaming for the innovators since they and businesspeople are usually different people.

Think about ten important inventions that changed the world. How many of them were created with a business plan? How do you think the wheel was invented? How was fire discovered? Did someone create this World Wide Web with a business plan? Granted, many important things were created with a business plan. The point here is not that business is unimportant, but that technology is important on its own merit.

Sound business practices have an important place in this world. Technology and innovation have a place in this world. One is not master to the other.

Searching Instead Of Browsing: Organizing Information Using Labels as Meta-Data

Being able to assign labels to content to organize information for searching is superior to placing content in folders for manual browsing. The folder concept may be suitable to physical documents on paper, but does not lend itself well to digital information. The labels concept combined with an effective search capability is a faster way to organize content and find information.

Organizing content is a means to the end goal of finding information. Since organizing content is not a goal in itself, it should be as simple as possible and take no more work than is needed to meet the goal of finding information.

The folder concept has many limitations:

  • A particular item of content can only belong to one folder. Placing it in two folders requires either:
    • Making duplicates. This is problematic to maintain.
    • Using links. This is problematic too: With ‘soft links’ the content resides in only one folder and if that folder is deleted, the content is deleted too. With ‘hard links’, it is hard to know how many ‘folders’ contain this content and unlinking the last one may unintentionally erase it.
  • Similarly, folders can only be contained within one folder.
  • To organize content well in folders requires deep levels of sub-folders. These can be a challenge to browse.
  • All content must be placed in a folder for it to be well organized in this scheme. Doing this manually is a burden. Setting up rules for some of the content to be automatically placed in folders relieves the burden to a certain extent. However, after a rule has run and placed a content item in a folder, if the rule was found to have been flawed and it mixed the content in with other content in the wrong folder, it can be a bigger burden to find the content and place it in the right folder.
  • Folders are static. Search results are dynamic. With computing power available to the common person growing, dynamic search makes better sense than static folders which put some of the work on the user rather than the computer.
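To make the contrast with folders concrete, here is a minimal sketch of a label index in Python. The class name, the item ids and the labels are all made up for illustration; the point is only that one item can carry several labels without duplicates or links, and that a search is computed on demand rather than stored as a static structure.

```python
from collections import defaultdict


class LabelIndex:
    """Maps labels to item ids; one item can carry any number of labels."""

    def __init__(self):
        self._items_by_label = defaultdict(set)

    def add(self, item_id, labels):
        """Attach every given label to the item (labels are case-insensitive)."""
        for label in labels:
            self._items_by_label[label.lower()].add(item_id)

    def search(self, *labels):
        """Return the ids of items that carry every one of the given labels."""
        if not labels:
            return set()
        matches = [self._items_by_label.get(label.lower(), set()) for label in labels]
        return set.intersection(*matches)


# Hypothetical content items, each filed under several labels at once.
index = LabelIndex()
index.add("flyers-victory", ["sports", "ice hockey", "flyers"])
index.add("cms-review", ["technology", "content management"])
index.add("stadium-tech", ["sports", "technology"])

print(index.search("sports"))
print(index.search("sports", "technology"))
```

In this sketch "stadium-tech" is found under both "sports" and "technology", which a single folder hierarchy could only do with a duplicate or a link.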

It should not be mandatory to apply all appropriate labels to all content by hand. If the automated content categorization being used employs techniques like artificial intelligence and pattern recognition, and can determine that this article is about personal information management or content management, then applying that particular label manually should not be required.

As the number of labels grows, the labels should not be organized in a taxonomy tree with a folders/sub-folders structure. Such a tree structure has the problems of folders associated with it. The labels should be associated with each other in complex relationships as ‘concepts’ in a language.

For example, applying the label “computing” to an item should make it appear in search results for “technology”. Applying the label “personal information management” should make it appear in the search results for the concept “email”. Note that in a traditional taxonomy tree, “computing” could be a child of “technology”, but “personal information management” could be a parent of “email”.
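One possible way to model such relationships, sketched below in Python, is a graph in which each label points to the concepts it should also surface under. The two relationships are the ones from the example above; the data structure, function names and item ids are my own assumptions, not a description of any particular product.

```python
# Hypothetical relationships: a label also surfaces under these concepts.
# This is a graph, not a tree, so a relationship can point "up" or "down".
RELATED = {
    "computing": {"technology"},
    "personal information management": {"email"},
}

# Hypothetical content items and the labels applied to them.
ITEMS = {
    "labels-essay": ["computing", "personal information management"],
    "flyers-victory": ["sports"],
}


def concepts_for(label, related):
    """Return the label itself plus every concept reachable from it."""
    seen = set()
    stack = [label]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cycles in the relationships
        seen.add(current)
        stack.extend(related.get(current, ()))
    return seen


def search(concept, labeled_items, related):
    """Return the ids of items whose labels lead to the searched concept."""
    return {
        item_id
        for item_id, labels in labeled_items.items()
        if any(concept in concepts_for(label, related) for label in labels)
    }


print(search("technology", ITEMS, RELATED))
print(search("email", ITEMS, RELATED))
```

Neither search term was applied to the essay directly; both matches come from the relationships between labels.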

However, since web page URLs as they are commonly used, especially on static-html sites, are based on the concept of folders, moving from folders to labels on the web is a challenge. But URLs don’t have to be folder-like in their appearance. For example, all the news articles on a site could have URLs like “phillynews.com/ra23px4” instead of something like “phillynews.com/sports/ice_hockey/flyers/04-08-27-victory.htm” or “phillynews.com/inquirer/2004/08/27/sports/flyers-victory.htm”. In this fictitious example, “ra23px4” is an automatically generated, short, easy-to-type ID pointing to the article, like the shortcuts generated by services like tinyurl.com and metamark.net.
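As a rough illustration of how such IDs could be generated, the Python sketch below encodes an article's sequence number in base 36 (digits plus lowercase letters). This is an assumption made for the example; I do not know how tinyurl.com or metamark.net actually build their shortcuts, and the function names are hypothetical.

```python
import string

# Digits followed by lowercase letters: 36 symbols, all easy to type.
ALPHABET = string.digits + string.ascii_lowercase


def encode_id(number):
    """Turn a non-negative article number into a short base-36 id."""
    if number < 0:
        raise ValueError("article numbers must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, len(ALPHABET))
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def decode_id(short_id):
    """Turn a base-36 id back into the article number."""
    return int(short_id, len(ALPHABET))


def short_url(number, host="phillynews.com"):
    """Build a folder-free URL for the article (the host is the fictitious one above)."""
    return f"{host}/{encode_id(number)}"


print(short_url(decode_id("ra23px4")))
```

Because the id says nothing about section, date or technology, the article can be relabeled or moved to another platform without its URL changing.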

Let us consider the organization of email. It seems to be headed in this direction. Some examples in the email space are Google’s Gmail, Microsoft’s Lookout search plugin for Outlook, and Nelson Email Organizer (NEO).

Some possible labels for this document: “personal information management”, “content management”, “computing”, “technology”.

Preserving URLs of Evergreen Content

Changing the URLs of pages containing narrative content like articles has several disadvantages, especially for a content site:

  1. Readers’ bookmarks to the site’s pages break
  2. Links archived in electronic media (e.g. emails, documents) & print media (e.g. books, magazines, newspapers) to evergreen content [1] like articles or news stories break
  3. Incoming links from other sites break
  4. Search engines drop the ranking of the pages
  5. It becomes harder for readers of the site to find content
  6. The site loses credibility with the readers
  7. The points above result in a significant loss of traffic to the pages, which in turn results in a loss of revenue

The idea of permanent links to content is gaining renewed popularity with blogs. Almost every blog entry has a ‘permanent link to this item’ link.

Years ago, when I decided to move my web site from an html+cgi platform to a better dynamic web site platform, I selected Microsoft’s Active Server Pages (.asp). I was disappointed that all my content page URLs were going to have to change from the .html extension to .asp, but I reasoned it would be a one-time change. Going with Microsoft’s new standard seemed a safe bet, so I did :-(

A few years later, when the .NET platform came along, I was even more disappointed to learn that I’d have to change my content page URL extensions to .aspx. I figured that with the criticism MS had received for the change from .asp to .aspx, MS would settle on .aspx for good. So this time, going with the new MS standard was surely a safe bet, and I slowly began to change my pages’ extensions again :-(

Now MS has come up with yet another extension for file names in URLs, .mspx, which is beginning to show up on some content pages at microsoft.com. Perhaps it is a sign to switch to a web application platform with stable URL filename extensions, like PHP or JSP. (The PHP developers listened to the user community when they tried to introduce the new .php3 filename extension and remained with .php.)

Yes, there are ways to preserve URL filename extensions while changing the underlying technology, but none of them is a good solution:

  • URL Rewriting. There are some URL rewriting engines on the IIS platform, but none is well-supported, strongly established in the market, or feature-rich like mod_rewrite on the Apache platform.
  • Redirects. The way to do this correctly is via server configuration. On IIS sites at hosting providers, that is often not an option.
  • Mapping the old extension to the new technology. Since .asp, .aspx and .mspx pages are incompatible, it is impossible to slowly migrate the pages, a few at a time. This also results in an unsupported usage of the platform. Most hosting providers will not do this.
  • Staying with a deprecated technology (keeping my pages .asp). This is not an option either, since that technology platform is on its way out and new features are not being added to it. Also, as a technologist, I don’t want my site’s pages to display an obsolete technology.
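For readers unfamiliar with the redirect option, the idea can be sketched in a few lines of Python. This is only an illustration of the lookup a server would perform, not IIS or Apache configuration; the paths in the table and the function name are hypothetical. A permanent redirect (HTTP status 301) tells browsers and search engines that the old URL has moved for good.

```python
from urllib.parse import urlsplit

# Hypothetical table of pages that changed extension during a migration.
MOVED = {
    "/articles/labels.asp": "/articles/labels.aspx",
    "/articles/urls.asp": "/articles/urls.aspx",
}


def redirect_for(url):
    """Return (status, location) for a moved URL, or None if it has not moved."""
    parts = urlsplit(url)
    target = MOVED.get(parts.path)
    if target is None:
        return None
    if parts.query:
        # Carry the query string over so deep links keep working.
        target = f"{target}?{parts.query}"
    return 301, target


print(redirect_for("/articles/labels.asp?page=2"))
```

The weakness is the one noted in the list: the table has to live in the server's configuration, which many hosting providers do not expose.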

The fact that microsoft.com’s own pages have been changing extensions from .asp to .aspx to .mspx is a sign that, because these technologies were designed without backward compatibility, sites will have to change their pages’ extensions.

Ideally, content publishers and readers should not have to deal with these issues. Perhaps I should use a URL rewriter and completely do away with URL filename extensions on my site. Then I could have some pages as .asp, some as .aspx, some as .php and show readers only a uniform .htm extension (or no extension at all). Maybe I will move to PHP and do this as Michael Radwin at Yahoo suggests in his blog.
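The mapping a rewriter would perform can be sketched as follows in Python. The table and function name are made up for illustration, and a real site would express this in its rewriting engine's own rule syntax rather than in application code. The public URL shows .htm or no extension, while the file that actually serves the page can use any technology.

```python
import posixpath

# Hypothetical table: public path without extension -> file that serves it.
HANDLERS = {
    "/articles/labels": "/articles/labels.php",
    "/articles/urls": "/articles/urls.aspx",
    "/about": "/about.asp",
}


def resolve(public_path):
    """Map a public URL path to its backing file, ignoring any extension shown."""
    stem, _extension = posixpath.splitext(public_path)
    return HANDLERS.get(stem)


print(resolve("/articles/labels.htm"))
print(resolve("/articles/urls"))
```

With this arrangement a page can move from .asp to .aspx to .php by changing one table entry, and readers' bookmarks never notice.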

  1. evergreen content: pages expected to serve their purpose for a long time.