The Future of Content Management for News Media Web Sites

Content on Web sites should be managed using systems that were designed from the ground up for the Web. Traditional content management systems with a legacy of features and workflows used for paper-based print products like newspapers and magazines are unsuitable for Web sites. The future of news media content management for Web sites is in:

  • simple & quick workflows
  • blogs & wikis as the main content types for text
  • social networking & community publishing

simple & quick workflows

Complex editorial workflows make sense for print products, where once the edition is done, the content and presentation are “locked” and sent to the presses. Working with Web content writers and editors over the past decade, I have learned that simple, quick workflows are preferable for Web sites. Many Web site producers who hail from print backgrounds have reached the same conclusion: complex content management is a hindrance to successful Web site production.

The concept of an edition of the entire product is not necessary for a content Web site. The atomic unit that is managed and published together can be a package of articles and multimedia, or even just one article. A Web site is a living, dynamic, ever-changing collection of content in which individual items can be updated whenever required or desired, or even automatically based on usage.

To be competitive, content needs to be updated and published quickly, and corrections can be made at any time. For Web sites, therefore, the editing and approval process should be streamlined and quick all the way from authoring to posting on the site.

A new concept

The ability of online word processors like Google Docs or WriteWith to let multiple people edit a document simultaneously and collaboratively is a different paradigm from traditional check-in/check-out access control.
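
The contrast is easy to see in code. Below is a minimal Python sketch (all names are hypothetical) of the two paradigms: an exclusive check-out lock that shuts other editors out, versus a shared document that accepts everyone’s changes as they arrive. Real collaborative editors also merge concurrent edits to the same passage, which is beyond this sketch.

    class CheckedOutError(Exception):
        pass

    class LockedDocument:
        """Traditional CMS paradigm: one editor at a time via an exclusive lock."""
        def __init__(self, text=""):
            self.text = text
            self.checked_out_by = None

        def check_out(self, user):
            if self.checked_out_by is not None:
                raise CheckedOutError("locked by " + self.checked_out_by)
            self.checked_out_by = user

        def save_and_check_in(self, user, new_text):
            assert self.checked_out_by == user, "must check out first"
            self.text = new_text
            self.checked_out_by = None

    class SharedDocument:
        """Collaborative paradigm: never locked; every change is accepted on arrival."""
        def __init__(self):
            self.paragraphs = []

        def edit(self, user, paragraph):
            self.paragraphs.append((user, paragraph))

    doc = LockedDocument()
    doc.check_out("alice")
    # doc.check_out("bob")  # would raise CheckedOutError until Alice checks in

    shared = SharedDocument()
    shared.edit("alice", "First draft of the lead.")
    shared.edit("bob", "A second paragraph, added while Alice is still typing.")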

blogs & wikis as the main content types for text

Content management systems that offer the simplicity of blogs and are extensible via plug-ins to add functionality, like WordPress or Movable Type, make good foundations for a news media Web site CMS.

For revisions, editing history and access control, wiki software works well. Wikipedia and Wikinews, both powered by the MediaWiki software, are two good examples.

The concept of content management systems that combine the agility of blogs with the editorial control of wikis is interesting to follow. The term ‘bliki’ seems to be the leading classification for such products.

In many newsrooms, writers are increasingly publishing news articles as blog posts instead of using their enterprise-class content management systems. When asked why, they reply that it is simpler and quicker, and that they don’t need the overhead of complex approvals, advanced version tracking and access controls.

social networking & community publishing

Managing content using a blog or wiki is itself a social networking and community publishing activity. On the readership side, successful social news sites like Digg and Reddit have accelerated the evolution of journalism and readership habits towards the social/community model. The distinction between authors and readers is itself blurring with wikis and comments on blogs.

Social networking features are being added to a variety of Web sites. Going forward, expect to see social networking and community features in content management systems.

Conclusion

Media companies should move to CMS products that prefer simplicity over the complex editorial workflows that are a legacy of writing and editing for print products. A news item, a story and a blog post should be the same content type. It is likely that the blogging products that have proven so successful in empowering talented individuals to compete with large companies will evolve into full content management systems with the addition of wiki functionality.

Searching Instead Of Browsing: Organizing Information Using Labels as Meta-Data

Being able to assign labels to content, so that information can be organized for searching, is superior to placing content in folders for manual browsing. The folder concept may be suitable for physical documents on paper, but it does not lend itself well to digital information. The labels concept, combined with an effective search capability, is a faster way to organize content and find information.

Organizing content is a means to the end goal of finding information. Since organizing content is not a goal in itself, it should be as simple as possible and require as little work as possible to meet that goal.

The folder concept has many limitations:

  • A particular item of content can belong to only one folder (a labels-based alternative is sketched after this list). Placing it in two folders requires either:
    • Making duplicates. These are problematic to maintain.
    • Using links. These are problematic too: with ‘soft links’ the content resides in only one folder, and if that folder is deleted, the content is deleted with it. With ‘hard links’, it is hard to know how many ‘folders’ contain the content, and unlinking the last one may unintentionally erase it.
  • Similarly, a folder can itself be contained within only one parent folder.
  • Organizing content well in folders requires deep levels of sub-folders, which can be a challenge to browse.
  • For content to be well organized in this scheme, every item must be placed in a folder. Doing this manually is a burden. Setting up rules that automatically place some of the content in folders relieves the burden to an extent. However, if a rule turns out to have been flawed and has filed items into the wrong folder, mixed in with other content, finding them and moving them to the right folder can be an even bigger burden.
  • Folders are static; search results are dynamic. As the computing power available to the average person grows, dynamic search makes better sense than static folders, which put some of the work on the user rather than the computer.
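
To make the contrast concrete, here is a minimal Python sketch (the item ids and labels are hypothetical) of labels as a many-to-many mapping: one item can carry any number of labels, with no duplicates or links, and a search result is computed dynamically on demand.

    from collections import defaultdict

    # label -> set of content item ids: a many-to-many mapping, unlike folders
    index = defaultdict(set)

    def apply_label(item_id, label):
        index[label].add(item_id)

    def search(label):
        """A dynamic result, computed on demand, never filed away statically."""
        return index.get(label, set())

    apply_label("flyers-victory", "sports")
    apply_label("flyers-victory", "ice hockey")  # a second label on the same
                                                 # item: impossible with folders

    print(search("sports"))      # {'flyers-victory'}
    print(search("ice hockey"))  # {'flyers-victory'}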

It should not be mandatory to apply every appropriate label to every item of content by hand. If the system’s automated content categorization employs techniques like artificial intelligence and pattern recognition, and can determine on its own that this article is about, say, personal information management or content management, then applying that particular label manually should not be required.
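
As a toy illustration of that idea, the sketch below suggests labels by keyword matching; a real system would use the statistical or pattern-recognition techniques mentioned above, but the principle is the same: labels the software can infer need not be applied by hand. The keyword lists are invented for the example.

    # Hypothetical keyword lists standing in for real categorization techniques.
    LABEL_KEYWORDS = {
        "personal information management": {"email", "inbox", "labels", "folders"},
        "content management": {"cms", "workflow", "publishing", "articles"},
    }

    def suggest_labels(text):
        """Return the labels whose keywords appear in the text."""
        words = set(text.lower().split())
        return {label for label, keys in LABEL_KEYWORDS.items() if words & keys}

    print(suggest_labels("organizing email with labels instead of folders"))
    # {'personal information management'}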

As the number of labels grows, the labels should not be organized into a taxonomy tree with a folders/sub-folders structure; such a tree carries the same problems as folders. Instead, the labels should be associated with each other in complex relationships, as ‘concepts’ in a language.

For example, content carrying the label “computing” should appear in search results for “technology”, and content carrying the label “personal information management” should appear in search results for the concept “email”. Note that in a traditional taxonomy tree, “computing” could be a child of “technology”, but “personal information management” would be a parent of “email”.
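
Here is a sketch of how such concept relations might behave (the relations are assumptions for illustration): a query expands to every label related to it, regardless of whether that label would sit above or below the query in a traditional tree.

    # query concept -> other labels whose content should also match it.
    # Note the mixed directions: 'computing' is narrower than 'technology',
    # while 'personal information management' is broader than 'email'.
    RELATED = {
        "technology": {"computing"},
        "email": {"personal information management"},
    }

    def expand(query):
        """All labels whose content should appear in results for `query`."""
        return {query} | RELATED.get(query, set())

    print(expand("technology"))
    # {'technology', 'computing'}
    print(expand("email"))
    # {'email', 'personal information management'}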

There is a challenge here, however: Web page URLs as they are commonly used, especially on static-HTML sites, are based on the concept of folders. But URLs do not have to be folder-like in appearance. For example, all the news articles on a site could have URLs like “phillynews.com/ra23px4” instead of something like “phillynews.com/sports/ice_hockey/flyers/04-08-27-victory.htm” or “phillynews.com/inquirer/2004/08/27/sports/flyers-victory.htm”. In this fictitious example, “ra23px4” is an automatically generated, short and easy-to-type id pointing to the article, like the shortcuts generated by services such as tinyurl.com and metamark.net.
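
One simple way such ids could be generated (an assumption for illustration; the “ra23px4” above is fictitious, and random ids would work just as well) is to encode a sequential database id in base 36:

    import string

    ALPHABET = string.digits + string.ascii_lowercase  # 0-9 then a-z: base 36

    def short_id(n):
        """Encode a positive integer as a compact, URL-safe, lowercase string."""
        s = ""
        while n:
            n, r = divmod(n, 36)
            s = ALPHABET[r] + s
        return s or "0"

    print(short_id(1610612736))  # 'qmx0qo': short and easy to type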

Consider the organization of email, which seems to be headed in this direction. Some examples in the email space are Google’s GMail, Microsoft’s LookOut search plug-in for Outlook and the Nelson Email Organizer (NEO).

Some possible labels for this document: “personal information management”, “content management”, “computing”, “technology”.

Preserving URLs of Evergreen Content

Changing the URLs of pages containing narrative content like articles has several disadvantages, especially for a content site:

  1. Readers’ bookmarks to the site’s pages break
  2. Links to evergreen content [1] like articles or news stories, archived in electronic media (e.g. emails, documents) and print media (e.g. books, magazines, newspapers), break
  3. Incoming links from other sites break
  4. Search engines drop the ranking of the pages
  5. It becomes harder for readers of the site to find content
  6. The site loses credibility with the readers
  7. The points above result in a significant loss of traffic to the pages, which in turn results in a loss of revenue

The idea of permanent links to content is gaining renewed popularity with blogs. Almost every blog entry has a ‘permanent link to this item’ link.

Years ago, when I decided to move my Web site from an HTML+CGI platform to a better dynamic Web site platform, I selected Microsoft’s Active Server Pages (.asp). I was disappointed that all my content page URLs were going to have to change from the .html extension to .asp, but I reasoned it would be a one-time change. Going with Microsoft’s new standard seemed a safe bet, so I did :-(

A few years later, when the .NET platform came along, I was even more disappointed to learn that I’d have to change my content page URL extensions to .aspx. I figured that, given the criticism MS had received over the change from .asp to .aspx, it would settle on .aspx for good. So going with the new MS standard once more seemed surely a safe bet, and I again began slowly changing my pages’ extensions :-(

Now MS has come up with yet another extension for file names in URLs, .mspx, which is beginning to show up on some content pages at microsoft.com. Perhaps it is a sign that I should switch to a Web application platform with stable URL filename extensions, like PHP or JSP. (The PHP developers listened to the user community when they tried to introduce the new .php3 filename extension, and remained with .php.)

Yes, there are ways to preserve URL filename extensions while changing the underlying technology, but none of them is a good solution:

  • URL rewriting. There are some URL rewriting engines on the IIS platform, but none is as well-supported, as strongly established in the market, or as feature-rich as mod_rewrite on the Apache platform.
  • Redirects. The way to do this correctly is via server configuration. On IIS sites at hosting providers, that is often not an option.
  • Mapping the old extension to the new technology. Since .asp, .aspx and .mspx pages are incompatible, it is impossible to migrate the pages slowly, a few at a time. This also results in an unsupported usage of the platform, and most hosting providers will not do it.
  • Staying with a deprecated technology (keeping my pages .asp) is not an option either, since that technology platform is on its way out and new features are not being added to it. Also, as a technologist, I don’t want my site’s pages to display an obsolete technology.

The fact that microsoft.com’s own pages have been changing extensions from .asp to .aspx to .mspx is a sign that, because these technologies were designed without backward compatibility, sites built on them will have to keep changing their pages’ extensions.

Ideally, content publishers and readers should not have to deal with these issues. Perhaps I should use a URL rewriter and do away with URL filename extensions on my site completely. Then I could have some pages as .asp, some as .aspx, some as .php, and show readers only a uniform .htm extension (or no extension at all). Maybe I will move to PHP and do this, as Michael Radwin at Yahoo suggests in his blog.
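
As a sketch of that approach: on the Apache platform, a few mod_rewrite lines in an .htaccess file can serve extensionless URLs while leaving the underlying technology free to change (the file layout here is an assumption; on IIS, an equivalent third-party rewriting engine would be needed).

    RewriteEngine On
    # If the request does not match an existing file...
    RewriteCond %{REQUEST_FILENAME} !-f
    # ...but a .php file of that name exists, serve that file instead.
    RewriteCond %{REQUEST_FILENAME}.php -f
    RewriteRule ^(.*)$ $1.php [L]

Readers see and bookmark only the extensionless URL; if the site later moves to another technology, only the rewrite target changes and no URLs break.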

  [1] evergreen content: pages expected to serve their purpose for a long time.