Trinity Method of Technology Management

In the Trinity Method of Technology Management, tasks and responsibilities are categorized under three types of roles: Creator, Guardian and Recycler.

If you are the CTO or VP of Technology at an organization, your team needs to do three things effectively and regularly:

  1. Innovate; improve; create new products, features, services & processes
  2. Operate; maintain; execute existing processes & systems with predictable results
  3. Seek & identify products, features, services and processes that are no longer necessary; decommission systems; free up resources for reassignment

The above are the roles of creator, guardian and recycler, respectively.

An example of a creator-type manager is someone whose primary background is software engineering and whose strength is delivering client satisfaction & happiness via innovative products & services.

An example of a guardian-type manager is someone who does a good job heading up technology operations.

Many organizations lack a dedicated recycler-type role. As a result, unnecessary systems (whole or in part), features and processes consume money, add complexity and slow down productivity and innovation. Recycling should be a part of everyday work in a technology organization. Reduce waste by recycling.

There are many benefits of having a dedicated recycler role in your management team:

  • Higher productivity due to reduction of complexity, removal of obstacles and availability of freed-up resources
  • Helps eliminate or minimize ‘process creep’
  • A happier workplace resulting from the above
  • Cost savings

I recommend that you have these three distinct roles, with a manager focussed on only one of creator, guardian, or recycler type tasks & responsibilities at a given time.

The table below gives some examples of tasks and responsibilities under the three areas.

| Creator Tasks & Responsibilities | Guardian Tasks & Responsibilities | Recycler Tasks & Responsibilities |
|---|---|---|
| Develop new products, functionality, services, systems & processes | Operations, execution, delivering predictable results, maintenance & support | Examine existing systems, products, processes and resource assignments seeking areas for recycling |
| Add a major new feature to an existing Web application | Track expenses to budget, monthly | Decommissioning a system no longer in use |
| Develop a new mobile application | Compile status reports, weekly | Elimination of unnecessary steps and waste in a process or workflow |
| | Mentor and coach employees on a regular basis | Identification of areas for cost reductions |
| | Review and approve requests like vacations, expenses and | When an employee leaves, don’t immediately assume that you need to fill the position. The recycler manager should urge the team to determine if this work can be absorbed elsewhere. This will help eliminate waste and avoid or minimize layoffs in the future when business requires reducing staff. |

This article was inspired by the Indian concept of Trimurti, in which the cosmic functions of creation, maintenance, and destruction are personified. It was also inspired by the Harvard Business Review article titled “What 17th-Century Pirates Can Teach Us About Job Design” by Hayagreeva Rao, a professor at Stanford University’s Graduate School of Business.

This post about the Trinity Method of Technology Management is part of a series on technology leadership & management.

Benefits of Using IRC or Group Chat & Video Conference During Incident Management

When a team of engineers is dealing with a real-time incident, such as a system outage, troubleshooting a problem or dealing with a malicious hacking attack, having excellent communications is critically important. The appropriate communications tool can make a world of difference in dealing with the issue and learning from it afterwards. As important as the engineering work itself is, lack of good communications is what often gets tech teams in trouble.

You should enable real-time communication for certain collaborative tasks. This will reduce unnecessary email traffic and clutter, enable people to focus better on their tasks, and minimize time wasted in bringing each other up to speed. It applies whenever multiple people are working together in real time on a near-term collaborative task, such as:

  • Crisis Management
  • Troubleshooting
  • Dealing with hacking attacks
  • Build and deployment
  • Web application migration
  • Upgrade or maintenance
  • QA testing

Many companies use a phone conference and/or email to communicate while a collaborative activity like those above is ongoing. Email is not instantaneous the way a group chat application is, and it is not a suitable medium for quick questions and quick one-line responses. Smart teams therefore use a real-time group chat tool like IRC (Internet Relay Chat) to facilitate real-time conversation. The benefits of using IRC or another real-time text-based group chat tool instead of email are:

  • Tech managers, project managers, crisis managers and new tech people joining the effort can quickly catch up with what has been going on (in any level of detail they want) by reading the IRC history transcript so far. This is much faster and more efficient than using email or pulling someone away to ask in person what has been going on. (If email were used instead of IRC, a new person joining in would have missed the previous emails on the topic.)
  • When an engineer working on such a collaborative task steps away for a while and comes back, they can quickly catch up on what transpired while they were away by reading the IRC history transcript.
  • Email is not cluttered by short back-and-forth messages with lots of text to read and filter.
  • The IRC transcript can be used for the post-incident retrospective and report (“post-mortem”).
  • Unlike a phone-only conference, IRC leaves a transcript that can be read and analyzed to learn lessons from the incident. For example:
    • Analyze what problems the team ran into
    • Analyze what worked and what didn’t
    • Analyze how well people collaborated and communicated
    • Reconstruct the timeline of events
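
As a rough illustration of the transcript analysis described above, here is a minimal Python sketch that pulls a timeline and a per-person message count out of a saved chat log. The log line format, the milestone keywords, the nicknames and the sample messages are all made up for this example; real IRC clients and chat tools log in different formats, so the regular expression would need adjusting.

```python
import re
from datetime import datetime

# Assumed (hypothetical) log line format: "[2024-01-15 14:02:11] <nick> message"
LINE_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] <([^>]+)> (.*)$")


def parse_transcript(text):
    """Return a list of (timestamp, nick, message); lines that do not match are skipped."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            events.append((stamp, match.group(2), match.group(3)))
    return events


def timeline(events, keywords=("down", "restarted", "fixed", "deployed")):
    """Keep only messages that mention one of the assumed milestone keywords."""
    return [e for e in events if any(k in e[2].lower() for k in keywords)]


def messages_per_person(events):
    """Count how many messages each participant posted."""
    counts = {}
    for _, nick, _ in events:
        counts[nick] = counts.get(nick, 0) + 1
    return counts


# Invented sample transcript, for illustration only.
SAMPLE = """\
[2024-01-15 14:02:11] <alice> site is down, checking load balancer
[2024-01-15 14:05:40] <bob> db looks healthy
[2024-01-15 14:09:03] <alice> restarted web tier
[2024-01-15 14:15:27] <carol> confirmed fixed from outside
"""

if __name__ == "__main__":
    events = parse_transcript(SAMPLE)
    for stamp, nick, message in timeline(events):
        print(stamp.strftime("%H:%M"), nick, message)
    print("Duration:", events[-1][0] - events[0][0])
    print(messages_per_person(events))
```

Keyword matching is a crude filter. For a real retrospective it is only a first pass to find the moments worth reading closely in the full transcript.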

I can personally attest to the above benefits. Over the past 15+ years, my development and operations teams in different companies have regularly used IRC to great advantage. Tools like wikis and blogs are great for collaboration, documentation and sharing information on projects. A group chat tool like IRC is indispensable for real-time collaboration.

2013 August Update:

With multi-participant video conferencing becoming commonplace thanks to Google Hangouts, I have updated this post to include video conferencing combined with group text chat.

The rest of this update has moved to its own blog entry titled ‘What I Learned During the Hacking Attacks of August 28, 2013.’

2014 July Update:

I now recommend that organizations consider using Slack. See my 2014 July comment below.

Opinion on the Amazon S3 Outage; Checklist for Dealing with Outages

My journalist colleagues at Wired.com published some of my comments related to Amazon S3.[1] Wired also posted another article titled “Customers Shrug Off S3 Service Failure.” I agree with the views of many of the customers expressed in the article. Don MacAskill, CEO of the popular photo hosting site Smugmug, wrote an understanding post about it.

Throughout my career working for media companies, I’ve firmly held the belief that the uptime, reliability, performance, scalability and security of commercial Web sites are of paramount importance. When sites that I’ve been responsible for have had issues, my colleagues and I have given our personal time and energy to resolution. With my teams, I spend considerable time on proactive measures. I’ve had the honor of working closely with and learning from some who do an excellent job running technology operations.

Experience has taught me that things can and sometimes do go wrong. Sometimes calculated risks don’t pan out. Sometimes mistakes cause problems. We are human. We should strive for perfection; we can get close to it, but not fully attain it. We should be prepared for such scenarios. When they happen, we should work diligently and expeditiously on resolution and have frequent and honest communications with stakeholders and customers. Such communications during the incident should include:

During-Incident Communication Checklist

  • Current status
  • What is the full impact?
  • Estimated time to resolution
  • Any recommended workarounds until resolution, if practical
  • Assurance that it is being worked on
    • It often helps to mention who all are working on it and what they are doing
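
For teams that send these updates often, the checklist above can be turned into a small template that flags what is still missing before an update goes out. The Python sketch below is only an illustration: the class name, field names and message wording are hypothetical, and treating the workaround and the responder list as optional is my reading of “if practical” and “it often helps”.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IncidentUpdate:
    """One during-incident status update (hypothetical structure)."""
    current_status: str = ""
    impact: str = ""
    estimated_resolution: str = ""
    workarounds: str = ""                                 # optional: only if practical
    responders: List[str] = field(default_factory=list)  # optional: who is doing what

    def missing_items(self) -> List[str]:
        """Return the checklist items that are still empty."""
        required = {
            "Current status": self.current_status,
            "Full impact": self.impact,
            "Estimated time to resolution": self.estimated_resolution,
        }
        return [label for label, value in required.items() if not value.strip()]

    def render(self) -> str:
        """Format the update as plain text, always ending with the assurance line."""
        lines = [
            f"Current status: {self.current_status}",
            f"Impact: {self.impact}",
            f"Estimated time to resolution: {self.estimated_resolution}",
        ]
        if self.workarounds:
            lines.append(f"Workaround: {self.workarounds}")
        assurance = "The issue is being actively worked on"
        if self.responders:
            assurance += " by " + ", ".join(self.responders)
        lines.append(assurance + ".")
        return "\n".join(lines)


if __name__ == "__main__":
    # Invented example values, for illustration only.
    update = IncidentUpdate(
        current_status="Investigating failed image uploads",
        impact="New uploads fail; existing images are unaffected",
        responders=["ops on-call (checking storage)"],
    )
    print("Still missing:", update.missing_items())
    print(update.render())
```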

The post-incident communications to stakeholders and customers should include:

Post-Incident Communication Checklist

  • Summary
  • What happened, how and why it happened?
    • Including full description of all impact
    • Do not blame[2] third-parties or say things like “beyond our control”. A technology leader takes responsibility equally for both insourced and outsourced products and services.[3]
  • How it was resolved
    • Whether the resolution is temporary or long-term
  • Next steps
  • Plan for preventing or minimizing the recurrence of this and similar incidents
  • Thank all those who helped resolve the incident, and thank the customers for their understanding
  • Mention the monetary credits you plan to give as per the Service Level Agreement (SLA)
    • Specify any additional ‘make goods’ or returns you plan to make to the customers above and beyond the credits as per SLA, if appropriate.
  • Double check each recipient’s email address to make sure you are sending this memo which may contain confidential information to the correct person and not someone else with a similar name in your address book. You don’t want your memo published on Gawker.
  • Speaking of Gawker, in the event someone does leak your memo beyond the intended recipients, take care to not say anything in it that would be an embarrassment. That’s another reason to be honest, own the problem and solution, and not pass the blame.

Stakeholders and customers here refer to internal customers of the technology operations team (e.g. the concerned folks in editorial, marketing, sales, finance, legal and other departments). External communications to the public Internet should be handled in consultation with legal and public relations.

S3’s outage (or any outage) isn’t to be taken lightly, but I have faith Amazon and their customers will learn from it.

Disclaimers:

  • As explained in the terms of use of this site, any opinions expressed on my personal Web site do not reflect those of any employer, past or present. My Web site and I in my personal life neither represent nor speak for any corporation.
  • I have no affiliation, financial or otherwise with Amazon.com. I happen to be a user of their products and services, some of which I like and some that I don’t.
  • Personal Web sites like this are exempt from the performance requirements of corporate Web sites :-) My personal Web site is for expressing, learning and R&D. It also happens to be hosted on Amazon EC2 and S3.

Footnotes:

  1. Silicon Alley Insider and ValleyWag have amusing spins on it. :-)
  2. There may be extreme instances, especially when criminal activity or malicious wrongdoing was the cause, where it would be appropriate to blame someone.
  3. It is ok to mention service providers, or to describe external events to explain what happened, but don’t do it in an “it was their fault, not ours” tone. The technology leader should factually describe what happened and take responsibility.