Chapter Summary 
Titanic Effect: “The severity with which an on-line operation fails is directly proportional to an organization’s belief that it cannot. In other words, technology plus arrogance spells disaster.”

Project life-cycle

The project life-cycle consists of 6 stages. Each chapter covers one stage.

Stage 1 in the project life-cycle is "Defining your strategy"

Stage 2 in the project life-cycle is "Mapping your strategy"

Stage 3 in the project life-cycle is "Constructing your goods"

Stage 4 in the project life-cycle is "Planning your test"

Stage 5 in the project life-cycle is "Testing your plan"

Stage 6 in the project life-cycle is "Delivering your goods"

There is no stage 7, as the first iteration of the project is now complete. This chapter covers the period when the solution is in operation (production).

This chapter is a summary chapter.

Avoiding Project Disaster
Each chapter of the book is summarized in three parts: a description of the activities related to that stage in the project life-cycle, the historical case study, and project best practices.
  1. “The rush to win passengers” - The chapter takes a look at why organizations establish on-line business services and mission critical service delivery environments in the first place. The chapter defines the importance of establishing business requirements in the evolution of a business service (new solution), and the availability characteristics to look for in a business service. Establishing a business case for the loss of service is critical as this simplifies defining the availability requirements in subsequent stages. Business executives and managers need to be fully aware of the investment required for availability. They also need to know the expected level of service. The chapter continues by defining the “dimensions” of a business service, which help explain the various intangibles that create a service. These are used in identifying meaningful metrics for availability, and setting up Service Level Agreements.

    The Titanic case study examines the origins of the Titanic project, how it was conceived, and the business drivers behind White Star’s decision and business strategy to win passengers across the 3 classes. This includes a background on liners from 1900 to 1914, the competition, and the business rationale of comfort over speed. The study outlines a sample business case with a cost/benefit analysis for the project.

     
    The Best Practices section looks at business service metrics, the importance of “User Outage Minutes” (UOMs), and measuring availability from the customer's perspective. It then looks at the significance of evolving business services with an understanding of what the loss of services would mean to the organization. This requires creating a business case to accurately establish the potential lost revenue. The organization can then start to define mitigating strategies and where to apply resources to limit the impact of unavailability. It contemplates how to take advantage of outages and technology failures to determine the true downtime costs and re-evaluate the value of technology within the service delivery environment, i.e., what is and is not mission critical. The reader can then use two of the software tools included with the book to calculate UOMs, the true costs of an outage, and a business case for service availability.

  2. “Life-boat in itself” - The chapter examines architecting and designing high availability into a service delivery environment that meets the business and functional requirements. It also reviews how these translate into the technology and implementation requirements. Business executives and managers need to understand the risks associated with the service delivery environment. This includes current levels of protection, potential environmental dependencies and their impact.

    The Titanic case study looks at the design and the choice of safety features available at the time. It examines how competitive business pressures led to a spotlight on luxury and splendour over everything else in the quest to design a palatial hotel. The lavish attention and investments paid to passenger comfort implied there was an equivalent investment in the safety and operations features.

    Best Practices introduces functional models that describe mission critical environments, e.g., Environmental Architecture, Inventory, Critical Areas, and Transactional Flow. From these the project can identify, to the component level, the availability requirements. This is more feasible than trying to achieve high availability across the whole service delivery environment, which is difficult, expensive, and unnecessary. These models provide a rapid way to evaluate the risks within the environment. They crystallize the objectives of the project and influence the project team through the later stages. High availability strategies can then be “architected” into the environment based on mitigating risk and protecting key applications. These models also set business expectations of what the likely investments will be to achieve the desired levels of availability, the incremental cost of increased up-time, and where to make the investments.

  3. “Quest to build a palatial hotel” - The chapter takes a look at constructing a working technology that can further demonstrate the functionality and the availability requirements for the business services. The output is a prototype, typically a working version of the solution. Timing defines the sophistication and completeness of the prototype, which provides a very useful tool to present to the business user or recipient of the solution. It confirms the proposed functions and features early enough for any changes to be made at a lower cost. Business executives and managers need to understand any risks associated with a solution, and have confidence in the constructor. They also need to understand the availability features of the solution and the levels of protection.

    The Titanic case study looks at the construction techniques and compares the selection issues of proven versus new technology. Safety features were built in; however, this created an overconfidence that nothing could go wrong. In fact, there was an unshakable belief in the safety of the ship, a lifeboat in itself. As a result, as the construction approached completion, aesthetic factors were allowed to compromise the safety features, and the design was fundamentally flawed in a number of areas. For example, the bulkhead walls were too short, the double skin bottom was under the water line, and the ship carried the minimum number of lifeboats based on regulations. The study also highlights how maritime legislation was hopelessly outdated by the rapid evolution of shipping technology. The whole construction effort now seems very misdirected.

    The Best Practices section examines how construction in today’s environments principally consists of integrating technologies, and using off-the-shelf products and solutions. It continues by reviewing a number of techniques for improving the availability of an identified critical component (the output from the architecture and design stage), which may cause the greatest problems if unavailable. These techniques are also used for constructing availability into solutions, e.g., check-pointing, auditing, redundancy, etc. This also includes looking at the advantages, disadvantages, and best circumstances for each high availability technique to increase up-time.

  4. “Those who fail to plan, plan to fail” - The chapter examines the integrity, resilience and reliability of the end solution and the preparation for its implementation into the service delivery environment. In this stage the change management structure is developed and used to evaluate how closely the business criteria should be met by the services provided. This presents a basis for user acceptance to begin. By the end of this stage the solution is ready for pre-production testing. Business executives and managers need to understand the risks associated with the incoming change and the potential impact to current business services.

    The Titanic case study looks at the testing or sea trials undertaken, specifically the limited operational and safety testing. Only one lifeboat drill was performed, the outcome of which underlined the poor operational readiness of the ship. With the Olympic already established in service, extensive sea trials and testing were not considered as critical, and the pressure was on to get the Titanic into operation.

    The Best Practices section introduces the requirements for sound change management. This is not just for major projects but for the implementation of any kind of change into the service delivery environment. It outlines a comprehensive 2-phase change management methodology of “planning and controlling” through a 9-step change model. The majority of recorded outages are related to inadequacies in Change Management structures. Change planning covers the need to adequately assess the risk and determine an appropriate change strategy to maximize the efficiency and minimize the duration of the testing. This includes planning for the level of testing required and selecting the right kind of tests, e.g., from a battery of up to 17 tests including Integration, Security, Stress, Load, Functional, Operational, and Simulation.

  5. “A chain is as strong as the weakest link” - The chapter takes a look at successfully implementing the service into production, with the least possible risk, and meeting the service delivery criteria. In this stage the change management structure is tested for the first time and the user acceptance is completed. By the end of this stage the solution is ready to be implemented into the service delivery environment. There should be a high degree of confidence within the operations that no disruption will occur. Business executives and managers need to know the outcome of the testing and the business risk of the implementation. They also need to know how safe the solution is and the risk of going live.

    The Titanic case study examines how Titanic’s sister ship, the Olympic, had gone into dry dock for repairs following its collision with HMS Hawke. As a result, the Titanic was rushed into production without adequate testing through sea trials and with poor crew preparation for the maiden voyage. On leaving the port the Titanic had a near collision with the steamer New York, to the consternation of passengers and crew. This highlights the challenges the crew had in navigating an ocean liner that was very large for its time. Operationally the crew was not ready. The lookouts’ binoculars, vital operational tools, were missing. The very experienced Officer Lightoller later testified that it took him three days to get acquainted with the ship’s massive layout.

    The Best Practices section further reviews the comprehensive change management methodology and focuses on change controlling. This covers the pitfalls in limited and inadequate testing and outlines the importance of setting up a battery of tests. The change control phase consists of test plan creation, testing, business reviews and assessments. It also discusses why the “Berlin Wall” approach to change management is not feasible in today’s business climate. The reader can use electronic templates included with the book to create a Change Management structure.

  6. “All hands on deck” - The chapter takes a look at ensuring that the service delivery environment continuously delivers the newly developed business services. This is done in accordance with written Service Level Agreements agreed with customer representatives. The chapter looks at the organizational aspects required to create a support infrastructure and maintain a smooth running operation. The activities include maintaining the stability of the service delivery environment successfully, preventing disruptions from faults occurring, or minimizing these through a quick recovery method. This is based on a rapid and accurate problem management process oriented around a “clock”. Business executives and managers need to know the impact of the implementation on business services and the risk of remaining live with it.

    The Titanic case study reviews one of the most poignant segments of the story, examining the operational aspects of the ship through a detailed “flight clock” of events and decisions taken leading to the disaster. A multitude of blunders culminated in an inevitable outcome. The Titanic had a number of built-in feedback mechanisms that were discounted, fudged, or just ignored. For example, the officers kept their own binoculars and did not share them with the lookouts; radio operators overloaded with commercial traffic (noise) did not pass ice warnings (signal) along in a timely fashion; ice warning information eventually communicated through the hierarchy to Captain Smith wasn’t adequately acted on; Captain Smith succumbed to pressure to sail at full speed through the danger area; and Officer Murdoch, responsible for navigating and steering, made a very questionable series of decisions in reversing the engines and steering hard.

    The Best Practices section highlights the importance of organizing support around a rapid and accurate approach to problem management, and including the operational requirements of a new service into the life cycle of development projects. The project should end not as soon as the service is operational but only once a proven level of stability is attained. The chapter also examines requirements for understanding the complex structure of a service delivery environment. It looks at some approaches for organizing and managing it in both a proactive and reactive way to maximize availability. This includes strategies for Early Warning Systems and Automation. There are many automated tools available, but without a carefully laid operations foundation most tools are ineffective and even dangerous. The section also applies some current thinking to the Titanic case study and the ship’s operation by applying the 4-step “mean time to recovery” model.

  7. “We will remain afloat till help comes” - This chapter is a continuation of Chapter 6, Operations Management. The focus is on the recovery of service(s) to an alternate service delivery environment, that is, the resumption of the original business service to the end-user from an alternate location. If normal problem recovery is not possible, contingency plans are invoked and a disaster is declared. The chapter introduces the “disaster cycle”, which outlines how the interaction of humans typically follows a certain pattern in disasters. Business executives and managers need to know what the current business continuity plan is, how the plan will address the incoming implementation, and what the risks are in the plan.

    The Titanic case study continues with the detailed “flight clock” examination of the recovery stage. It reviews the flow of information, how the Titanic’s hierarchical organization (3 classes) inhibited the flow through the structure, and the impact of this. Much precious time was lost in the first hour after the collision, as the disaster was assessed. Poor communication reduced the time passengers and crew had to react. Many passengers got up and then went back to bed with the perception that they were safe. As a result, the first lifeboat left only half full because of the reluctance of passengers to get in. Effectively, the “impact and stocktaking phase” was atypically long as senior members of the crew operated in a state of disbelief. In addition, the launch of 16 lifeboats took over 2 hours because the crew was not adequately trained. Even if more lifeboats had been in place, it is likely that there would not have been enough time to launch all of these.

    Best Practices section takes a “Why-What-How” approach and examines why disaster recovery is critical, what disaster recovery entails, and how disaster recovery is completed. Within this structure Best Practices looks at business continuity planning and issues such as application selection, recovery windows, and cost justification. It also reviews alternatives from hot to cold to on-line sites, and some of the techniques available through extended mirroring and remote replication.

  8. “Titanic effect” - The chapter reviews the highlights of each chapter, concludes the case studies, summarizes significant discoveries made, and then draws the major Lessons Learned.

    The Titanic chapter case study starts with a review of the post-disaster consequences. It examines the subsequent inquiries, the new legislation and regulations implemented, and the everlasting changes made to the shipping trade. Many historians argue that the Titanic marked the end of the 19th century and of humanity’s unshakable belief in the progress of technology. The chapter continues by looking at all the Lessons Learned chapter by chapter. It sums up with the “Titanic Effect”: the severity with which a system fails is directly proportional to the intensity of the designer’s belief that it cannot.

    Best Practices are reexamined from previous chapters and reviewed in the context of mission critical environments. The chapter concludes by indicating how some organizations have been able to master availability, and how this is the starting point for creating better mission critical environments.
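
The “User Outage Minutes” (UOMs) metric mentioned in the Chapter 1 summary can be illustrated with a minimal sketch. It assumes, as a working definition not confirmed by this page, that UOMs are the number of users affected multiplied by the minutes of outage, summed over all outages, and that lost revenue can be roughly estimated from an assumed revenue rate per user-minute. The names Outage, user_outage_minutes, and estimated_lost_revenue are hypothetical and are not taken from the software tools included with the book.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    """One service interruption (hypothetical record layout)."""
    users_affected: int      # users unable to use the service
    duration_minutes: float  # length of the interruption in minutes


def user_outage_minutes(outages):
    """Sum users_affected * duration_minutes over all outages.

    This is an assumed definition of UOMs, used for illustration only.
    """
    return sum(o.users_affected * o.duration_minutes for o in outages)


def estimated_lost_revenue(outages, revenue_per_user_minute):
    """Rough outage cost: UOMs times an assumed revenue rate per user-minute."""
    return user_outage_minutes(outages) * revenue_per_user_minute


# Two illustrative outages: 200 users for 30 minutes, 50 users for 12 minutes.
outages = [Outage(200, 30), Outage(50, 12)]
print(user_outage_minutes(outages))          # 6600
print(estimated_lost_revenue(outages, 0.5))  # 3300.0
```

Measuring in user-minutes rather than system-minutes keeps the metric on the customer's side of the service: a short outage that hits many users can weigh more than a long one that hits few.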

This page last updated on June 23, 2007.

Copyright ©2001-2007 Mark Kozak-Holland
All Rights Reserved