Audios
Recordings

Made in 2003, these readings are from an extended abstract. Total running time: 25 minutes. The author reads from On-line, On-time, On-budget and discusses various aspects of risk in Internet or e-business projects.

Transcript

Download the full transcript to the audio files.

Imagine you are in one of Titanic’s lifeboats, just sighted by the rescue ship Carpathia. As you look back at the wreckage site, you wonder how such a disaster could have happened. What were the causes? How could things go so badly wrong? Why did she founder? No one had expected it.

Titanic’s maiden voyage was a disaster waiting to happen as a result of the compromises made in the project. This book explores how business executives can take lessons from a nuts-and-bolts construction project like Titanic and use those lessons to ensure the right approach to developing on-line operations. Looking at this historical project as a model is incisive because it cuts away the layers of jargon and complexity.

On-line, On-time, On-budget is about delivering IT projects in a world where on-time and on-budget is not enough. You also need to be on-line: connected to the Internet and dealing with the 24-by-7 expectations of your customers and partners. The book will help you successfully maneuver through the ice floes of IT management in an industry with a notoriously high project failure rate.

On-line, On-time, On-budget is intended for readers who want to know about putting operations on-line and delivering IT projects but do not want to be overawed by their perceived complexities or IT jargon. It is a serious attempt to explain in simple terms how to get involved and deliver an IT project successfully. It is designed to help CFOs, other C-level executives, and directors understand some of the key issues in an IT project from beginning to end. With minimum “tech-talk” or IT jargon, the book captures the issues an executive needs to be aware of to help ensure their IT projects succeed long after deployment.

So why would an IT project be of interest to a CFO?

Many organizations are restructuring their hierarchies so that the IT organization reports directly into the finance organization. IT is seen as a cost, and therefore the CFO is likely to have the VP of IT or CIO reporting to them. Many CFOs are becoming more responsible for approving new IT projects, as well as taking ultimate responsibility for the delivery of business services to customers.

So what are the issues?
The classic 1998 report from the Standish Group, “CHAOS: A Recipe for Success”, highlighted that only 26% of all IT projects finish on-time, on-budget, with all the features and functions originally specified. However, this is only part of the overall picture, as many problems surface only when the IT project is implemented, sometimes many months or even years into the operation. IT projects that deliver continuous on-line operations or services are of particular interest to the book because most IT projects don’t look beyond the implementation, and the repercussions remain with an organization for many years. Even though the quality of individual technologies has improved at an astounding rate, the reliability of business services has not kept pace. Every month in the press we see examples of very well publicized service outages that cost organizations hundreds of millions of dollars.

Why do IT projects have such a high failure rate, and why are business services so difficult to deliver? There are many reasons highlighted by the Standish Group report. One of the fundamental reasons is the impact of the Internet, and this is the principal focus of the book. IT projects today are going “on-line” through Intranets, Extranets, Internets, Portals, or other electronic channels. As an organization moves its operations on-line, it exposes the inner workings of the business to potentially millions of customers, partners, and suppliers around the world. This also changes the expectations and behavior of customers and partners as they acclimatize to a 24-by-7 on-line operation. Business managers who fail to provide adequate, highly responsive, stable business services, capable of withstanding the onslaught of weekly changes, will lose demanding customers who are ready to switch or “click” to a competitor. Outages have become horrendously expensive and highly visible because of the exposure of the Internet.
Continuous business service availability is becoming a major competitive advantage, and this raises the bar for IT projects that already have a notoriously high failure rate.

So why use Titanic as a case study?

Organizations are discovering that IT projects are too critical for a business executive to sit back and not be involved in. So how do you present key concepts to an executive so that they can take an active role? One approach is through a historical case study, and for this On-line, On-time, On-budget uses Titanic. The project life cycle of Titanic is not much different from those used in IT projects today. Purists in the IT industry may beg to differ, but fundamentally they are the same. There are many specific analogies from Titanic’s short history that map very well to the challenges and decisions encountered in designing, constructing, and operating an on-line operation, and these are reflected in each chapter.

Titanic is a worthy backdrop for today’s IT projects. Titanic’s construction took four years, yet in the course of four days a number of bad decisions sent Titanic to its grave. Modern interpretations of Titanic’s story point to the inadequacies of technology, like the brittle steel and the shortage of lifeboats. However, this is very misleading, as practically all safety features were compromised before the ship was launched. During the maiden voyage, rules and procedures were inadequately tested, incorrectly implemented, and blatantly and continuously violated by the very people who were ultimately responsible for the ship, its safety, the passengers, and the crew. This provides a very important lesson for today’s complex service delivery environments, which are similar in many ways.

So how can you ensure the success of an IT project?

What role can a CFO or business executive take to ensure the success of an IT project?
Through the course of any project, hundreds of critical decisions are made that affect the overall implementation and operation. Stage by stage, an executive needs to be aware of where the project risk is, what decisions are critical, which activities require tight control, and what business representation is required. The framework for the chapters, based on a project life cycle, highlights these and how mistakes are made in the evolution of such a project. It also highlights how vital a strong executive leadership role is in playing the devil’s advocate and coaxing the team to be proactive and diligent in evaluating decisions, and the kind of questions that need to be asked at each stage.

Stage 1: How does the IT project align to the business?

In the first stage of the project the business executive should question how the IT project aligns to the business, articulates the business problem or opportunity, specifies the solution, and establishes its overall value to the organization. This includes the business risks of the Internet and factoring these into the business case. This sets up a go/no-go decision on whether to proceed with the project and the on-line operation.

Similarly, Titanic’s case study for this stage examines the origins of the Titanic project, how it was conceived, the business drivers behind White Star’s decision to replace its aging fleet of liners, and the competitive business strategy to win passengers across the three classes by emphasizing comfort over speed and the levels of service provided. It outlines how the project was financially justified through a cost/benefit analysis that underpinned the expected levels of service. It also looks at the business rationale and main decision points.

Stage 2: Which parts of the infrastructure need investments to make the business successful?
In the second stage the business executive can determine whether the business view is etched into the functional requirements and design, and whether the on-line operation is architected with the appropriate levels of availability to protect it, according to non-functional requirements supported by the business case. This requires the project team to verify, or check-point, critical areas and components in the architecture. This helps create the setting for important granular decisions in the next stage.

The Titanic case study looks at the formation of the main functions and design of the ship. It examines how competitive business pressures led to a spotlight on these functional requirements, with a priority for luxury and splendor over everything else in the quest to design a palatial hotel. The lavish attention and investments paid to passenger comfort implied there was an equivalent investment in the non-functional requirements of safety and operations functions.

Stage 3: What are the best possible safety features to incorporate to protect the business?

In the third stage the business executive can ensure that, in the construction of the on-line operation, the critical areas and components (those causing the greatest problems if unavailable) are protected adequately by selecting from a comprehensive list of availability techniques (software, hardware, and process). This includes looking at the advantages and disadvantages of high availability, the best circumstances for each technique, and of course the costs. This also requires reviewing some of the challenges around the complex integration of the on-line operation with the back-end systems in the environment, completing functional unit testing, and preparing for non-functional testing in the next stage.

The Titanic case study looks at the construction techniques and compares the selection issues of proven versus new technology to protect critical areas of the ship.
Many leading-edge safety features were incorporated, and this created an overconfidence that nothing could go wrong. In fact, there was an unshakable belief in the safety of the ship, a lifeboat in itself. As the construction neared completion, esthetic factors were allowed to compromise the non-functional requirements. As a result, the safety features and the design of the ship were fundamentally flawed. For example, the bulkhead walls were too short, the double-skin bottom was below the water line, and the ship carried the minimum number of lifeboats based on regulations. The study also highlights how maritime legislation was hopelessly outdated by the rapid evolution of shipping technology.

Stage 4: What sort of tests need to be planned to ensure the business is protected?

In the fourth stage the business executive can ensure that there is a plan for testing the on-line operation. This requires getting ready to test for the characteristics that are important, planning the level of dynamic testing required, selecting the right kind of tests, and preparing the test environment. The focus is on non-functional requirements first. The planning will be for testing a new isolated solution, a new solution that is integrated with an existing solution, or a new solution that is replacing an existing solution. The business executive needs to understand the risks associated with the incoming change and the potential impact on the environment and existing business services. The “Berlin Wall” approach to change management, which forces all changes through one or two tightly policed checkpoints, is not feasible in today’s highly dynamic business climate.

The Titanic case study looks at the testing, or sea trials, undertaken, specifically the limited operational and safety testing. With the sister ship Olympic already established in service, extensive sea trials and testing were not considered critical.
However, Olympic was involved in several serious incidents, including a major collision with HMS Hawke. Olympic had gone into dry dock for repairs, and as a result the pressure was on to get Titanic into operation to make up for lost revenues. Titanic was rushed into production with limited sea trials or testing, and the operational readiness of the crew for the maiden voyage was poor.

Stage 5: How is the plan followed to ensure that everything is tested?

In the fifth stage the business executive can ensure that the testing is done according to plan to determine the robustness of the on-line operation. This requires integrating it into a test environment and, through extensive non-functional (and some functional) testing, determining its overall integrity and availability and its potential impact on the surrounding service delivery environment. Once all the tests are passed, the stage prepares for “going live”, which delivers a fully working and tested solution into the live service delivery environment. The testing stage is a critical part of the project life cycle, as typically this is where any warning signs of a pending failure start to become visible. At this point the business service metrics and measurements are set up, and the service level objectives and agreements are established and agreed to by all parties.

The Titanic case study examines how, on leaving port, Titanic had a near collision with the steamer New York, to the consternation of passengers and crew. This highlights the challenges the crew had in navigating an ocean liner that was very large for its time. Only one lifeboat drill was performed, the outcome of which underlined the poor operational readiness of the ship. Operationally the crew was not ready. The lookouts’ binoculars, vital operational tools, were missing. The very experienced Officer Lightoller later testified that it took him three days to get acquainted with the ship’s massive layout.

Stage 6: Is the on-line operation for the business ready to run?

In the sixth stage the business executive can ensure that the organization and processes have been set up to successfully run and deliver the on-line operation and service delivery environment. The project should not end as soon as the service is operational, but only once a proven level of stability is attained. Business executives need to know the impact of the implementation on business services and the risk of remaining live with it. They need to know how to create a support infrastructure and maintain a smooth and stable running operation, preventing disruptions from faults or minimizing them through a quick recovery method. This is based on a rapid and accurate problem management process oriented around a “speed of recovery clock”, aimed at getting the operation back on-line as quickly as possible. This should also include strategies for early warning systems and automation, eventually leading to self-monitoring, self-healing, and self-balancing systems. These will not only monitor, manage, repair, and maintain the on-line operation but also improve the required levels of service and availability.

The Titanic case study reviews one of the most poignant segments of the story, examining the operational aspects of the ship through a detailed “flight clock” of events and decisions taken leading to the disaster. Titanic had a number of built-in feedback mechanisms that were discounted, fudged, or just ignored. Examples include the ice bucket test and the radio operators who, overloaded with commercial traffic (noise), did not pass ice warnings (signal) along in a timely fashion. A multitude of blunders culminated in an inevitable outcome.
For example, the officers kept their own binoculars and did not share them with the lookouts; ice warning information eventually communicated through the hierarchy to the captain wasn’t adequately acted on; and the captain succumbed to the director’s pressure to sail at full speed through the danger area, and the ship ground itself onto an ice shelf. The director was desperate to prove the ship was a technological improvement over Olympic. Both the captain and the director then made the horrendous decision to sail off the ice shelf and head for Halifax.

Stage 7: What sort of contingency is needed?

In the seventh stage the business executive can ensure that disaster recovery is considered. This will allow recovery of on-line operations in times of disaster. This requires a “Why-What-How” approach: why disaster recovery is critical, what disaster recovery entails, and how to determine whether you are in a disaster. The executive needs to know what the current business continuity plan is, how the plan will address the incoming implementation, and what the risks are in the plan. This requires looking at business continuity planning and issues such as application selection, recovery windows, and cost justification. It also reviews alternatives from hot to cold to on-line sites, and some of the techniques available through extended mirroring and remote replication.

The Titanic case study continues with the detailed “flight clock” examination of the recovery stage. It reviews the flow of information, how Titanic’s hierarchical organization of three classes inhibited that flow, and the impact this had. Titanic’s officers clearly did not have a business continuity plan. Much precious time was lost in the first hour after the collision as the disaster was assessed. Poor communication cut into the time passengers and crew had to react. Many passengers got up and then went back to bed with the perception that they were safe.
As a result, the first lifeboat left only half full because of the reluctance of passengers to get in. Effectively, the “impact and stocktaking phase” was atypically long, as senior members of the crew operated in a state of disbelief. In addition, the launch of 16 lifeboats took over 2 hours because the crew was not adequately trained. Even if more lifeboats had been in place, it is likely that there would not have been enough time to launch all of them.

Stage 8: How is a post-mortem conducted?

In the final stage the business executive can better plan for the on-line operation and conduct a post-mortem following a major unplanned outage. This provides a roadmap to focus the learning energies of an organization on problem prevention and on how to improve the required levels of service and availability. It also provides the tools to evaluate the organization’s support capability and a roadmap to improve it.

The Titanic case study concludes with the completion of a post-mortem of the events leading to the tragedy. It follows the steps required in collecting the qualitative and quantitative evidence, building the event time line, creating the problem statement, determining the contributing factors, and analyzing the root causes. The results are incisive and provide a thought-provoking conclusion to the book. Titanic’s maiden voyage was a disaster waiting to happen. The operational readiness of the ship was so poor, and the overall decision making so at odds with basic rules of seamanship, that the disaster was almost inevitable. Effectively, no single event or factor caused the disaster; it was a combination of many.

Conclusion

The most successful IT projects do not happen by accident. Executives can draw on all these points and stages to reach their own conclusions before approving another IT project.
They are now in a better position to question or even challenge IT projects, become full participants in the creation process, better assess operational readiness, and understand the impact on business services. After all, they are ultimately responsible for the on-line operation.

This page last updated on June 11, 2006.
Copyright ©2001-2006
Mark Kozak-Holland
All Rights Reserved