Thursday, March 29, 2007

March 23 – Testing Done Right

My team recently had a small part to play in a huge implementation. It involved something like 23 interfaces with other departments and outside organizations and thousands of modules, and the actual implementation took nearly 48 hours. To give you an idea of the size, our small part alone included the removal of nearly 200 jobs and over a million lines of code. Even with all that activity, all systems were up and running on Monday without a problem.

Several team members came from a competitor that had recently done a similar system revamp. They brought with them horror stories of how badly things went wrong. After implementation, they spent 3 months nursing the system back to health, and the resulting cash flow problems had a major impact on the business.

So why were we successful? The simple reason is testing. All projects go through testing phases, so what was special about the testing for this project?

Quality first. Early on, the organization recognized that quality counts. If you don’t put quality first, you end up paying for it somewhere, especially on a major endeavor like this one. Verification had to be completed or the project wouldn’t go forward. This was the first project I have been on where the testing time was not cut in order to cram the system in on time.

Truly Coordinated Testing. A separate project manager was put in charge of testing. Her job was to orchestrate several all-encompassing System Tests with all involved parties. Tracking the database extracts and necessary datasets from capture through processing was a major undertaking.

Near Shore / Off Shore model. With the team split between Halifax, Nova Scotia, and Noida, India, testing could be performed around the clock. Data was processed through the systems and made available to the online application for business review and approval.

Automated Test System. An in-house scheduling tool referred to as a “Test Harness” allowed the jobs to be laid out ahead of time with their dependencies and input files. The jobs could then run without the usual human intervention of submitting each one.
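I can't share the internals of the Test Harness here, so the following is only a minimal Python sketch of the general idea: jobs are declared up front with their dependencies and input files, and a scheduler runs each one once everything it depends on has finished. The names `Job` and `run_schedule` are hypothetical and chosen for illustration; they are not taken from the actual tool.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable


@dataclass
class Job:
    """One batch job, laid out ahead of time (hypothetical structure)."""
    name: str
    action: Callable[[list[str]], None]  # called with the job's input files
    depends_on: list[str] = field(default_factory=list)
    input_files: list[str] = field(default_factory=list)


def run_schedule(jobs: list[Job]) -> list[str]:
    """Run every job after all of its dependencies; return the run order."""
    by_name = {job.name: job for job in jobs}

    # Reject dependencies on jobs that were never declared.
    for job in jobs:
        for dep in job.depends_on:
            if dep not in by_name:
                raise ValueError(f"{job.name} depends on unknown job {dep}")

    # Map each job to the set of jobs that must finish before it starts.
    graph = {job.name: set(job.depends_on) for job in jobs}

    order = []
    # static_order() yields dependencies first and raises CycleError
    # if the declared dependencies form a loop.
    for name in TopologicalSorter(graph).static_order():
        job = by_name[name]
        job.action(job.input_files)  # no human has to submit this job
        order.append(name)
    return order
```

In this sketch, declaring a load job that depends on an extract job is enough to guarantee the extract runs first, no matter what order the jobs were listed in. A real scheduler would also need to handle failures, restarts and parallel execution, all of which are left out here.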

Testing in the DR Environment. More and more companies have solid Disaster Recovery (DR) systems on which they can revive their systems in the event of a major problem. Usually these systems are used only for recovery tests and live emergencies. Our operations group reconfigured the DR system so the team could perform 2 full dress rehearsals of the event. This allowed us to bring up the full current environment and perform the conversion just as if it were production.

Not stopping at implementation. The implementation required the system to be down through all of the normal weekend processing. A limited Saturday “catch up” cycle was created and run on Sunday before a modified Sunday processing began. Both of these batch processes were executed as part of the DR environment tests. The results of those tests allowed adjustments to be made for smoother execution on go-live weekend.

No fear of saying “No Go!” Technically, we signed up for 1 dress rehearsal. When the first one didn’t go as smoothly as hoped, the business decided to push the implementation date back a month and have another run at it. Other companies might have overruled the IT department and forced it to implement on the original date.

Bottom line? The success of this project shows that testing works if you actually get to do it right.

1 comment:

Josh Nankivel said...

This is some great lessons learned documentation from your experience with this implementation. Great synopsis and valuable content.

Thanks Thomas!

Josh Nankivel, The PM Student