Floating Fag End Bins Spell Disaster

5 08 2007

Been busy for a while, lots to catch up on. Cast your mind back a few weeks to those balmy days in June when the monsoon season first paid a visit to the UK. Needless to say, it got more than a tad moist around the data centre. After a few days of torrential downpour, the woodland at the back of the building had had enough of the deluge and decided not to hold on to the rain water any more. We happen to be located in a bit of a dip. We noticed with initial amusement the water levels rising around the back of our building. As the water steadily rose over a few minutes, great mirth was had as the fag end bins from the smoking shelter decided now was their chance to make a break for freedom, and floated nonchalantly past the window.

Then an alarm rang. The bung in a conduit into one of our computer rooms had given out and water was gushing in under the false floor. Proved that the water sensors worked! So we decided to start cleanly shutting down the affected applications as quickly as possible and fail them over to the remote DR site. This was soon followed by numerous identical queries: ‘has your session hung?’

Smoke had been smelt in the affected computer room, so the EPO (emergency power off) had been hit. I was soooo jealous. I’ve always wanted to hit the EPO! Always imagined a scene like the one in ‘Total Recall’ where air is gushing out of the terminal on Mars after Arnie’s fake head exploded, and the brave soldier hanging on for dear life hovers his hand dramatically over the big red button for precarious seconds until he slams it and the emergency air lock doors come down. Instead, it was “<sniff>… I smell burning” followed by a quick poke of a small red button. Real life is so much more boring than the movies!

To cut a long story short, nobody had much sleep for the next 3 days. Some of the techies pulled 24+ hour shifts to restore service. We were plied with pizza – catnip to a techie – and heroism ensued. Got my first real-life experience of TSM backupsets as well. I'd never felt the urge to use them, but someone thought it would be a bright idea to take some remote site backupsets to speed up recovery at our stricken site, and I was asleep when it was agreed to use them.

So, if I’ve learnt one thing from this, it’s to make sure that if anybody ever suggests using TSM backupsets for a speedier recovery again, I shall staple their mouth shut. They are a righteous pain in the arse, especially when you’re sleep deprived!

Since then I’ve also had more fun with Brocade’s answer to the MG Montego, the 12000 director. More to come on that soon……




