Been a while…

25 01 2008

It’s been a while since my last post. However, I have been getting lots of comments. Looking just now, it was around 35,000! Unfortunately, all spam. Another of life’s little irritations. I had to take down my union branch’s website forum recently because of spammers breaking through the defences. Went away for a two-week holiday to Canada (lovely country, by the way) and found some of the site users most distressed at Viagra and various orifice probes being offered as solutions to the questions they were posing.

I’ve been getting to grips more with our HDS kit recently. Tuning Manager is a bundle of laughs, isn’t it? As was pointed out to me, it was written in the Japanese market place, which is vendor driven rather than customer driven. Boy, does it show. And I enjoy some of their intuitive screens on Storage Navigator – who thought it was a bright idea when setting up Universal Replicator that right clicking on a large blank grey area of screen to get something done was intuitive?

On the domestic front, I’ve had to change the shower again. What is it with me and electric showers? They seem to last just long enough to be out of warranty and then *bang*. I’ve also been tracing my family tree, which has been really interesting, and at times my wife has had to suggest to me that I pause for a break, as she thinks I’ve been a tad obsessive with it at times. I think she may be right 🙂

I’ve got quite a lot of ranting to get out of my irritated system. And the odd question to pose. I’ll need to filter it out in small chunks.

Oh, and by the way, if anybody is looking for an experienced Storage professional to cover the Midlands area for a Professional Services Consultancy type role, let me know.


It’s the software, stupid!

21 06 2007

During the past couple of weeks a couple of applications have fubar’d (I think that’s the correct spelling). And of course, we storage peeps are the scapegoats. Well, in each case, it’s definitely been a case of duff code. For starters, a Windows app (recently bought by HP) that shall remain nameless, though I will say its clusters fail over by fiddling with SAN ports, toasted its log volumes. Completely buggered. Muggins was on call, so after over an hour of trawling switch and array logs, I told them it was the app’s fault. Anyhoo, they recovered. Now the product support are trying to blame the SAN. I’ve put on my teflon coat and told them to write their application properly to handle problems. I’m a properly trained programmer, on a proper platform (that’s a mainframe to you script kiddies out there), and as part of your training you’re taught to make your code robust. Robust is never an adjective I’d associate with a Windows app. Windows has its place – on the desktop! Keep it out of the machine room and away from enterprise storage!

Then this week a couple of VMware servers managed to block several ports on one of our USPs by sending duff blocking instructions to it, which also caused great distress to several unix servers as a result. I don’t know the exact ins and outs, but once again badly written drivers cause the shit to hit the fan.

Talking of shit, I’ve just read this hilarious blog entry about the side effects of a new diet pill on sale in the US, which I caught by chance just now. It’s at and is a stark, oily brown warning to those of you, like me, who are of a round disposition, and who, unlike me, wish to change that shape to something more stick like by thinking you can pop a few pills. You can certainly pop a few pills, but in this case you’ll also pop, or to be more accurate, explode out of your arse!

And talking of stupid, I saw a great ad for a premium rate “fleece the gullible, stupid kiddies” text line. It’s very simple. Lonely? Want to know the initial of your soulmate? Well, text us a message, costing £1.50 (that’s $3 US), and we’ll text back a letter of the alphabet! I have to admire the audacity and genius of the person who came up with that. It’s a well known fact that poverty can be a great wealth generator for those who are willing to exploit those in need, but this is living proof that stupidity can be a great wealth generator too for those willing to exploit the clinically stupid!

But what came after that ad was even more priceless. The same company asks you to text them if you’re thinking about reproducing, and for the same price will send you a random name back for the offspring! If you’re even vaguely thinking of doing that, then stop right now, and don’t even think about reproducing. You’ve probably settled down to text for a baby’s name after getting the initial of your future soulmate, so for the sake of humanity, please, please, do not reproduce. All that will happen is that the gene pool will get even shallower, possibly evaporating to a small puddle.

I was in Brighton at the weekend for a union conference. Pretty mundane, but I saw a great t-shirt slogan, which I’m thinking of adopting as one of my mottos. I’ll leave you with its poignant message:

“Get in Shape. Round is a shape”

Of ISLs and Men

29 04 2007

My shower is working again. It started working again the next day. Maybe a dodgy connection in the switch? Who cares, showers all round again.

I have a cold. My wife had it for two days, but took a day off work sick to aid recuperation, which did the trick. Then I caught it. Two weeks later I still have it. I’m too busy to be sick, same old story, so I just carry on at work as normal. At least I can share my misery. When I’m miserable, I want to share it. When I’m happy, I’m selfish, it’s all mine!

Over the past few weeks lots of new AIX servers have been going in. Almost without fail, for each one I’ll hear the whinge from the Unix guys that one of the partitions won’t log onto the SAN. “Must be a SAN problem” they cry. So each time I’ve gone into the server room, seen no lights, checked the cables, flipped the ends, tried other ports and even thrown 30m cables across the room to check whether or not it’s the patch panels, and every time I’ve turned round and said “it’s your server – check the fibre card”. And each time I’ve been right. Shit like this just never happened when I worked with mainframes. Looks to me like we’re getting more and more shoddy server fibre cards emerging from wherever they come from in the Far East. IBM need to get their act together.

However, a curious performance problem hit recently. Chronic performance from a new server/app on one of our shiny new USPs. We did the usual checks, dispersed allocation across RAID groups, port contention etc. Noticed that the filesystem was not striped. So that was sorted. But the problem was still there, seeing 30ms responses. Head scratches all round. It was noted that the server was connected to one 48000, the storage on another, both linked by an under-utilised 8gb trunk. I beefed the trunk to 16gb, still bad. To cut a long story short, we proved the point that the ISL was the bottleneck, server and storage on same switch = 3ms response. Now that has really baffled me. Our normal standards are to try and host servers and their storage as best we can on the same switch, though it’s not always possible, but I’ve never seen latency like this. The switches are close to each other, linked by 9m cables between ISLs, and I’d expect no more than 20 microseconds each way latency, not milliseconds! Been hitting the books again, but baffled once again. If any of you kind techies out there who might be taking your valuable time to enjoy my rantings have any thoughts, I’d appreciate any insight, as I have appreciated the comments I’ve received to date.
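For what it’s worth, here is a back-of-envelope sketch of how little the 9m cables themselves should contribute. It assumes light travels through fibre at roughly 200,000 km per second (about two thirds of its speed in a vacuum); the function name is made up for illustration, and switch hop latency is deliberately not modelled, as I don’t have a figure for it.

```python
# Assumption: light in glass fibre covers roughly 200,000 km per second.
FIBRE_KM_PER_SEC = 200_000.0


def one_way_delay_us(cable_m: float) -> float:
    """Propagation delay in microseconds over cable_m metres of fibre.

    Covers the cable only; switch and port latency are not included.
    """
    cable_km = cable_m / 1000.0
    return cable_km / FIBRE_KM_PER_SEC * 1_000_000.0


for metres in (9, 1000):
    print(f"{metres} m of fibre: {one_way_delay_us(metres):.3f} microseconds one way")
```

Under that assumption the cable accounts for a small fraction of a microsecond, so the jump from 3ms to 30ms has to be coming from somewhere other than distance.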

Wood, Trees, D’oh!

26 03 2007

Well, good news and bad. Good news, the DWDM links are working properly. Bad news, my inability to see the wood for the trees. Got so tied up looking at the quality of the links, someone said to me today “You’ve allocated too many buffer credits”. So, I turned off the long distance, hey presto, no more time outs. D’oh, back to the dummy’s book for me. Too many credits meant the destination port was being flooded. Now I need to test and tweak. Without LD, I’ve not got enough credits. We’ll have a diverse 22km route soon, maths says “1 credit per km” for a 2gb link. I thought dynamic allocation (LD) mode on the Brocades would handle everything nicely for me, but obviously not. Need to find out why it thought the link was 30km. So, time for testing with static allocations. Nothing like sucking and seeing.
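The “1 credit per km” sum can be sketched in a few lines as a starting point for the static tests. This is a rule of thumb only: it assumes full-size frames and assumes the figure scales in a straight line with link speed, which is my extrapolation rather than anything from the vendor’s documentation. The function name is made up for illustration.

```python
import math


def credits_needed(distance_km: float, speed_gbps: float) -> int:
    """Rule-of-thumb buffer credits needed to keep a long link busy.

    Starts from "1 credit per km at 2gb" and scales linearly with
    link speed (an assumption), rounding up to a whole credit.
    """
    return math.ceil(distance_km * speed_gbps / 2)


# Compare the real route lengths with the 30km the switch assumed.
for km in (15, 22, 30):
    print(f"{km} km at 2gb: {credits_needed(km, 2)} credits")
```

Smaller frames would need more credits for the same distance, so treat the result as a floor to tweak from rather than an answer.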

Bad news, still waiting for my 12000 vendor to come back to me and explain why one half still thinks the fabric is busy and stops me propagating fabric changes. Not impressed. I fear I’ll have to reboot, which means an emergency change. Perfect timing as our Change Management team has just introduced a new template, which means half a day just to fill out the change request for the outage. And of course, they’ve not publicised the change. I noted in the new template there’s not a section relating to hypocrisy.

No ends in sight

24 03 2007

Firstly, thanks to Sangod for his comments. The morning after I wrote the piece I was showing my new spangly blog to a team mate, saying how I found other storage blogs useful, went to Sangod’s site and there was a reference to my post! It made me feel very special (not “Special Needs” as my co-workers commented!) However my joy was brief.

My long distance relationship is now over. It’s now shorter, thanks to having both DWDM routes at 15km, which may occasionally diverge if we’re lucky. It’s an accident just waiting to happen. At least all my links are up, even if I did have to travel to the remote site to swap a tx/rx connection around. But those shiny green lights on the ports on the Brocade 48000s are as deceptive as Gordon Brown’s latest budget (don’t get me started on how he’s cut the basic rate of tax yet the lowest paid in our society will be worse off). The links are up, I have 2 x 2gb dynamically trunked paths on each fabric, the switches see each other, but can anyone see their remote storage? Our remote USP array can see the WWN of the remote servers, but any attempt to access that storage times out.

I see the ‘encoding errors outside of frame’ count go up faster than my blood pressure, which indicates to me that the connections are about as crap as the customer service from my ISP. We have DWDM links to another remote site provided by another telecoms company that are mostly reliable and after aeons of up time those ‘enc_out’ error counts are around 3,000, far less than the 300,000+ in a couple of hours seen on the new links. If anyone reads this and has any thoughts on this subject, I’d appreciate the input.

But that’s not the end of it. In another location I have one half of a Brocade 12000 director preventing us from propagating any configuration changes. Even an ‘alishow’ request is ignored with a ‘fabric busy’ message. The other half of the 12000 is fine. This leads me nicely into another area of irritation. Firstly, it irks me that I have to get support from the re-seller rather than direct from Brocade (though I believe if we pay them enough now they’ll help out), especially as our fabrics have built up over time so switches have come from different re-sellers. But then they re-badge them. To me, it’s a Brocade 12000. So to the re-seller, it should be a Brocade 12000, not an HP whatever or IBM thingumyjig (at least HDS supply them vanilla). Anyhoo, I’ve had a call logged for about 3 days now, still no solution, last I heard the problem is relaxing somewhere on the Indian sub-continent. At least if I end up power cycling the director then only one fabric is affected and data can be reached on the other, unlike another site that had each half of the 12000 on a different fabric. It certainly hit the fan when that director crashed recently. Roll on hardware refresh time, I like the Brocade switches, but the 12000 is certainly not their best design.

On a personal note, got a letter today saying I could be a match for someone needing a bone marrow transplant, so some more tests are needed. If it comes to it, donating marrow or stem cells is a tad more invasive than your average blood donation. I knew this when I signed up, but seeing this now in writing puts it into perspective. A little scary.

Long Distance Relationships are a pain

20 03 2007

We’re currently commissioning a new data centre. It’s about 10km as the crow flies, but can you get decent DWDM links over that relatively short distance? A certain cable company in the news at the moment (I left them for Sky a couple of years ago) is providing the links, but one of them is 60km! At least the data will be well travelled. We’re trying to implement synchronous TrueCopy between our USP arrays, and you can’t do it with that. Now it looks like we’re going to end up with all our routes instead going over the shorter 15km link, so all it will take is a clumsy oaf with a digger and our SAN links will go down faster than Audley Harrison in his last fight.

At the moment we have two links up, one on each of our fabrics, and one of them is giving so many ‘out of frame’ errors that we cannot propagate fabric changes on it, and I’ve spent most of today trying in vain to get around it. It’s having to deal with crap like this that is out of our hands that makes everything else I’m working on late, and that really pisses me off.