Bandwidth – so near I can smell it

2 08 2008

It looks like my rantings managed to help somebody, which has kicked me in the pants to apologise to my blog for neglecting it. As per my earlier posts, long distance bandwidth has always been a problem. Well, hopefully that could now be cured. I have received the gift of Ciena CN2000s. They claim I should get at least a 4:1 compression of my fibre channel traffic overall. If that’s the case, when all is done I’ll have a creamy 32Gb of bandwidth, which brings me a small amount of pleasure.

After a bit of faffing with our DWDM supplier about OTR wavelengths, I now have a pair talking to each other. A couple of fibre links are connected but not enabled yet, ready for that fateful moment when I enable the ports on the switches and hopefully something will happen. For once, I’m not irritated. Well, except for the amount of time I’ve had to spend begging and offering deviant favours to get the bandwidth.

More to come……





It’s the software, stupid!

21 06 2007

During the past couple of weeks a couple of applications have fubar’d (I think that’s the correct spelling). And of course, we storage peeps are the scapegoats. Well, in each case, it’s definitely been a case of duff code. For starters, a Windows app (recently bought by HP) that shall remain nameless, but I will say its clusters failover by fiddling with SAN ports, toasted its log volumes. Completely buggered. Muggins was on call, so after over an hour of trawling switch and array logs, told them it was the app’s fault. Anyhoo, they recovered. Now the product support are trying to blame the SAN. I’ve put on my teflon coat and told them to write their application properly to handle problems. I’m a properly trained programmer, on a proper platform (that’s a mainframe to you script kiddies out there), and as part of your training you’re taught to make your code robust. Robust is never an adjective I’d associate with a Windows app. Windows has its place – on the desktop! Keep it out of the machine room and away from enterprise storage!

Then this week a couple of VMware servers managed to block several ports on one of our USPs by sending duff blocking instructions to it, which also caused great distress to several unix servers as a result. I don’t know the exact ins and outs, but once again badly written drivers cause the shit to hit the fan.

Talking of shit, I’ve just read this hilarious blog entry about the side effects of a new diet pill on sale in the US, which I caught by chance just now. It’s at http://angryaussie.wordpress.com/2007/06/20/miracle-diet-pill-with-teeny-tiny-side-effect/ and is a stark, oily brown warning to those of you, like me, who are of a round disposition, and who, unlike me, wish to change that shape to something more stick like by thinking you can pop a few pills. You can certainly pop a few pills, but in this case you’ll also pop, or to be more accurate, explode out of your arse!

And talking of stupid, I saw a great ad for a premium rate “fleece the gullible, stupid kiddies” text line. It’s very simple. Lonely? Want to know the initial of your soulmate? Well, text us a message, costing £1.50 (that’s $3 US), and we’ll text back a letter of the alphabet! I have to admire the audacity and genius of the person who came up with that. It’s a well known fact that poverty can be a great wealth generator for those who are willing to exploit those in need, but this is living proof that stupidity can be a great wealth generator too for those willing to exploit the clinically stupid!

But what came after that ad was even more priceless. The same company asks you to text them if you’re thinking about reproducing, and for the same price will send you a random name back for the offspring! If you’re even vaguely thinking of doing that, then stop right now, and don’t even think about reproducing. You’ve probably settled down to text for a baby’s name after getting the initial of your future soulmate, so for the sake of humanity, please, please, do not reproduce. All that will happen is that the gene pool will get even shallower, possibly evaporating to a small puddle.

I was in Brighton at the weekend for a union conference. Pretty mundane, but I saw a great t-shirt slogan, which I’m thinking of adopting as one of my mottos. I’ll leave you with its poignant message:

“Get in Shape. Round is a shape”





Curing the long distance blues

23 05 2007

Now is the time to rejoice! I enabled ‘ISL R_RDY’ mode on the four ISLs between sites, enabled long distance with sufficient buffer credits for 30km at full packet size (my links are either 16km or 18km, depending on who I’m talking to) and sat back to watch the portperfshow. I thought I was going to wet my pants when I saw the speed on each link leap from 60mb/s to 150! I reset the port stats, and noticed that I was still getting a number of ‘out of buffer credits’ errors. I’m not sure if I can get stats on the average packet size going through (can anyone advise?), so I took a stab and increased to 40km and reset the stats again. My self pleasure went into overload as the speed topped 170mb/s!
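For anyone wanting to sanity-check the distance setting, here is a rough back-of-envelope sketch of how many buffer credits a link needs to stay busy. It is only an estimate under stated assumptions, none of which come from the switch itself: light takes roughly 5 microseconds per km in fibre, a full-size frame is about 2148 bytes on the wire, 8b/10b encoding is in use, and the links are 2Gb (2.125 Gbaud). The function name `credits_needed` is made up for illustration.

```python
import math

# Assumptions, not measured values: ~5 us per km of fibre, a full-size
# Fibre Channel frame of ~2148 bytes on the wire, and 8b/10b encoding
# (10 line bits per byte).
FIBRE_DELAY_US_PER_KM = 5.0
FULL_FRAME_BYTES = 2148
LINE_BITS_PER_BYTE = 10


def credits_needed(distance_km, line_rate_gbaud, frame_bytes=FULL_FRAME_BYTES):
    """Estimate the credits needed to keep a link of this length full.

    A credit only comes back after the frame has crossed the link and the
    acknowledgement has returned, so we need enough credits to cover one
    round trip's worth of frames.
    """
    round_trip_us = 2 * distance_km * FIBRE_DELAY_US_PER_KM
    # Time to clock one frame onto the wire, in microseconds.
    frame_time_us = frame_bytes * LINE_BITS_PER_BYTE / (line_rate_gbaud * 1000)
    return math.ceil(round_trip_us / frame_time_us)


if __name__ == "__main__":
    for km in (18, 30, 40):
        print(km, "km full-size frames:", credits_needed(km, 2.125))
    # Half-size frames clock out twice as fast, so they need twice the credits.
    print("30 km half-size frames:", credits_needed(30, 2.125, 1074))
```

Under these assumptions a 2Gb link wants roughly one credit per km at full frame size, and proportionally more as the average frame shrinks, which would be consistent with still seeing credit starvation on an 18km link configured for 30km.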

To get a tape copy to run from the remote site to production, we deleted a couple of copy pool volumes in TSM and re-ran a copy storage pool with multiple drives, so I had tape traffic going in both directions, and soon the throughput was topping 200mb/s!

For once, I am happy, and needed a good hosing down to calm myself. I just need to get some long distance licences for the 4100s I have spare so I can move the other site links (we have a second remote site linked to another machine room) from the creaking 12000s and I will be truly joyful.

On the home front, so far there have been no further steaming gifts left in the living room, just the odd puddle!





R_RDY, Potty, Go!

18 05 2007

Had an interesting meeting today (don’t say that often) with a man from Nortel who summed up my long distance ISL problem nicely – I need to use ISL R_RDY mode with the long distance settings on the ISLs across our DWDM links (fronted by Nortel MOTRs). Apparently the Brocade VC Link Init protocol is too sensitive to work with their kit, so I shall be trying that out early next week. It may even cure my lack of throughput, which I suspect is because I’m not getting a huge amount of full size packets going through the links, hence exhausting the buffer credits.
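To put a rough number on that suspicion, the sketch below estimates the throughput ceiling when a fixed pool of buffer credits is spent on frames smaller than full size. It is an illustration, not a model of any particular switch: the 5 microseconds per km fibre delay, the 200 MB/s line-rate cap for a 2Gb link and the example credit count are all assumptions, and `max_throughput_mb_s` is a made-up name.

```python
# Assumption: light takes about 5 microseconds to cross 1 km of fibre.
FIBRE_DELAY_S_PER_KM = 5e-6


def max_throughput_mb_s(credits, distance_km, payload_bytes, line_rate_mb_s=200.0):
    """Rough ceiling on throughput for a credit-limited long link.

    With `credits` frames allowed in flight, at most that many payloads can
    be delivered per round trip, regardless of how fast the link is.
    Serialisation time is ignored, so this is optimistic for short links.
    distance_km must be greater than zero.
    """
    round_trip_s = 2 * distance_km * FIBRE_DELAY_S_PER_KM
    credit_limited = credits * payload_bytes / round_trip_s / 1e6
    # The link can never go faster than its line rate.
    return min(credit_limited, line_rate_mb_s)


if __name__ == "__main__":
    # Hypothetical case: 18 credits on an 18 km link.
    for payload in (2112, 1024, 512):
        print(payload, "byte payloads:", max_throughput_mb_s(18, 18, payload), "MB/s")
```

The point of the sketch is the shape of the relationship: each credit is used up by a frame whatever its size, so halving the average payload halves the ceiling unless more credits are configured.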

Also found out that with Condor ASIC equipped Brocade switches (4100, 48000 etc) I should not expect to see an even spread of throughput across ISL trunks, as their algorithm loads a single link up to around 70% first before starting to spread out further, to cut out unnecessary frame splits and re-joins. Makes sense I suppose.
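As a toy illustration of the behaviour described (and emphatically not Brocade's actual algorithm, which I have not seen), the sketch below fills each link to a threshold before spilling onto the next. The 70% figure is the one quoted in the meeting; the link capacity of 200 and the function name `fill_first` are assumptions for the example.

```python
def fill_first(total_load, link_capacity, links, threshold=0.70):
    """Toy model: load each link to threshold * capacity before using the next."""
    loads = []
    remaining = total_load
    for _ in range(links):
        share = min(remaining, link_capacity * threshold)
        loads.append(share)
        remaining -= share
    # If every link has reached the threshold, spread what is left evenly.
    if remaining > 0:
        loads = [load + remaining / links for load in loads]
    return loads


if __name__ == "__main__":
    # Hypothetical four-link trunk, each link able to carry 200 units.
    print(fill_first(100, 200, 4))
    print(fill_first(300, 200, 4))
```

With a total load of 100 across four links of capacity 200, this model puts everything on the first link rather than 25 on each, which is the sort of lopsided picture a fill-first scheme would produce at light load.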

There was a hint of a beige tint to the air when I got home tonight. My daughter has started potty training, and not long before I came home had decided that she couldn’t get to the potty in time, so left what the Big Yin (that’s Billy Connolly, one of the funniest people on the planet) may describe as a ‘wee beige jobby’ in the middle of the living room! At least she went and washed her hands afterwards!

Which leads nicely to the fact that I’m on call this weekend and already had my first call within half an hour of getting home 😦





Is it wrong for a SAN to wear make-up?

12 05 2007

Thanks for spotting my typo, I’ve had a good chuckle. About the only chuckle this week, as this has been my week on the Operations Bridge and on call, where I get the pleasure of sitting in uncomfortable chairs during the day, having to respond to inane flapping and bleating, and then get woken up in the middle of the night for more flapping. Plus I’ve been in pay negotiations, and to thank us for all our hard work my employer wants to offer us a pay cut in real terms. Nice. Well, if they want a fight, they’re going to get one. This is a good advertisement for union membership – in parts of our business where there is a high percentage of union members, pay offers are high, and where it’s not, they’re low. Hopefully I’ll be persuading more non-union members to join over the next few weeks, it’s time for their free ride to come to an end if they want to get a decent pay deal.

As you might be able to tell, I have a pretty low opinion of those who won’t join, but are happy to take the benefits that are gained from the subs paid by members, and complain that they haven’t had their pay rise yet. It’s like people who say “I didn’t vote – I couldn’t be bothered” or “it doesn’t make a difference”. Democracy is a privilege, not a right, and if you don’t vote, you have no right to complain. If you don’t like the choice, spoil your ballot paper, express your displeasure. I have a lot of respect for those who’ll do that – it’s democracy in action. People are dying around the world every day in places like Burma, China and Zimbabwe to have what we have.

Now I’ve finished venting, on the SAN side, we’ve worked around the ISL hit for now, though we’ve rigged up another server to try various load tests. I’ve tried the suggestion about changing trunk masters etc., but I still think there’s a problem. The loads across these ISLs are just not balancing, i.e. if I’ve got 100mb throughput, I’d expect to see close to 25mb on each of the four trunks, yet it seems to load mostly onto one connection. Strange. Next week shall see me pulling cables (orange ones 😉 ), adding more, changing trunks and generally hiding away in the server room.

On the plus side, I’ve got Brocade Fabric Manager to play with for the moment, which seems useful, though like any Java app it sucks resource like an Electrolux (for those of you not British or of my generation, there was an ad in the 80s for a hoover that said ‘Nothing sucks like an Electrolux’ – naturally in my youth we would use that as a challenge for the ladies my friends and I would date). Anyhoo, I digress, so returning from the dirt track to the highway: next week we’ll also be trying out some tuning of AIX server fibre cards, as we seem to be maxing out the throughput of our new TSM servers prematurely when duplicating LTO3 tapes, and we believe we’ve got the TSM settings optimised.

So, thanks for the input. I’m off to bed soon (after the end of the Eurovision Song Contest) as my wife’s away this weekend so I’m looking after the children (5 and 2) on my own, and they’ll be wanting their breakfast bright and early. At least it puts my storage problems into perspective!





Of ISLs and Men

29 04 2007

My shower is working again. It started working again the next day. Maybe a dodgy connection in the switch? Who cares, showers all round again.

I have a cold. My wife had it for two days, but took a day off work sick to aid recuperation, which did the trick. Then I caught it. Two weeks later I still have it. I’m too busy to be sick, same old story, so I just carry on at work as normal. At least I can share my misery. When I’m miserable, I want to share it. When I’m happy, I’m selfish, it’s all mine!

Over the past few weeks lots of new AIX servers have been going in. Almost without fail, for each one I’ll hear the whinge from the Unix guys that one of the partitions won’t log onto the SAN. “Must be a SAN problem” they cry. So each time I’ve gone into the server room, have seen no lights, so checked the cables, flipped the ends, tried other ports and even thrown 30m cables across the room to check whether or not it’s the patch panels, and every time I’ve turned round and said “it’s your server – check the fibre card”. And each time I’ve been right. Shit like this just never happened when I worked with mainframes. Looks to me like we’re getting more and more shoddy server fibre cards emerging from wherever they come from in the Far East. IBM need to get their act together.

However, a curious performance problem hit recently. Chronic performance from a new server/app on one of our shiny new USPs. We did the usual checks, dispersed allocation across RAID groups, port contention etc. Noticed that the filesystem was not striped. So that was sorted. But the problem was still there, seeing 30ms responses. Head scratches all round. It was noted that the server was connected to one 48000, the storage on another, both linked by an under-utilised 8gb trunk. I beefed the trunk to 16gb, still bad. To cut a long story short, we proved the point that the ISL was the bottleneck: server and storage on same switch = 3ms response. Now that has really baffled me. Our normal standards are to try and host servers and their storage as best we can on the same switch, though it’s not always possible, but I’ve never seen latency like this. The switches are close to each other, linked by 9m cables between ISLs, and I’d expect no more than 20 microseconds each way latency, not milliseconds! Been hitting the books again, but baffled once again. If any of you kind techies out there who might be taking your valuable time to enjoy my rantings have any thoughts, I’d appreciate any insight, as I have appreciated the comments I’ve received to date.
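To show why the cable itself cannot be the culprit, here is a quick sketch of the propagation arithmetic. It assumes light takes roughly 5 nanoseconds per metre in fibre (an approximation, not a measurement of these cables), and the function names are invented for the example.

```python
# Assumption: roughly 5 nanoseconds of delay per metre of fibre.
FIBRE_DELAY_NS_PER_M = 5.0


def propagation_delay_us(cable_m):
    """One-way propagation delay over a cable, in microseconds."""
    return cable_m * FIBRE_DELAY_NS_PER_M / 1000.0


def fibre_km_for_extra_latency(extra_ms):
    """Length of fibre (km) whose round trip alone would add extra_ms."""
    extra_ns = extra_ms * 1e6
    one_way_m = extra_ns / (2 * FIBRE_DELAY_NS_PER_M)
    return one_way_m / 1000.0


if __name__ == "__main__":
    print("9 m cable, one way:", propagation_delay_us(9), "microseconds")
    # The gap between 30 ms across the ISL and 3 ms on the same switch.
    print("Fibre needed to add 27 ms:", fibre_km_for_extra_latency(27), "km")
```

On these assumptions a 9m cable adds a few hundredths of a microsecond each way, so the 20 microsecond budget is almost entirely switch latency, and a 27ms gap would need a round trip over thousands of kilometres of fibre to be explained by distance. Whatever is going wrong, it is queueing or flow control somewhere, not the glass.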





Showers, SANs and Insurgency

9 04 2007

I’m doomed not to be able to have a decent hot electric shower. Came home from an Easter weekend away to find it’s the second time it has stopped working. I expect it’s the electrics again rather than the shower. Looks like I’ll have to buy another, less powerful, shower and try and sell this one on ebay. Anyone want to buy a 10kw shower? In the meantime, bath time each morning. Just one more thing to irritate me.

Spent the weekend with my wife and children with her brother and his family and extended family. A great weekend, only spoiled by the damn shower when we got home! On Sunday we went to a first birthday party, which was actually more of an adult affair. There I met a neighbour, an American gentleman called Michael and his wife and child. He told me he was a journalist, and had been a foreign correspondent now writing his second book. I finally got the chance to sit down for a chat when we had to leave for the long journey back home. So I promised to look up his book when I got home.

After a quick google, I found out some more about this interesting gentleman. His name is Michael Goldfarb, his book is called “Ahmad’s War, Ahmad’s Peace”. Michael is a very, very accomplished journalist. After reading bios, I am sad that I didn’t have more time to talk with him. He has created a documentary for the “Inside Out” program on WBUR Boston on the subject, which, in my opinion, is a moving account of the start of the current war in Iraq from a personal perspective. Click here to visit that documentary, which includes the radio program. It’s an hour long. Take my advice: get yourself a clear hour, turn up the speakers on your PC, get comfy and listen to it.

At work, I’ve finally sorted out my SAN problem, no thanks to the hopeless vendor (two letters, known for printers). Brocade were on site earlier last week for another matter, and their technical guru sat down with me, looked at the problem for a few minutes, and said “hafailover”. I’d suggested that to my vendor over a week previously – perhaps it got lost in translation between here and India. So, I spent an hour raising the change ticket, another hour talking to our customers, then the following morning I hit the return key, crossed my fingers and hoped the 12000 wouldn’t die. It was as momentous as when the clocks ticked over to 1st January 2000. No planes dropped out of the sky, I still owed the bank several body parts for a mortgage, and the problem went away.

Take note Brocade customers – Brocade are offering a supplementary support service, obviously at a cost. If you’ve bought your switches from HP, IBM or EMC, get them in now. Don’t delay!