Data Centres: a major cause of concern
Most of us in IT are very focussed on delivering functionality through software, and we take the platform it runs on for granted. I have recently become aware of just how involved it can be to ensure the availability of basic infrastructure.
Major concerns of data centre managers today include thermal management and achieving server density. It appears that even data centres built only in the last few years are struggling to cope with the heat-management demands that modern blade-based clusters can impose. And it is not just getting the heat out that is a major concern; there is also the issue of getting the power in. Today's systems are not only dense, but that density is also very power hungry, and the demand can exceed what the local grid is able to supply.
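To put rough numbers on the power problem, here is a back-of-envelope sketch in Python. Every figure in it (watts per blade, blades per chassis, the legacy per-rack budget) is an illustrative assumption rather than a vendor specification, but it shows how quickly a populated blade rack can blow past the budget an older room was designed around.

```python
# Back-of-envelope rack power estimate. All figures are illustrative
# assumptions, not vendor specifications.

BLADE_WATTS = 400        # assumed draw per blade under load
BLADES_PER_CHASSIS = 16  # assumed blades in one chassis
CHASSIS_PER_RACK = 4     # assumed chassis stacked in one 42U rack

rack_watts = BLADE_WATTS * BLADES_PER_CHASSIS * CHASSIS_PER_RACK
print(f"Rack IT load: {rack_watts / 1000:.1f} kW")  # ~25.6 kW

# Essentially every watt drawn ends up as heat the cooling plant must
# remove, and older rooms were often laid out around a few kW per rack.
LEGACY_RACK_BUDGET_KW = 3.0  # assumed design figure for an older room
print(f"Over legacy budget by: {rack_watts / 1000 / LEGACY_RACK_BUDGET_KW:.1f}x")
```

On these assumed figures a single blade rack draws more than eight times what the room was originally budgeted to feed and cool, which is exactly the squeeze described above.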
One company that I have had dealings with has had to ban its employees from using Christmas decorations for fear that the additional power demand could be the final straw that breaks that particular camel's back. Another company has had to take away the ovens that used to produce baked potatoes for the staff at meal times, because they too jeopardised the power supply that the servers rely on.
And it is not just power that is a major cause for concern. I have come across people who have been busy buying blade servers and moving all of their voice communications to run over the internet, and who now have major problems getting the heat out of their server farms and are being told to look at water cooling.
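Why does air cooling run out of headroom? A rough sensible-heat calculation makes the point. The physics (heat carried = mass flow × specific heat × temperature rise) is standard; the rack load and the allowable temperature rise below are assumptions for illustration only.

```python
# Sensible-heat estimate of the airflow needed to cool one rack.
# Uses Q = m_dot * cp * dT for air; load and dT are assumed figures.

AIR_DENSITY = 1.2  # kg/m^3, air at roughly room conditions
AIR_CP = 1005.0    # J/(kg*K), specific heat of air

def airflow_m3s(heat_watts: float, delta_t_k: float) -> float:
    """Volumetric airflow needed to carry away heat_watts with an
    inlet-to-outlet temperature rise of delta_t_k."""
    return heat_watts / (AIR_DENSITY * AIR_CP * delta_t_k)

rack_load_w = 25_000  # assumed blade rack load, as in the earlier sketch
delta_t = 12.0        # assumed allowable air temperature rise in kelvin

flow = airflow_m3s(rack_load_w, delta_t)
print(f"{flow:.2f} m^3/s (~{flow * 2119:.0f} CFM) through a single rack")
```

Pushing several thousand cubic feet of air per minute through one rack is at the edge of what fans and floor tiles can deliver, which is why water, with its far higher heat capacity per unit volume, starts to look attractive.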
One contact has informed me that one of the problems they faced with blades is that, although blades share the same form factor, they do not share a common design for ducting air in and out. As a consequence they had racks in which one blade was pumping hot air into another's intakes and overheating its neighbours, which caused a major reliability problem. As you are probably aware, all electronic equipment becomes unreliable once it gets too hot; in particular the power supplies, the hard drives and the fans will start to fail. Every such failure adds cost and impacts systems availability.
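To see why recirculated exhaust matters so much, here is a sketch using the common rule of thumb, loosely derived from the Arrhenius relation, that every 10°C rise roughly doubles a component's failure rate. The baseline temperature and failure rate are assumptions for illustration, not measured figures.

```python
# Rough model of failure rate versus temperature, using the rule of
# thumb that every 10 degC rise roughly doubles the failure rate.
# Baseline figures below are illustrative assumptions.

BASELINE_TEMP_C = 25.0          # assumed design inlet temperature
BASELINE_ANNUAL_FAILURE = 0.03  # assumed 3% annual failure rate at baseline

def annual_failure_rate(temp_c: float) -> float:
    """Failure rate scaled by 2^(temperature rise / 10 degC)."""
    return BASELINE_ANNUAL_FAILURE * 2 ** ((temp_c - BASELINE_TEMP_C) / 10.0)

for temp in (25, 35, 45, 55):
    print(f"{temp} degC inlet -> ~{annual_failure_rate(temp):.1%} annual failures")
```

On this rule of thumb, a blade breathing a neighbour's 45-55°C exhaust rather than 25°C room air could see four to eight times the failure rate, which is consistent with the unreliability my contact described.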
My former colleagues at HP are busy working with Oracle to find ways to help people finance a move to grid computing. The problem they face is that, as we moved away from mainframes to server farms, we became accustomed to making each project bear its own infrastructure costs, with no concept of sharing resources. Grids, like mainframes, are a shared resource that has to be paid for up front, with the cost then recouped from those who subsequently use it.
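As a sketch of what that financing model looks like, here is a minimal chargeback calculation: the capital cost is amortised over a few years and each project pays in proportion to its usage. Every figure, and the yearly_charge helper itself, is hypothetical.

```python
# Sketch of shared-infrastructure chargeback: the grid is bought up
# front and the cost recouped from later users. Figures are assumed.

GRID_CAPEX = 2_000_000.0  # assumed up-front cost of the grid
ANNUAL_OPEX = 300_000.0   # assumed yearly power, cooling and staff cost
AMORTISATION_YEARS = 4    # assumed write-down period

def yearly_charge(share_of_usage: float) -> float:
    """Charge for a project consuming a given fraction of the grid."""
    annual_cost = GRID_CAPEX / AMORTISATION_YEARS + ANNUAL_OPEX
    return annual_cost * share_of_usage

# A project using 10% of the grid pays a share of the pooled cost
# instead of buying and hosting its own dedicated servers.
print(f"10% user pays {yearly_charge(0.10):,.0f} per year")
```

The hard part, of course, is not the arithmetic but persuading the first projects to pay for capacity that later projects will benefit from.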
So it appears that those of us who are concerned only with delivering software functionality are operating in blissful ignorance of the challenges facing those who, we assume, just have to get some tin and wire for our applications to run on.