Getting to grips with power
With the potential for power shortages, price hikes in raw
materials such as oil, and environmental issues moving up the
agenda, large organisations are well advised to review their
ongoing power needs. It is therefore disturbing to discover that
most data centres do not have a good handle on the amount of power
they are using, let alone what effect adding new IT equipment
will have on their power loadings.
Less than two years ago, power problems caused top international
news agency Reuters to go off the air for about half a day. Failure
of IT systems in such an environment is an absolute no-no; it cost
the news agency in lost revenue immediately but, at least as
important, it damaged customer confidence such that its reputation
for service reliability may not even now be fully restored. Reuters
acted swiftly to address the problem, but the incident serves as a
sobering reminder of what could happen to any organisation not
totally on top of its ongoing power needs.
Worryingly, 40% of respondents to a recent private survey across
a large number of data centres admitted to having been caught cold
by a power problem they experienced. Nor did this mean the other
60% were in control, only that such a problem hadn’t happened to
them—yet. The cause was invariably the data centre managers’
insufficient knowledge of their maximum power capacities and
loadings.
Oddly, this problem seems to have been exacerbated by the major IT
vendors’ management software, which has focused more and more on the
‘virtual’ infrastructure several levels above the
physical devices and power connections. Tools to visualise the
physical layer, especially in relation to actual power usage at any
given outlet, are thin on the ground. This may well start to change
as IBM, HP, BMC, CA and their ilk realise there is value in
providing this information. Configuration items can be held in the
standard configuration management database (CMDB) as proposed by
the IT Infrastructure Library (ITIL).
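To make that concrete, the sketch below shows one way a power-aware
configuration item might be recorded alongside the usual CMDB
fields; it is an illustration only, with invented field names, not
any vendor’s or ITIL’s actual schema.

    from dataclasses import dataclass
    from typing import Optional

    # Illustrative only: a configuration item (CI) record that carries
    # physical-layer power attributes alongside the usual CMDB data.
    @dataclass
    class ConfigurationItem:
        ci_id: str                 # unique identifier in the CMDB
        ci_type: str               # e.g. "rack server", "blade chassis", "PDU"
        location: str              # data centre / room / rack / slot
        power_feed: str            # which PDU outlet feeds the device
        rated_power_w: float       # nameplate power draw in watts
        measured_power_w: Optional[float] = None  # actual draw, if metered

    web_server = ConfigurationItem(
        ci_id="CI-0042",
        ci_type="rack server",
        location="DC1/RoomA/Rack12/U20",
        power_feed="PDU-12-A/outlet 7",
        rated_power_w=450.0,
        measured_power_w=310.0,
    )
    print(web_server)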
Lack of focus on power usage may also be attributed to there
typically being no profit and loss (P&L) line explicitly covering
it. If there were, realisation of the costs being incurred would
surely bring about rapid change.
Aperture Technologies is
one company specialising in visualising the physical layer in real
time—including the server racks, cabling and power supply
loadings. Aperture’s CEO Bill Clifford, speaking to me last week
following the release of Aperture’s latest VISTA500 software, said:
“Nobody else is really at level zero, because it’s dirty,
ugly, blue-collar. Everyone else is two layers up. Two layers up is
no good if the data centre is on fire.”
Fire is an extreme case, of course, but such an event could be
triggered by an electrical overload that should have been avoided.
It is also the data centre staff who get their hands dirty,
recording and loading this information and then keeping it
up to date (except where the software can detect a new device and
its type), so that the status can be constantly monitored.
Once captured, the data can be used to carry out ‘what-if’
calculations on screen, showing the effect of adding new equipment
on power loadings (as well as, for instance, on floor space and
floor loadings).
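As a rough illustration of that kind of ‘what-if’ check (my own
sketch, not Aperture’s software, with invented figures), the few
lines below simply test whether a new device would push a circuit
past a derated capacity.

    # A toy 'what-if' check: would adding a new device push a power circuit
    # past a safe fraction of its rated capacity? Figures are illustrative.
    def can_add_device(circuit_capacity_w: float,
                       current_load_w: float,
                       new_device_w: float,
                       safety_margin: float = 0.8) -> bool:
        """Return True if the new device fits within the derated capacity."""
        return current_load_w + new_device_w <= circuit_capacity_w * safety_margin

    # Example: a 7.4 kW circuit already carrying 4.9 kW; can it take a 1 kW chassis?
    if can_add_device(7400, 4900, 1000):
        print("OK to install on this circuit")
    else:
        print("Would exceed safe loading - try another circuit")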
Often much of this information is available from elsewhere, for
instance from an asset management system. But Clifford said he had
yet to find one client whose data centre map was over 85% accurate.
He was also critical of CAD/CAM solutions that “paint pretty
pictures” but provide nothing to prove the information displayed is
accurate, and that are “…out of date the instant you draw the
diagram.”
Data centre managers really need to know the location of every
server and every power unit, how many units there are, and their
sizes and weights for layout planning. Empty space on a data centre
floor does not mean spare capacity: even if the additional floor
loading is no problem, can the facility cope with the extra power,
including that needed for cooling?
Now factor in the potential for external power supply problems
in the future, particularly at peak demand periods, along with
increased energy prices, and common sense should tell every
organisation to get a proper handle on what it has: what it is
costing on a daily basis, the effect of a price increase and, most
importantly, the areas of exposure to power shortages or failures,
with their likely knock-on effects.
I do not claim to understand the maths behind conversion from
single-phase to three-phase for power transmission or the optimum
points for conversion back to single-phase. Nor do I need to.
Advanced power management (APM) systems can control that and feed
back actual loadings on all phases to expose the likely areas of
vulnerability and the biggest drains on power.
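The electrical detail can stay with the specialists, but the kind
of feedback such a system provides is easy to picture. The sketch
below is an assumption of mine, not a real APM interface: it simply
totals measured loads per phase and flags any phase running close
to an assumed limit.

    # Sketch of per-phase reporting: total the measured load on each phase
    # and flag any phase running close to its limit. All data is invented.
    readings = [
        ("srv-01", "L1", 950), ("srv-02", "L1", 1200),
        ("srv-03", "L2", 400), ("san-01", "L3", 2100),
    ]                          # (device, phase, measured load in watts)
    phase_limit_w = 2300       # assumed per-phase limit
    alert_threshold = 0.9      # warn at 90% of the limit

    loads = {}
    for _, phase, watts in readings:
        loads[phase] = loads.get(phase, 0) + watts

    for phase, total in sorted(loads.items()):
        status = "WARNING" if total >= phase_limit_w * alert_threshold else "ok"
        print(f"{phase}: {total} W of {phase_limit_w} W ({status})")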
These things are surely critical factors in risk management and
meeting service and operational level agreements. ITIL, which
guides adopters towards best practice in infrastructure and its
management, advocates a work process for managing change that is
never divorced from knowledge of the existing configuration. That
means properly addressing ‘what-ifs’ using accurate information, in
order to avoid a Reuters-style breakdown.
The Uptime
Institute exists to help companies improve systems’
uptime. It defines four classification tiers of hardening for a
data centre, each tier raising the achievable percentage uptime
towards the familiar ‘five 9s’ (99.999%) level. Yet, while
hardening squeezes out system failure possibilities, human error
persists at every tier. Clifford points out that human causes of
downtime cannot be tackled unless the information on which a data
centre is managed is itself accurate. Wrong or missing information
leads inevitably to wrong decisions.
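To put such percentages into perspective, converting availability
into allowed downtime is simple arithmetic; the snippet below is
just that calculation, not anything taken from the Uptime
Institute’s material.

    # Convert availability percentages into allowed downtime per year.
    MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600 minutes in a non-leap year

    for availability in (99.9, 99.99, 99.999):
        downtime_min = MINUTES_PER_YEAR * (1 - availability / 100)
        print(f"{availability}% uptime allows about {downtime_min:.1f} "
              f"minutes of downtime a year")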
The physical format of data centre equipment also hugely affects
the picture. There are, for instance, considerable differences
between a conventional free-standing server with its own power
supply and high-density blade servers that draw their power from a
single source. Even if the total power needed is similar,
installing blade servers as a replacement might make the data
centre’s power requirement more ‘lumpy’—some
parts overloaded and others under-utilised. Moreover, shrinkage in
equipment size can be very deceptive, especially as power for
cooling high-density racks is a major factor. Space on the data
centre floor does not equate to spare capacity; the extra power
needed for new equipment may be the constraint.
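The ‘lumpiness’ point is easier to see with numbers. In the
hypothetical comparison below (all figures invented for
illustration), a blade consolidation draws roughly the same total
power as the free-standing servers it replaces, yet concentrates it
in a single rack that blows through an assumed per-rack budget.

    # Hypothetical comparison: similar total power, very different distribution.
    rack_budget_w = 5000   # assumed safe power budget per rack

    # Twelve free-standing servers at ~450 W each, spread across three racks.
    before = {"rack-A": 4 * 450, "rack-B": 4 * 450, "rack-C": 4 * 450}
    # The same workload consolidated onto one blade chassis in a single rack.
    after = {"rack-A": 5200, "rack-B": 200, "rack-C": 0}

    for label, layout in (("before", before), ("after", after)):
        total = sum(layout.values())
        over = [rack for rack, watts in layout.items() if watts > rack_budget_w]
        print(f"{label}: total {total} W, racks over budget: {over or 'none'}")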
As well as reducing risks, control of power can have huge
overall cost benefits. For instance, a large multi-national
enterprise could well have a number of scattered data centres.
Spare capacity in one might then be used to prevent overloading in
another, thereby obviating the need to buy more capacity. This,
though, is only possible if that level of knowledge exists for
every data centre. In fact, the whole organisation
should be getting to grips with its power usage.
As for the growing focus on the environment, nobody knows for
sure what will happen. Very recently there was a major power outage
after an unexpectedly high demand peak, primarily in parts of
Germany, France and Belgium. This follows earlier problems
experienced in Italy and also the New York area. If grid capacity
cannot always meet peak demand then high prices may be used as one
weapon to deter peak usage. So knowing how and where your power is
being used minute-to-minute will surely become critical.
As a final thought, if power bills prove to be much higher than
previously realised, investment in alternative power sources that
can provide some protection against power supply outages may
suddenly start looking attractive.