What next for NetApp and de-duplication?
In the process of
losing its bidding battle with EMC for de-duplication market leader Data
Domain, Network Appliance (NetApp) has exposed its weakness in the
de-duplication (de-dupe) market sector. It had previously developed its own
technology (A-SIS) but evidently accepted that Data Domain provided a better
bet.
Data Domain preferred NetApp, which seemed to have the deal sewn up with an accepted bid of $1.5 billion; then EMC made a hostile bid of $1.8 billion and, when NetApp matched this, EMC went higher. NetApp was undoubtedly prudent to walk away at that point. Perhaps it did not do so badly (and I am not referring only to the $57 million break-up fee it receives from Data Domain, and so in effect from EMC).
The de-dupe market is far more complex than simply saying “Data Domain is the market leader and therefore the best.” There is actually quite a choice of solutions suited to different situations.
Data Domain’s approach is fast and efficient, as well as intuitive: the appliance sits in the data stream and de-dupes ‘in-line’ as the data is received, with no delay to the output. So it is free-standing and essentially plug-in-and-go, giving immediate results. Yet this hardly scratches the surface of the evolving market.
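To make the in-line idea concrete, here is a minimal sketch of hash-based block de-duplication in Python. It is purely illustrative, with invented names, fixed-size blocks and an in-memory pool; a real appliance such as Data Domain’s uses variable-length segmenting and a great deal more engineering.

    import hashlib

    BLOCK_SIZE = 4096  # toy fixed-size blocks; real appliances segment more cleverly
    pool = {}          # fingerprint -> block: the de-duplicated store

    def ingest(data: bytes) -> list:
        """In-line de-dupe: store each unique block once and return the
        fingerprint 'recipe' needed to reconstruct the stream."""
        recipe = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            fp = hashlib.sha256(block).hexdigest()
            if fp not in pool:  # only never-before-seen blocks consume space
                pool[fp] = block
            recipe.append(fp)
        return recipe

    def restore(recipe) -> bytes:
        return b"".join(pool[fp] for fp in recipe)

    # Two back-ups sharing most of their data: the second adds one new block.
    backup1 = b"A" * 8192 + b"B" * 4096
    backup2 = b"A" * 8192 + b"C" * 4096
    r1, r2 = ingest(backup1), ingest(backup2)
    assert restore(r1) == backup1 and restore(r2) == backup2
    print(len(backup1) + len(backup2), "raw bytes;",
          sum(len(b) for b in pool.values()), "bytes stored")

The ‘in-line’ part is simply that ingest() runs as the data arrives, so the back-up lands already de-duplicated; a post-process system would write the raw stream first and run the same logic afterwards.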
Quantum’s appliances compete directly with Data Domain’s and have been offered by EMC (so, now that EMC owns Data Domain, they are likely to be dropped over time). Meanwhile, FalconStor’s virtual tape library (VTL) solution with de-duplication competes well against Data Domain’s VTL option.
ExaGrid and Sepaton are two of the independent vendors providing post-process de-duplication, in which the compression is carried out after the back-up completes, since that stage is not as performance-critical as the in-line path. ExaGrid’s approach looks like in-line to the user, as it backs up onto its own appliance ‘in-line’ and immediately de-dupes the data out to the destination.
Both also cluster their appliances to gain greater
scalability. (Data Domain is expected to add this capability in due course but
is not there today.) Sepaton’s approach is also geared to specific back-up application types, and so can gain greater compression for some formats by recognising and removing the applications’ headers and data markers from the data stream.
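Sepaton’s format-aware angle can be sketched in the same toy style. Everything below is hypothetical, including the record layout; the point is only that stripping a back-up application’s volatile per-record headers before fingerprinting lets two back-ups of identical payloads de-dupe even though their raw bytes differ.

    import hashlib, os

    HEADER_LEN = 16    # hypothetical volatile header on each record
    RECORD_LEN = 4096  # hypothetical fixed record size

    def payload_fingerprints(stream: bytes) -> list:
        """Fingerprint only the payloads, ignoring per-record headers.
        (A real system would keep the stripped headers so the exact
        original stream could still be restored.)"""
        fps = []
        for i in range(0, len(stream), RECORD_LEN):
            record = stream[i:i + RECORD_LEN]
            fps.append(hashlib.sha256(record[HEADER_LEN:]).hexdigest())
        return fps

    payload = os.urandom(RECORD_LEN - HEADER_LEN)
    monday  = os.urandom(HEADER_LEN) + payload  # same data, fresh header
    tuesday = os.urandom(HEADER_LEN) + payload
    assert payload_fingerprints(monday) == payload_fingerprints(tuesday)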
Ocarina (also post-process) is unique in employing content-aware compression, with algorithms that can de-dupe already-compressed JPEG and MPEG file formats; none of the others so far make any impression on these formats, a problem that grows as graphics and video files become ever more common.
Then there is CommVault’s Simpana, which takes a more global approach, embedding de-dupe in all back-ups, remote replication and archiving; CommVault is so far the only vendor providing de-dupe even for archive tape. NetApp itself was the first to offer de-dupe for primary data, with very little performance overhead. However, I can understand some nervousness about playing with the integrity of primary files, as distinct from back-ups.
From a legal and security standpoint, there are a couple of basic de-dupe issues. One cannot de-dupe encrypted data, because sound encryption makes identical copies of a block look like unrelated random data; but leaving data unencrypted in order to de-dupe it obviously makes it more vulnerable to hack attacks. Then, fairly obviously, de-dupe systems need to tamper with the stored data; yet some legal cases hinge on the ‘real evidential weight’ of the stored information, so the tampering could in theory be used to swing a case. De-dupe therefore needs careful consideration by those organisations for whom security or legal concerns are critical.
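The encryption point is easy to demonstrate. The sketch below uses a deliberately toy stream cipher (keystream derived from SHA-256 over key, nonce and counter; illustrative only, not secure): because a fresh nonce is used per write, two encrypted copies of the same block never share a fingerprint, so a de-dupe engine sees every copy as unique.

    import hashlib, os

    def toy_encrypt(key: bytes, block: bytes) -> bytes:
        """Toy stream cipher: XOR with a SHA-256-derived keystream.
        The fresh random nonce per write is what defeats de-dupe."""
        nonce, stream, ctr = os.urandom(16), b"", 0
        while len(stream) < len(block):
            stream += hashlib.sha256(key + nonce + ctr.to_bytes(8, "big")).digest()
            ctr += 1
        return nonce + bytes(a ^ b for a, b in zip(block, stream))

    fp = lambda data: hashlib.sha256(data).hexdigest()
    key, block = os.urandom(32), b"same payload " * 300

    assert fp(block) == fp(block)  # identical plaintexts share a fingerprint
    c1, c2 = toy_encrypt(key, block), toy_encrypt(key, block)
    assert fp(c1) != fp(c2)        # identical plaintexts, unrelated ciphertexts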
Finally, to my mind there is in any case a joker in the pack that may yet be played. Earlier this month I wrote about companies specialising in IT infrastructure optimisation, including WAN optimisation, and it is no surprise that they use various advanced single-instancing (SI) and de-dupe techniques; some of these will reduce the size even of an already de-duped back-up copy.
So, one argument looking to the future goes: “Who needs de-duplication appliances at all when WAN optimisation has even better technology built in?” This, of course, assumes that the cost-benefit of installing such optimisation software and equipment would be the greater, with dedicated de-duplication then providing little or no extra benefit. (In time that might become the case, but it is not so just yet.)
NetApp clearly has a few alternative companies it could go for, or partner with, assuming it does not want to opt for further development of its own technology. Right now these may seem like second choices, but they are really just alternative ways of skinning the cat, so to speak, and some of them are very sound.
NetApp will no doubt think carefully about its strategy and could yet turn this into a success. Whether EMC proves to be a good home for Data Domain is another matter.