21st Century Backup

Join the 21st Century, Dump Data Dedupe for Proper Backup, Archive, and DR

By John Pearring

To get cost savings out of your data deduplication investments you must have data that is duplicated, right? That seems fairly obvious, but data deduplication marketing hyperbole does not care to address how to eliminate data duplication in the first place. If you follow the marketing hype, the more duplication the better.

Duplicated data is basically replicated data, which someone has either copied on purpose, or some 'thing' has copied in the course of moving or managing data. Deduplicating is the method of getting rid of redundant data, so that only one copy of the data is retained on storage media.

The basic concept of data deduplication sounds great on the surface: find all duplicate files in the environment, create a single instance of the duplicated file, and have all other instances reference the single copy. This is fairly easy to implement on a single system, but as you can imagine, it is very difficult to do enterprise-wide.
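On a single system, the find, store once, and reference steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: files are held in memory as bytes, duplicates are detected by a SHA-256 content digest, and the function name `single_instance_store` and the sample paths are hypothetical, not taken from any product.

```python
import hashlib


def single_instance_store(files):
    """Keep one copy of each distinct content; map every path to that copy.

    files: dict mapping a path (str) to its content (bytes).
    Returns (store, references), where store maps digest -> content
    and references maps path -> digest.
    """
    store = {}       # content digest -> the single retained copy
    references = {}  # original path -> digest of the retained copy
    for path, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        store.setdefault(digest, content)  # stored only the first time it is seen
        references[path] = digest          # later instances only point at it
    return store, references


# Hypothetical sample data: two users hold identical copies of one report.
files = {
    "/home/ann/report.doc": b"quarterly numbers",
    "/home/bob/report.doc": b"quarterly numbers",
    "/home/cat/notes.txt": b"meeting notes",
}
store, references = single_instance_store(files)
print(len(files), "files,", len(store), "stored copies")
```

The enterprise-wide difficulty is not in this logic. It lies in maintaining one shared index and one shared store across many systems.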

So, why not go to the source of the problem? What is doing all of that duplicating of data? Numerous processes and activities recreate and copy files, for both good and bad reasons. The primary offender, however, is backup software that is not intelligent enough to know when it already has a file backed up, so it keeps backing it up over and over and over…anywhere from 10 to 52 times.

Dump Dedupe
Data deduplication software only cares about backup. Specifically, it focuses on the not-so-intelligent kind of backup.

To quote an unnamed deduplication guru, "Storage vendors are typically better at finding a need for their technology in your environment rather than finding a technology that will actually meet your needs. Beware of vendor ROI calculators that spew out fantastic dedupe savings as mileage will most certainly vary."

This same dedupe-dealing guru, however, goes on to tell us how to build a case for dedupe. In fact, he encourages us not to use the term 'investment' when talking about deduplication technology. He urges us to consider dedupe a foregone conclusion; therefore, dedupe is an 'expenditure.'

In a twist of logic, we are asked to put the same expenditure thinking into deduplication as we do into backup technology. We all know that backup technology is largely an expenditure, because you have to do it. The next rash move is the assumption that all backup technologies will duplicate files like rabbits in a 24x7 springtime frolic. Ergo, all backup technologies need deduplication.

Is that true? Do all backup solutions wildly and randomly copy all of your data all over the place?

No. And, whether your backup solution has a duplication factor of 10 or 52, the cost of dedupe products roughly equals the cost of your backup solution. Coincidental, isn’t it?

Let’s return to the guru’s advice. "Storage vendors are typically better at finding a need for their technology in your environment rather than finding a technology that will actually meet your needs." I say we switch gears to meet your needs.

Consequently, any backup solution that duplicates, or to put it another way, any backup solution that recommends deduplication products, will then automatically cost twice as much as you expected. Unless you are absurdly wealthy, and nostalgic for all things archaic, that will not meet your needs.

In addition, deduplication products are one more technology to install, integrate, and manage in your environment. Not only do these deduplication technologies cost the same as your backup solution, they will also double the cost of your management, maintenance, support, and long-term planning expenditures.

Stop Duplicating
So, what should you do? Stop duplicating. There are solutions available that back up all the data you need without having to deduplicate. Find a solution where deduplication is not necessary because it does not duplicate data in the first place.

Why would anyone want to buy a backup solution that duplicates your data to the point that you will need to buy deduplication technologies to keep an exploding sea of bits from drowning your IT department? Deduping data created by your backup solution is akin to taking sleeping pills to cancel out an unwarranted overdose of caffeine. Sure, you can do that, but does that make sense, and at what cost?

What you need is a backup solution that does not duplicate your data, and for which you do not have to pay extra.

What should you look for? The key is the database system design. Insist upon a backup solution built on a relational database that keeps tabs on the files it has already backed up and can then identify only the new data or versions that need to be copied. From this baseline such a system will not need to duplicate backups.
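A minimal sketch of such a catalog, using Python's built-in `sqlite3` module as a stand-in relational database, might look like this. The table name `backed_up`, the function names, and the sample files are hypothetical. The sketch also assumes a file version is identified by a SHA-256 digest of its content, whereas a real product may rely on metadata such as size and modification time.

```python
import hashlib
import sqlite3


def open_catalog():
    """Create an in-memory catalog of (path, content digest) pairs already backed up."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE IF NOT EXISTS backed_up ("
        " path TEXT NOT NULL,"
        " digest TEXT NOT NULL,"
        " PRIMARY KEY (path, digest))"
    )
    return db


def files_needing_backup(db, files):
    """Return paths whose current version is not yet in the catalog, and record them."""
    pending = []
    for path, content in sorted(files.items()):
        digest = hashlib.sha256(content).hexdigest()
        row = db.execute(
            "SELECT 1 FROM backed_up WHERE path = ? AND digest = ?",
            (path, digest),
        ).fetchone()
        if row is None:  # new file or new version: it must be copied
            pending.append(path)
            db.execute(
                "INSERT INTO backed_up (path, digest) VALUES (?, ?)",
                (path, digest),
            )
    db.commit()
    return pending


catalog = open_catalog()
monday = {"a.txt": b"v1", "b.txt": b"v1"}
tuesday = {"a.txt": b"v1", "b.txt": b"v2", "c.txt": b"v1"}
first_run = files_needing_backup(catalog, monday)    # ['a.txt', 'b.txt']
second_run = files_needing_backup(catalog, tuesday)  # ['b.txt', 'c.txt']
print(first_run, second_run)
```

On the second run the unchanged `a.txt` is skipped, because the catalog already holds that exact version.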

Also, for large files, like databases, look for an option that allows block level comparisons in backups, a form of built-in deduplication, which will help to reduce the duplication of object data. Databases change every few seconds or minutes, and a file-based backup will duplicate records within a database file simply because a database file will always require a new backup. Block level deduplication helps in both backup time and storage size by backing up only the changed blocks.
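The block-level comparison can be illustrated with a short Python sketch. It assumes fixed-size blocks compared by SHA-256 digest, with a deliberately tiny block size so the example is readable. The function names are hypothetical, and a file that shrinks is not handled.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, chosen only for illustration


def block_digests(data, block_size=BLOCK_SIZE):
    """Return one SHA-256 digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(previous_digests, data, block_size=BLOCK_SIZE):
    """Return {block index: block bytes} for blocks that are new or differ."""
    changed = {}
    for index, digest in enumerate(block_digests(data, block_size)):
        is_new = index >= len(previous_digests)
        if is_new or previous_digests[index] != digest:
            start = index * block_size
            changed[index] = data[start:start + block_size]
    return changed


old = b"AAAABBBBCCCCDDDD"      # file as of the last backup
new = b"AAAAXXXXCCCCDDDDEEEE"  # one block rewritten, one block appended
delta = changed_blocks(block_digests(old), new)
print(delta)  # {1: b'XXXX', 4: b'EEEE'}
```

Only two of the five blocks are copied. The same principle keeps a large database file from being backed up whole after a small change.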

Data deduplication does have a place in IT shops. Deduplication that addresses production issues for duplicated and replicated files might be of interest to you. Think of the single-instance storage technology now common in email archiving. Attachments common to many emails, for instance, do not need to be duplicated if you have a relational database that can manage pointers. Production is the place to fix duplication. Backup solutions built with 21st century capabilities (relational databases, virtual storage, policy-based architecture, etc.) should not be duplicating files.

Most file systems duplicate system files, and replicated operating systems, even in virtual environments, have duplication offenses. There are solutions out there that provide deduplication options for such production environment realities, similar to the single-instance storage indexing now built into email archivers. These solutions are nice, but the actual space savings are only interesting when hundreds or thousands of machines are replicated.

John Pearring is VP of customer service of STORServer, Inc., a company founded in 2000 to provide data backup solutions for the mid-market. STORServer offers a complete suite of appliances, software, and services that solve today’s backup, archive, and disaster recovery challenges once and for all. Pearring was previously the president and CEO of STORServer, Inc., and currently serves on the STORServer Board of Directors. 

October 2009, Business TechEdge

Rockport Custom Publishing, LLC © 2008