Data at Risk: Multi-Tiered Storage Poses Opportunities and Obstacles for Data Protection

By Jon William Toigo | January 10, 2005

In case you haven’t noticed, these trends have breathed life into coping strategies built around multi-tiered data storage architectures. That is to say, many companies are beginning to leverage inexpensive disk arrays based on Serial-ATA (S-ATA) technology to complement their primary storage in order to solve three problems:

  1. These secondary repositories (so-called “Tier 2” arrays) are being used by many companies to store older data that needs to be kept online for regulatory reasons or to host “reference data” that is accessed by users and applications but rarely modified or updated. Such a strategy is seen as a means to reserve expensive, higher-performance storage platforms for use with new or frequently changing data.
  2. Many firms are using S-ATA arrays to replace tape subsystems, alleviating the problem of a “shrinking backup window” by capitalizing on much faster disk-to-disk transfer speeds and providing a mechanism for fast data restores that doesn’t require huge investments in expensive array mirroring.
  3. Some firms are using S-ATA disk arrays as a “cache” for tape: as previously mentioned, the arrays provide a higher-performance target than a tape subsystem for receiving backup streams (the disk emulates a tape device), so backups are completed more quickly. The recorded data stream is then moved from the Tier 2 disk cache to a back-end tape subsystem as a separate process that does not interfere with normal production processing; a sketch of this two-phase approach follows this list.
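
To make that third pattern concrete, here is a minimal sketch of the two-phase, disk-to-disk-to-tape approach just described. The paths, the 12-hour aging window, and the use of plain file copies in place of a real virtual tape library are all assumptions invented for illustration; actual backup products handle this staging internally.

```python
import shutil
import time
from pathlib import Path

# All paths and the aging window below are assumptions for this illustration.
DISK_CACHE = Path("/tier2/backup-cache")   # S-ATA array acting as the backup target
TAPE_MOUNT = Path("/tape/library0")        # back-end tape subsystem (or VTL) mount point
AGE_LIMIT_HOURS = 12                       # how long a backup image sits on the disk cache

def stage_backup(source_file: Path) -> Path:
    """Phase 1: land the backup image on the fast Tier 2 disk cache."""
    DISK_CACHE.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(source_file, DISK_CACHE))

def sweep_to_tape(age_limit_hours: int = AGE_LIMIT_HOURS) -> None:
    """Phase 2, run as a separate off-hours job: move aged images from cache to tape."""
    cutoff = time.time() - age_limit_hours * 3600
    TAPE_MOUNT.mkdir(parents=True, exist_ok=True)
    for image in DISK_CACHE.glob("*.bak"):
        if image.stat().st_mtime < cutoff:
            shutil.move(str(image), str(TAPE_MOUNT / image.name))
```

Because the sweep runs as its own job, the slow write to tape never eats into the backup window itself.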

These may all seem to be perfectly appropriate uses for cheap disks that respond well to a given company’s storage requirements and constrained budgets. However, a multi-tiered storage strategy also presents all of the makings for a data disaster – especially if these so-called “ghetto RAID” products are purchased and deployed without consideration for data protection requirements.

How Safe Is Your Data?
One problem is that many vendors of S-ATA arrays ship their products with “bare bones” controllers or “dumbed down” host bus adapters (HBAs). A controller is the component of an array that is used to connect the array to a server, whereas the HBA is the component installed in a server (or, increasingly, built into the server motherboard) to which the storage device is cabled. A large part of the cost of a high-end array is linked to the engineering work that has gone into customized array controllers, and a well-engineered HBA can also add cost to a sturdy solution. To keep prices down, many commodity S-ATA arrays are sold without either component.

Logic dictates that any platform entrusted with your most irreplaceable asset, data, should offer at least a minimal set of protective capabilities for use in safeguarding that data. These capabilities, at a minimum, should include some form of RAID and some sort of network-based manageability. While they may add cost to the overall platform, they are costs well justified by the improved protection they afford your data.

RAID (or Redundant Array of Independent Disks) is a set of quasi-standards that came out of the University of California at Berkeley in 1987. The Berkeley researchers identified five strategies (called “levels”) for aggregating disks into “arrays” and for providing internal safeguards against the loss of data in the event of a disk drive failure within the array. Of these five RAID levels, RAID 5 is arguably the most popular. Basically, as data is written to the array, a calculation – called a parity check – is performed on each stripe, and the resulting parity is distributed across the disks in the array along with the data itself. This distributed parity information can be used to reconstruct data if a specific drive fails. Striping parity across the disks is a kind of statistical shell game for data protection that works most of the time. (For more information about RAID levels, you can consult any number of web sites or my book The Holy Grail of Data Storage Management.)
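
To see why distributed parity works, consider the minimal sketch below of the XOR arithmetic that underlies RAID 5-style protection: the parity block is simply the XOR of the data blocks in a stripe, so any single lost block can be rebuilt from the survivors. This is an illustrative Python fragment with made-up, equal-length blocks, not controller firmware.

```python
from functools import reduce

def parity(blocks):
    """XOR equal-length data blocks together to produce the parity block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild(surviving_blocks, parity_block):
    """Reconstruct the block from a failed drive using the survivors plus parity."""
    return parity(surviving_blocks + [parity_block])

# One stripe spread over three data "drives" (contents and sizes are illustrative).
d0 = b"ACCOUNTS"
d1 = b"JAN-2005"
d2 = b"LEDGER01"
p = parity([d0, d1, d2])            # in RAID 5, parity rotates among the drives

# Simulate losing the drive holding d1 and rebuilding its contents.
recovered = rebuild([d0, d2], p)
assert recovered == d1
```

Lose two drives at once, of course, and the arithmetic no longer has enough information to recover anything, which is the “most of the time” caveat above.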

The number of RAID levels has grown over the years, with vendors offering proprietary RAIDing schemes to fit their own controller designs. Perhaps the most annoying development has been the creation of RAID 0 – which, in opposition to the stated goals of the RAID scheme developers at Berkeley, offers no data protection at all. Many S-ATA arrays ship as RAID 0 boxes (which stripe data across drives for speed but provide no redundancy) or, even more simply, as JBODs (Just a Bunch of Disks).

The bottom line is that wedding low-cost arrays to a RAID 5 controller or HBA can improve the internal protection afforded to the data stored on their disk platters. As a matter of best practice, a RAID controller (other than RAID 0) should be considered a must-have when evaluating an S-ATA array acquisition. It will increase the cost of the platform, but nowhere close to the cost of a high-end “big iron” array, to be sure.

How Smart Is Your Storage?
The second factor that needs to be considered when selecting S-ATA or other low-cost array options is how well these platforms fit within a disciplined storage management scheme. This issue is important on many levels, and not just as it pertains to data protection. Effective management is key to constraining storage cost of ownership: the more management smarts in the storage infrastructure, the fewer staff required to administer it. Given that management costs account for 60 to 70 percent of the total cost of ownership of data storage technology, it is clear that well-managed storage costs less over the long haul.

From a data protection standpoint, management affords two additional benefits. For one, effective management enables a proactive response to developing failures in the storage infrastructure. Good management systems will advise you when disks are close to becoming full, so that data can be redistributed before a “disk full” error leads to expensive business application downtime. A good management system can also let you know in advance if disks are overheating or certain equipment is functioning at subpar levels.
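
As a minimal illustration of that kind of proactive capacity monitoring, the Python sketch below polls a set of mount points and flags any volume above a utilization threshold. The volume names and the 85 percent threshold are assumptions for the example; a real management framework adds trending, temperature and drive-health checks, and alert routing.

```python
import shutil

# The volumes and threshold are assumptions for this example; substitute your own.
WATCHED_VOLUMES = ["/primary", "/tier2"]
ALERT_THRESHOLD = 0.85   # warn when a volume reaches 85% of capacity

def check_capacity(volumes, threshold):
    """Return a warning for every volume whose used space exceeds the threshold."""
    alerts = []
    for vol in volumes:
        usage = shutil.disk_usage(vol)
        used_fraction = usage.used / usage.total
        if used_fraction >= threshold:
            alerts.append(f"{vol} is {used_fraction:.0%} full; redistribute data soon")
    return alerts

if __name__ == "__main__":
    for message in check_capacity(WATCHED_VOLUMES, ALERT_THRESHOLD):
        print("WARNING:", message)
```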

Storage management software tools themselves are changing – moving up the stack, so to speak, to provide more functionality than simplistic trending, monitoring and alarming. Increasingly, vendors are seeking to automate certain storage functions, such as backup, data migration, and data replication. The real holy grail is to reach the point of storage self-management – what you may hear called data lifecycle management, information lifecycle management or ILM. While the best storage management products are succeeding in automating certain data protection functions, true ILM should include a data naming and tagging scheme, an access frequency counter, a mechanism for characterizing storage by cost and capability, and a policy engine that uses the preceding three components to place data correctly in the infrastructure based on its storage requirements, access characteristics and platform capabilities and costs. None of the ILM products currently sold into the distributed computing space have all of these components.
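
A toy policy engine, sketched below, gives the flavor of how those four components fit together: each data set carries a tag and an access count, each tier is characterized by cost and capability, and a simple rule decides placement. The class names, tags, and thresholds are all invented for illustration and do not describe any shipping ILM product.

```python
from dataclasses import dataclass

@dataclass
class StorageTier:
    name: str
    cost_per_gb: float       # cost characterization
    high_performance: bool   # capability characterization

@dataclass
class DataSet:
    name: str
    tag: str                 # naming/tagging scheme, e.g. "production", "reference", "regulatory"
    accesses_per_month: int  # access frequency counter

TIERS = [
    StorageTier("tier1-fc-array", cost_per_gb=30.0, high_performance=True),
    StorageTier("tier2-sata-array", cost_per_gb=5.0, high_performance=False),
]

def place(data: DataSet) -> StorageTier:
    """Policy engine: keep hot, frequently changing data on Tier 1 and push
    reference or regulatory data that is rarely touched to cheaper Tier 2."""
    if data.tag in ("reference", "regulatory") or data.accesses_per_month < 10:
        return next(t for t in TIERS if not t.high_performance)
    return next(t for t in TIERS if t.high_performance)

print(place(DataSet("orders-db", tag="production", accesses_per_month=5000)).name)   # tier1-fc-array
print(place(DataSet("2003-archive", tag="regulatory", accesses_per_month=2)).name)   # tier2-sata-array
```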

Policy-based data movement automation is a key to protecting data on a budget. It is also a prerequisite for multi-tiered storage architectures to work. The problem is that storage components themselves must be equipped or “instrumented” to play in such a management scheme. Many low-cost arrays are not.

To address this issue, planners need, first, to select a storage management framework for use in their environments and, then, to purchase only platforms that are instrumented to work with and to be automated by this framework. The first requirement may be a difficult one to fulfill. Not only are business continuity and disaster recovery planners rarely involved in the decision-making around storage management software, but the marketing around storage management products is often equal parts hyperbole and out-and-out misinformation. Lacking any real standards, management approaches vary widely among vendors, and many work only with a hardware vendor’s own gear. “Platform independence” remains a knotty issue with regard to the management of heterogeneous storage environments.

It should also be mentioned that some work is going on within the American National Standards Institute (ANSI) and the Internet Engineering Task Force (IETF), as well as quasi-standards groups such as the Distributed Management Task Force (DMTF) and the Storage Networking Industry Association (SNIA), to develop standardized management approaches. Planners need to watch these efforts and strategize ways to leverage open standards once they are fully baked.

Conclusion
A third group of vendors to watch comprises the server operating system vendors. Microsoft, for example, is poised to co-opt much of the functionality in the current crop of storage management software products. Doubtless, this will be mimicked by UNIX and Linux OS vendors in short order. Managing storage at the server operating system level might help to make the multi-tier storage infrastructure a safer place for your data to hang out.


About the Author
Jon William Toigo is a technology consumer advocate, veteran disaster recovery planner, and author of 15 books, including Disaster Recovery Planning 3rd Edition: Preparing for the Unthinkable. He is CEO of Toigo Partners International and founder of the Data Management Institute. For more information contact Jon Toigo at [email protected] or visit www.drplanning.org. ©2004 by Toigo Partners International LLC. All Rights Reserved.
