Why IT and Facilities Must Work Closely Together

By Kenneth G. Brill | July 31st, 2009

When 2% of all failures represent 25% of all losses, management should focus significant attention on reducing that 2%! Major disasters, including extreme weather, leaks, fires, and other catastrophic events, cause the majority of disaster recovery (DR) declarations and are the focus of most business continuity and disaster recovery planning.

However, historical losses of information availability tell a different story. 25% of information downtime comes from extremely rare events (2% of the total) in which site infrastructure availability is momentarily lost. This author argues that many of these ‘facility’ malfunctions escalate into failures because of inadequate coordination between IT and Facilities personnel. The escalating environmental demands of today’s rapidly changing IT equipment often outpace the capability and capacity of the existing underlying infrastructure. When IT and Facilities personnel work closely together on realistic strategies for accommodating increasing demands for power and cooling, information availability and uptime can improve significantly. Information availability is critical to business continuity, so this close coordination between Facilities and IT should play an important role in BC/DR planning.
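The 2%/25% split shows how lopsided these losses are. The short Python sketch below compares the average loss of one of these rare site infrastructure events with the average loss of every other failure. It uses only the two percentages quoted above. The function name `loss_per_event_ratio` is illustrative, and the calculation assumes losses can meaningfully be averaged per event.

```python
def loss_per_event_ratio(event_share: float, loss_share: float) -> float:
    """Ratio of the average loss per rare event to the average loss per
    other event.

    event_share: fraction of all failures that are the rare events (0-1)
    loss_share:  fraction of all losses those rare events cause (0-1)
    """
    rare_avg = loss_share / event_share               # loss per unit of rare-event share
    other_avg = (1 - loss_share) / (1 - event_share)  # same measure for all other failures
    return rare_avg / other_avg


# 2% of failures cause 25% of losses (the article's figures)
print(f"{loss_per_event_ratio(0.02, 0.25):.1f}x")  # prints "16.3x"
```

Under that averaging assumption, each site infrastructure event costs roughly 16 times as much as a typical failure. That is why the 2% deserves disproportionate attention.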

Funding for business continuity efforts is often limited, and ROIs are difficult to quantify and demonstrate to upper management. However, the time and resources dedicated to IT/Facilities coordination can reduce both capital and operating expenses while also reducing risk. New organizational models are emerging that help IT and Facilities work together more effectively.

This Article

  • Provides background on how site infrastructure reliability dramatically affects business continuity and identifies how failures have occurred historically
  • Explains how the economics of Moore’s Law now result in 3-year site operating expenses exceeding the acquisition cost of the servers supported
  • Identifies the need for IT/Facility teams to deal with both business continuity and the changing economics of IT
  • Outlines how these efforts can improve availability and reduce business continuity risks, while also reducing future operating and capital expenditures

Background
A momentary facility interruption that affects all platforms, all databases, and all applications can translate into at least 4 hours of user downtime while IT restarts hardware, recovers databases, and processes forward from the last checkpoint. The downtime can extend beyond the “normal” 4 hours if a previously unknown flaw in IT architecture or procedures is discovered.
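A few lines of Python can check what a single 4-hour recovery does to annual availability. This is a minimal sketch. It assumes one such interruption per year and an 8,760-hour year; the one-event-per-year frequency is illustrative, not Institute data.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours (leap years ignored)


def availability(downtime_hours: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Fraction of the period during which users had service."""
    return 1 - downtime_hours / period_hours


# One momentary facility interruption per year, assumed to cost the
# "normal" 4 hours of user downtime while IT restarts and recovers.
print(f"{availability(4):.4%}")  # prints "99.9543%"
```

Under these assumptions, a single momentary site interruption is enough to pull user-facing availability below 99.99%, even though the facility itself was down for only an instant.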

For the last 13 years, the Uptime Institute has been tracking environmental failures.¹ The data indicate that site availability failures never result from a single factor: at least 5 to 8 things must occur simultaneously to cause a failure. While the probability of such a combination is low, the cost in lost information availability can be very high, and sometimes so are the excess capital expenditures made to prevent future occurrences. Based on Institute research, it is possible to reliably predict site failures.
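The requirement that 5 to 8 conditions coincide explains why these failures are rare. As a rough sketch, assume each contributing condition occurs independently of the others. The chance that all of them are present at once is then the product of their individual probabilities. The 10% per-condition figure below is purely hypothetical, as is the helper name `joint_probability`.

```python
from math import prod


def joint_probability(condition_probs: list[float]) -> float:
    """Probability that every contributing condition is present at once,
    assuming the conditions are statistically independent."""
    return prod(condition_probs)


# Hypothetical: each contributing condition is present 10% of the time.
for n in (5, 8):
    print(f"{n} conditions: {joint_probability([0.1] * n):.0e}")
```

Real conditions are frequently correlated rather than independent. Positive correlation makes the combined event more likely than this simple product suggests.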

Server performance continues to increase exponentially. Less obvious is that the power consumed per computer equipment rack or cabinet has also jumped dramatically. The expense of providing and maintaining the physical space, power, cooling, and environmental support has risen steeply as well.

The Invisible Consequences of Moore’s Law’s Economic Breakdown
Along with performance, the power consumed per $1,000 of IT hardware investment has risen over the last 6 years, and this rise is the root cause of escalating data center costs. This dramatic change and its implications are only now becoming fully recognized. Its consequences typically remain invisible to most “C-level” executives until the capacity of existing data centers has been consumed. The same dollar spent on new servers today embeds two to four times more power consumption in the same (or less) space than the equipment being replaced.
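The effect of rising power per $1,000 of hardware can be sketched in Python. Every input below is a hypothetical value chosen only to show the arithmetic:

- a $1 million hardware budget
- 40 versus 120 watts per $1,000
- $0.10 per kWh
- a site overhead factor of 2.0 for cooling and power-conversion losses

The newer equipment draws three times the power per dollar, inside the article’s two-to-four-times range. The sketch counts electricity only, not the other site operating expenses the article has in mind.

```python
HOURS_PER_YEAR = 8760


def power_cost(hardware_spend: float, watts_per_1000_usd: float,
               usd_per_kwh: float, overhead_factor: float, years: int = 3) -> float:
    """Electricity cost of running hardware bought for `hardware_spend` dollars.

    overhead_factor: total site power / IT power (cooling, UPS and other losses).
    """
    it_kw = hardware_spend / 1000 * watts_per_1000_usd / 1000  # IT load in kW
    site_kw = it_kw * overhead_factor                          # load at the utility meter
    return site_kw * HOURS_PER_YEAR * years * usd_per_kwh


budget = 1_000_000  # hypothetical server budget in dollars
old = power_cost(budget, watts_per_1000_usd=40, usd_per_kwh=0.10, overhead_factor=2.0)
new = power_cost(budget, watts_per_1000_usd=120, usd_per_kwh=0.10, overhead_factor=2.0)
print(f"3-year power cost, older servers: ${old:,.0f}")  # prints "$210,240"
print(f"3-year power cost, newer servers: ${new:,.0f}")  # prints "$630,720"
```

With the budget held constant, the energy bill scales directly with watts per dollar. Tripling power per dollar triples the three-year power cost, and that load lands in a computer room whose power and cooling capacity did not change.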

The Five Gold Nuggets
The Institute has identified five things many organizations can do now to reduce power consumption within the existing equipment layouts of their data centers. A 10% reduction is almost assured, and 30% or more is often achievable without affecting computing performance and without significant new expenditures. However, for a number of legitimate reasons, plus corporate inertia, many organizations will not take the risk of picking this gold up off the computer-room floor without a major push from senior management.

The Five Gold Nuggets are:

  • Server consolidation, optimization, virtualization
  • Enabling server power saving features
  • Turning off servers no longer in use
  • Pruning bloated code to allow use of less powerful servers
  • Improving the coefficient of data center energy efficiency
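The sketch below shows how the IT-driven nuggets and the Facilities-driven fifth nugget compound. The article does not define the coefficient, so the sketch makes three assumptions:

- The coefficient is IT power divided by total site power (the same ratio The Green Grid calls DCiE).
- The coefficient stays constant as IT load changes.
- The site figures are hypothetical: a 500 kW IT load, a 10% IT-load cut, and an improvement in the coefficient from 0.50 to 0.55.

```python
def site_power_kw(it_kw: float, coefficient: float) -> float:
    """Total site power needed to deliver `it_kw` to IT equipment.

    Assumes coefficient = IT power / total site power and that it does not
    change when the IT load changes (a simplification).
    """
    return it_kw / coefficient


def percent_saving(before_kw: float, after_kw: float) -> float:
    """Reduction in site power as a percentage of the starting value."""
    return 100 * (before_kw - after_kw) / before_kw


# Hypothetical site: 500 kW of IT load at a coefficient of 0.50.
before = site_power_kw(500, 0.50)
# Nuggets 1-4 trim IT load by 10%; nugget 5 raises the coefficient to 0.55.
after = site_power_kw(500 * 0.90, 0.55)
print(f"{before:.0f} kW -> {after:.0f} kW ({percent_saving(before, after):.1f}% less)")
# prints "1000 kW -> 818 kW (18.2% less)"
```

The two kinds of improvement multiply. A modest IT-load cut combined with a modest facility improvement therefore saves more than either one alone, which is the practical argument for IT and Facilities pursuing the nuggets together.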

When Being Risk-Averse can be Risky
In this list, the first four are IT-driven, while the fifth is Facilities-driven but requires close participation by IT. None of the five will be accomplished without a serious management push. Today, the technical expertise required to do this work is widely dispersed within the organization and sits fairly low on the org chart, with no one person accountable for savings.

Critical Physical Layer Defined
What is often overlooked is the critical physical layer, the foundation for everything IT does. IT equipment requires a physical location with power, cooling, and other environmental services such as fire detection and suppression. Historically, the Facilities organization provided these “site” infrastructure services, with a dividing line between where Facilities stopped and IT started (typically at the Power Distribution Unit). As densities grow, IT and Facilities must work together as a boundary-less team to harvest the Five Gold Nuggets.

Many User Organizations Are Not Optimally Structured For The Challenges Ahead
The interdependencies between IT technology decisions and critical physical layer operations are often overlooked or poorly understood. Similarly, corporate real estate executives are puzzled that 30,000-square-foot data centers that previously cost $20 million may now cost $100+ million, an increase their budgets did not anticipate. The result is confusion, delay, increased downtime risk, and sub-optimal decisions. The preceding four-quadrant table outlines some of the critical physical layer interests of each user stakeholder.

Best Practice ICE Teams
Harvesting the gold will be much faster with the adoption of a new planning process and functional team approach called Integrated Critical Environment (ICE) Teams, first explored during the Institute’s 2006 High-Density Computing Symposium. When properly constituted and empowered, ICE Teams become an essential part of an overall strategy for reducing computer room power consumption and optimizing overall performance.

Business Continuity Benefits of ICE Teams
In addition to the economic benefits of boundary-less cooperation between IT and Facilities, there are significant reliability benefits. IT’s computer room layout choices dramatically affect both zone and vertical hot spots. Simple layout changes can double the amount of hardware that can be consistently cooled. This directly supports business continuity by avoiding intermittent “ghost” failures and other reliability problems. Installing blanking plates and following other best practices can also reduce equipment temperatures dramatically. As densities continue to rise, these issues will become increasingly important in ensuring that computer hardware receives optimal critical environment conditioning.

This article was published in the Disaster Resource GUIDE for Facilities (Fall 2006).


White papers and supporting studies by The Uptime Institute are available at www.uptimeinstitute.org/whitepapers

About the Author: Kenneth G. Brill

Kenneth Brill was the founder of The Uptime Institute and Upsite Technologies, and developed the Tier System, which continues to be one of the primary measures of data center reliability. Kenneth Brill, one of the data center industry’s thought leaders for decades, passed away on Tuesday.

Aug 1, 2013
