Enterprise data centers (DC) are in a period of rapid change. More and more data is being stored in public and private clouds. Not only is the amount of storage deployed in the data center growing at an exponential rate, the way providers are buying storage is also changing. Some analysts predict that by 2019, fully half of the storage deployed in enterprise data centers will be fulfilled by data center operators buying hardware directly from ODMs (Original Design Manufacturers), bypassing the familiar OEM suppliers and in some part assembling systems internally. This change, combined with the rapid adoption of flash in the data center, makes accurately understanding costs critical to business success.
The Total Cost of Ownership (TCO) of a data center is usually assumed to be relatively well-defined and easy to calculate. However, calculating the data center TCO accurately is a puzzling task. There are many costs that one may under- or overestimate, overlook, or not account for, since these costs are typically spread across different IT and facility functions and spread out over time. Many TCO calculators are available.
TCO calculations are almost compulsory for anyone involved in the DC business. Typically, TCO is measured in $/GB/month and includes all the costs and expenses. So, the TCO value is supposed to be the “true cost” of running the business vs. such measures as “acquisition cost,” “cost of power,” OPEX, CAPEX, etc.
A more interesting question besides “how to calculate the TCO?” is this: what are the conclusions that result from these calculations? What are the most important DC design requirements that result from these conclusions? For the data centers focused on storage applications more than on compute tasks – what are the most desirable characteristics of storage devices to look for?
An over-simplified process for a DC TCO calculation will look something like this:
Start with the requirement for the Total Effective Storage Capacity needed (say, 10 PB of useful space is needed)
Figure out the Redundancy Multiplier (say, 3 for a system with 3x data duplication or something like 1.4x for the Erasure Coding-based approach)
Now, assume some particular disk drive capacity to be used at this data center (say, 10 TB drives will be used)
Calculate the number N of drives needed to address the above Total Storage Capacity requirement (in this case, 10 PB * 3 / 10 TB = 3000 drives)
“Build” the rest of the calculation around this number…
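The drive-count step above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, using the example values from the steps (10 PB effective capacity, 10 TB drives, 3x duplication or roughly 1.4x erasure coding). The function name `drives_needed` and the decimal convention 1 PB = 1000 TB are assumptions made for this sketch.

```python
import math

def drives_needed(effective_capacity_tb, redundancy_multiplier, drive_capacity_tb):
    """Return N, the number of drives needed to hold the raw capacity.

    Raw capacity is the effective (useful) capacity multiplied by the
    redundancy multiplier. The result is rounded up because a fraction
    of a drive cannot be deployed.
    """
    raw_capacity_tb = effective_capacity_tb * redundancy_multiplier
    return math.ceil(raw_capacity_tb / drive_capacity_tb)

# Example values from the steps above, assuming 1 PB = 1000 TB.
n_replicated = drives_needed(10_000, 3, 10)    # 3x data duplication
n_erasure = drives_needed(10_000, 1.4, 10)     # ~1.4x erasure coding

print("drives with 3x duplication:", n_replicated)
print("drives with 1.4x erasure coding:", n_erasure)
```

With 3x duplication this reproduces the 3000 drives computed above. The erasure-coded layout needs fewer than half as many drives for the same effective capacity.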
One can see from the above that DC TCO scales proportionally to N, the number of drives needed to meet the Total Effective Storage Capacity requirement. This might be obvious to some and not as clear to others. Understanding this fact is critical to making the right design and operational decisions at a DC.
If we consider the most general form of the TCO equation, it looks something like this:
TCO = Cost (Servers + Storage + Hardware Support + Network + Software + Software Support + Administration + Facility + Power + External Bandwidth)
Obviously, the cost of Storage depends on the number of drives N.
The Number of Servers = N / Number of Bays in a Server.
Hardware support obviously scales with N as well.
Cost of Software (licenses) and Software support (installation, updates, fixes, third-party costs, etc.) is strongly linked to the number of servers.
Power cost (including cooling and other overhead) is proportional to the number of devices (as well as their power consumption) and, therefore, also scales with N.
Facility cost (including maintenance) depends on the size of the DC required to install, operate, and support all of the above devices and on the total power used by this data center and, thus, also scales with N. This cost is usually measured in $/W or $/sq.ft.
Administration cost (employees) is frequently measured “per server” or “per GB” and is strongly linked to N.
Even Network and External Bandwidth costs are related to the number N, since they both should be adequate for the amount of data moved internally and externally, which correlates with the Total Effective Storage Capacity.
Therefore, we can observe from the above information that the cost of storage, servers, networking bandwidth and gear, software, software support, administration, facilities, etc. will scale roughly proportionally to the number of drives N at the data center. Thus, TCO is strongly dependent on N.
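A toy model makes the proportionality concrete. The sketch below is not a real TCO calculator. Every cost constant, the bay count, and the rack size are hypothetical placeholders chosen only to show the shape of the dependence on N; none of them come from this article or from any vendor.

```python
import math

# Hypothetical placeholder values in arbitrary currency units per year.
# They are illustrative assumptions, not real prices.
COST_PER_DRIVE = 100      # drive, plus its share of hardware support and power
COST_PER_SERVER = 2_000   # server, software licenses, software support, administration
COST_PER_RACK = 5_000     # facility space and networking gear
BAYS_PER_SERVER = 12      # assumed number of drive bays in a server
SERVERS_PER_RACK = 10     # assumed number of servers in a rack

def toy_tco(n_drives):
    """Toy TCO in which every cost term is derived from the drive count N."""
    servers = math.ceil(n_drives / BAYS_PER_SERVER)   # Number of Servers = N / bays
    racks = math.ceil(servers / SERVERS_PER_RACK)
    return (n_drives * COST_PER_DRIVE
            + servers * COST_PER_SERVER
            + racks * COST_PER_RACK)

for n in (1200, 2400, 3000):
    print(n, "drives ->", toy_tco(n), "cost units")
```

Because servers and racks are derived from N, doubling N from 1200 to 2400 doubles the toy TCO. Rounding up to whole servers and racks makes the scaling only roughly proportional for other values of N.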
Now, it is time for some analysis. For example, it follows from the above that in order to reduce the DC TCO one needs to reduce N.
How do we do it if we have some well-defined requirement for the Total Effective Storage Capacity?
The answer is clear: get the highest-capacity drives and address your Total Effective Storage Capacity Requirement with as few drives as possible (N → min).
In other words, maximize the volumetric storage density of your storage devices.
There are typically some other constraints at play. For example, latency and performance requirements might dictate some minimum threshold on the value of N. However, flash memory is increasingly deployed to provide low-latency, high-IOPS access as well as high-performance access to indexes and metadata, so the equation still optimizes back to (N → min) for storage purposes. The main conclusion from this extremely simplified analysis still is:
Volumetric storage density of storage devices is THE MOST IMPORTANT FACTOR impacting the DC TCO. At least, for the data centers focused on storage (vs. compute). Typically, even more important than the cost of power.
To better understand this conclusion, think of the following:
If one can only reduce the total power consumption by, say, 20%, this would mean the OPEX savings of 20% of the part spent on power. But the “cost of power” is, usually, on the order of 10% to 30% of the total OPEX. Therefore, reducing 30% by 20% would only help the OPEX by 6%.
And the impact of this cost reduction on DC TCO is going to be even smaller than 6%, since the cost of power is an even smaller fraction of TCO than of the OPEX.
On the other hand, suppose one increases the volumetric storage density by 20%, that is, one uses storage devices that have 20% higher storage capacity inside the same form-factor. The domino effect of such a change will be much more significant:
In order to meet the Total Effective Storage requirement, one would now need N / 1.2 drives, or roughly 17% fewer… which would translate into roughly 17% fewer servers, racks, software licenses, support cost, size of the required facilities, and into other major savings.
And, since the amount of hardware and facilities drops by roughly 17% (following the reduction in N), the DC will, as a bonus, consume roughly 17% less in power ANYWAY!
Which means one is going to get a similar power saving PLUS all the other major savings from buying and maintaining roughly 17% less in terms of drives, other equipment, etc.
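The two levers can be compared with a short calculation. The sketch below uses only the percentages from the text (power at 30% of OPEX, a 20% power reduction, a 20% density gain); the function names are made up for illustration. Note that a 20% capacity gain per drive reduces N by 1 − 1/1.2, or about 17%.

```python
def opex_saving_from_power_cut(power_share_of_opex, power_reduction):
    """Fraction of total OPEX saved by cutting power consumption."""
    return power_share_of_opex * power_reduction

def drive_count_reduction(density_gain):
    """Fractional reduction in N when each drive holds (1 + gain) times more data."""
    return 1 - 1 / (1 + density_gain)

# Power at 30% of OPEX, reduced by 20%.
power_lever = opex_saving_from_power_cut(0.30, 0.20)

# Drive capacity increased by 20% in the same form factor.
density_lever = drive_count_reduction(0.20)

print(f"OPEX saved by the power cut: {power_lever:.1%}")
print(f"Reduction in drive count N:  {density_lever:.1%}")
```

The power cut touches only the power line item, while the reduction in N applies to every cost term that scales with the drive count, power included.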
In the past, almost universally, the acquisition cost per GB for hard drives and solid-state drives has been declining every year. But DC owners focus way too much on the acquisition cost and on its desired decline. The true story is this: thanks to the continuous increase in a drive’s storage capacity year over year, DC owners can store more and more data inside the same physical volume. This drives the $/GB TCO of their data centers down much faster than the acquisition cost reduction and faster than almost any other improvement or price reduction they may wish for.
What the data centers really need is a super-high-capacity drive. The drive that can store much more data inside the same dimensions and volume. The drive that is much bigger in capacity and much denser in terms of storage capacity than those we have today. The “hyper-drive.” For example, Seagate has just announced it’s now shipping in volume its 10TB helium enterprise drives. Such a drive combines two major advantages in one product, as helium inside the drive allows for higher storage capacity and lower power consumption simultaneously. And drives like this and larger, both HDD and SSD (like this 60 TB SAS drive), will be available in the near future, driving the DC TCO lower and lower. Technology that moves us towards this goal faster will be the dominant storage technology in future data centers. Just don’t forget to always choose the densest storage solution possible.
The author wants to thank Erik Salo (Seagate) for his contribution to this article.