Scale challenges impact every player in the digital infrastructure ecosystem. The only sustainable solution is aligned innovation of the network, the hardware, and the data center together.
This is the first in our End User Summit Top-of-Mind Blog Series. This and the five other blog posts that will follow reflect the top issues discussed during the Infrastructure Masons October 2019 End User Summit in San Jose. Summit attendees included senior-level end user leaders from across the globe, IM Foundation Partner executives, and members of the Diversity & Inclusion (D&I) Committee.
Without fail, when iMasons members are asked “What’s top of mind for you right now?” one of the first answers is “Scale.” (Or talent, which is also a topic in this Winter 2019/20 Top-of-Mind Series.)
Just how big?
End users see growth of 7x over the next five years and 39x over the next decade. Suppliers, meanwhile, are predicting growth of 6x and 10x over the same periods. The sample size is relatively small, but the individuals in the room at the Summit represented some $100 billion of digital infrastructure spend. Their growth drives the market, to be sure. And regardless of whether the end user group’s prediction or the supplier group’s prediction is more accurate, the result is clear: huge growth in digital infrastructure will continue to accelerate well into the future.
Whose responsibility is it?
The disconnect between the two forecasts is illuminating. It points to a top-of-mind issue that permeated all the conversations at the Summit: transparency and communication between suppliers and end users.
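To put the two forecasts on the same footing, it can help to convert each growth multiple into an implied average annual growth rate. The sketch below is illustrative only: it assumes smooth compound growth, which nobody at the Summit claimed, and the function and variable names are ours, not the Summit's.

```python
def annualized_growth(multiple: float, years: int) -> float:
    """Average yearly growth rate implied by a total growth multiple.

    Assumes smooth compound growth, which is a simplification.
    """
    return multiple ** (1 / years) - 1


# Growth multiples quoted at the Summit: (group, multiple, horizon in years)
forecasts = [
    ("End users", 7, 5),
    ("End users", 39, 10),
    ("Suppliers", 6, 5),
    ("Suppliers", 10, 10),
]

for group, multiple, years in forecasts:
    rate = annualized_growth(multiple, years)
    print(f"{group}: {multiple}x over {years} years is about {rate:.0%} per year")
```

Under that assumption, the end user figures work out to roughly 48% and 44% a year, while the supplier figures come to roughly 43% and 26%. The two groups are fairly close on the five-year outlook; the gap opens up over the decade.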
As one supplier said to the end users in the room, “Could we have some shared responsibility to solve this scale challenge? It’s not just my problem. It’s your problem, too, because volatility in forecasting, volatility in delivery is just risk. It’s financial risk. It’s a health risk. It’s deployment risk. It’s actual business risk. So anything we can do to help, we’re happy to help.”
Scale and the network
The end users at the Summit were divided into discussion groups that included a network group, a hardware group, a data center group, and an infrastructure management group. All four talked about scale as a top-of-mind issue.
For the network group, “Across small, medium, and large [companies], our number one challenge is demand forecasts from our stakeholders.” The leader who summarized the group’s discussion explained the challenge of the “ambiguity” or “cloudiness” of demand signals – and the speed at which stakeholders expect those signals to be met. Moving fast in an environment of ambiguity requires giving up some quality “just to make it happen.”
For new networks, she said, “All of us are dealing with a certain amount of ambiguity as far as when we go, where to go, where to place it and how to go fast. Our stakeholders are still trying to decide where they want to put us. And by the time they say go, we’re compressing the crap out of our teams to figure it out. We end up accepting snowflakes – building unique topologies that don’t match our standard. And that comes with its own deficiencies that we may not be able to clean up later on.”
Scale and the hardware
Scale is also “creating a major problem” for the hardware group. The leader who summarized the group’s discussion said, “Already we have millions of servers creating challenges for maintenance, power, and cooling.” And the demand from edge infrastructure is going to “dwarf” demand from the cloud, he said. Managing 100x or even 10x growth so it’s sustainable on the power feed, cooling, and building structures will require reinventing “all of those things in some aspects. A lot of innovation has to happen.”
Part of the problem has been the disconnect between hardware innovations and innovations in power and cooling. CPUs and GPUs are increasingly powerful, but the capacity of data centers to power and cool higher performance chips has lagged. “Right now, what happens is the large operators drop the number of servers per rack because they cannot handle the amount of cooling and power that you need to have from a density perspective,” one hardware architect said.
In a typical hyperscale environment, which can fit about 100 servers per rack, “they’ve dropped to 30-40 servers per rack,” he said. “Which means our data centers are going to get more physical space. And the CPU and GPU power is actually pushing us in that direction, which is going to reduce the density per rack even more. So we have to create a lot of innovation around this area. How do you actually increase the density per rack without hurting the blast radius?”
Scale and the data center
The data center group also focused on the disconnect between the pace of innovations in network and hardware and the pace of innovations in the data center. One leader, an electrical engineer with a hyperscale company, said “Our hardware and network topologies change very rapidly. But then infrastructure modifications to accommodate them have 3-5 year rollouts.” In many cases, she explained, by the time the infrastructure modifications are complete, the hardware or network is outdated.
“So how do we better align network, hardware, and data center infrastructure to keep pace,” she asked, “given the time it takes to actually implement and get value out of it?”
Another member of the data center group suggested a solution: “Ask your network and hardware design teams to build in minimum lifespans” to align the pace of those innovations with the ability of the data center to accommodate them. The hyperscale engineer responded, “That would solve the disconnect but it would also hamstring innovation. I’d be foolish to tell a network or hardware designer ‘you can’t make this improvement.’”
DILBERT © 2004 Scott Adams. Used By permission of ANDREWS MCMEEL SYNDICATION. All rights reserved.
Solving the scale challenge – together
Summit attendees agreed that what got us to the current level of digital infrastructure capacity won’t get us to the next level – be it 6x or 39x, or somewhere in between. “If we’re going from 10 million to 100 million servers, we can’t build data centers for you fast enough,” said one provider. “So what are we doing to increase utilization of our current hardware? And how can we solve this problem without going up 10x the resources that we consume from the communities where we build?”
Indeed, as one member of the infrastructure management group said, “We see resource requirements growing faster due to complexity. But we need to figure out how to grow without using more resources. That’s the future. The technologies that get us there are what we’re all going to be investing in.”
“In the future we will have to develop a system that’s much more robust so we’re comfortable putting more users and more servers in a rack,” said one member of the hardware group. “30 to 40 servers per rack is under-utilization. If we just doubled that, which all the racks in the world are capable of doing today, then by default on the same footprint we can double our space. If we do the same for power – if you cut the power of the servers somehow magically – then with same footprint we can double again.”
The math is simple enough. “But there has to be some kind of breakthrough innovation that would actually enable us to do this,” said one leader.
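Here is that math as a back-of-the-envelope sketch. It uses only figures quoted at the Summit (30 to 40 servers per rack today, about 100 possible in a typical hyperscale rack, and a fleet growing from 10 million to 100 million servers). The midpoint of 35 servers per rack, the function names, and the treatment of density and power gains as independent multipliers are our simplifying assumptions.

```python
def racks_needed(servers: int, servers_per_rack: int) -> int:
    """Racks required to house a fleet, rounding up to whole racks."""
    return -(-servers // servers_per_rack)  # integer ceiling division


def capacity_multiplier(density_gain: float, power_cut: float) -> float:
    """Naive capacity gain on a fixed footprint.

    density_gain: factor by which servers per rack increases (2.0 = doubled).
    power_cut: fraction of original per-server power still drawn (0.5 = halved).
    Treats the two effects as independent, which is an assumption.
    """
    return density_gain / power_cut


FLEET = 100_000_000  # the "100 million servers" figure quoted above

# 35 is an assumed midpoint of the quoted 30-40 servers per rack
for density in (35, 70, 100):
    print(f"{density} servers per rack: {racks_needed(FLEET, density):,} racks")

# Doubling density, then halving server power, on the same footprint
print(f"Capacity gain: {capacity_multiplier(2.0, 0.5):.0f}x")
```

Doubling density halves the rack count for the same fleet, and the speaker's two doublings compound to 4x on the same footprint. What the arithmetic leaves out is exactly what the speakers flagged: the power, cooling, and blast-radius constraints that keep operators at 30 to 40 servers per rack today.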
Innovation is inevitable. This growth is happening. So, as another Summit attendee put it, “it’s either going to be a crisis that will force change or a forcing function that will make us address underutilization.” One such forcing function, in the mind of a senior hyperscale leader, is climate change, which “will ultimately drive everyone to be much more efficient – and even, to change technologies.”