In all of this talk about autonomous vehicles (AVs) and self-driving trucks, some serious discussion is needed about the sheer amount of data required to allow motor vehicles to pilot themselves.
A recent blog post by Kathy Winter, vice president and general manager of the automated driving solutions division at Intel Corp., puts a number on that data demand, and frankly, to my eyes, it’s unfathomably huge: four terabytes per vehicle per DAY.
[How much data is in a “terabyte” you ask? Enough to hold one thousand copies of the Encyclopedia Britannica. And a self-driving car needs FOUR of them.]
But that’s just for the hour and a half of daily driving averaged by the everyday motorist, according to Winter; we’re not talking about the autonomous trucks that some in the industry expect to operate for far longer stretches per day than that.
“By 2020, that’s the amount of data [four terabytes] that 3,000 individual internet users are expected to put out each and every day,” she noted in her post. “It might not sound like much until you think of it in a different way: How many of us have 3,000 friends on Facebook? Now imagine trying to follow and absorb everything they all post each and every single day.”
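To put those two comparisons side by side, here’s a quick back-of-envelope calculation (mine, not Winter’s) of what her figures imply, assuming decimal terabytes and the hour and a half of daily driving cited above:

```python
# Back-of-envelope arithmetic based on the figures quoted in this article.
# Assumes decimal units (1 TB = 10**12 bytes) and ~1.5 hours of driving per day.

DAILY_DATA_BYTES = 4e12            # 4 TB per vehicle per day (Winter's figure)
DAILY_DRIVING_SECONDS = 1.5 * 3600  # the "hour and a half" of average daily driving
INTERNET_USERS_EQUIVALENT = 3000    # Winter's 2020 internet-user comparison

# Average rate at which the vehicle would have to generate data while on the road.
rate_mb_per_s = DAILY_DATA_BYTES / DAILY_DRIVING_SECONDS / 1e6
print(f"Implied average data rate: {rate_mb_per_s:,.0f} MB/s")   # ~741 MB/s

# Daily volume per person in the 3,000-internet-user comparison.
per_user_gb = DAILY_DATA_BYTES / INTERNET_USERS_EQUIVALENT / 1e9
print(f"Implied per-user volume: {per_user_gb:.2f} GB/day")      # ~1.33 GB/day
```

In other words, roughly three-quarters of a gigabyte every second the car is moving, or a bit over a gigabyte a day from each of those 3,000 hypothetical Facebook friends.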
[By the by, Winter’s been delving into several different aspects of the self-driving vehicle phenomenon. Go here and here for two other interesting posts on this subject.]
Winter stressed that there’s another interesting twist to the data created by a self-driving car.
“What makes data the new oil for autonomous driving – and what makes it a real challenge – is our need to make sense of that data, to turn it into actionable insight that lets cars think, learn and act without human intervention,” she emphasized. “[That’s] data that lets cars do the driving so that the 90% of the accidents caused by human error may one day be a thing of the past.”
Yet the “exponentially growing size” of the data needed for AVs to operate will demand, in Winter’s words, “an enormous amount of computing capacity to organize, process, analyze, understand, share and store” all of that information.
“Think data center server computing power, not PC [personal computer] power,” she added.
The need to train AVs as quickly as possible presents another challenge, Winter noted.
“When new driving responses or situations are identified, machine learning, simulation and algorithm improvements must happen almost instantly – not weeks or months later – and updated driving models must be pushed to the cars immediately once available,” she said. “When, where and how that happens has implications not just for today, but for the day when self-driving cars are the norm.”
[Frankly, however, a recent survey pours some cold water on that outlook.]
There’s also the matter of data protection and what it will take for consumers to eventually trust what she calls “the autonomous experience.”
[Click here for a good story by our own Aaron Marsh that dives deeper into that critical subject.]
“How we will achieve truly secure storage and sharing of data is a question I am asked about frequently and one we take very seriously,” Winter continued. “Which data gets stored? Which gets tossed? Which data sets get shared? And how will we protect it all? These are valid questions that will require industry collaboration and our best experts to address in a meaningful way.”
Of course, perhaps the biggest overall challenge in contemplating the “four terabyte” needs of AVs is what happens as their population grows to potentially hundreds of millions of vehicles operating worldwide.
“The ability to make this happen comes only through the ability to process increasingly larger data sets,” Winter stressed. She added that “true system scalability” will be critical both inside self-driving cars – “back to that four terabyte number” – and outside of them in “massive data centers,” as the self-driving supercomputer and the cloud that supports it continue to evolve.
Things to think about as we continue to travel down the road to self-piloting vehicles.