Nowadays in freight transportation circles it’s common to hear a lot of talk about how artificial intelligence (AI), the Internet of Things (IoT) and (of course) the advent of driverless trucks will transform the process of hauling cargo from point A to point B across the U.S. and the world as a whole.
But at the root of such technological wonders (and many don’t necessarily consider them “wonderful” for some pretty solid reasons) lie reams upon reams of data, which, when plugged into a host of complicated algorithms, should allow machines to perform tasks only humans can currently do – such as driving big rigs.
There are concerns not only about the accuracy of the data being used to “power” intelligent machines, but also about whether the data used to program those machines will help them make the correct decisions.
A recent survey of 179 data scientists by AI platform developer CrowdFlower put those worries into starker perspective.
First, 90% of those responding to CrowdFlower’s poll predict they will have more data to contend with in 2017, with zero believing this volume of data will decline. On top of that, slightly more than half of the data scientists surveyed (51%) said they are spending a significant amount of time working with “unstructured datasets” that include images (33%) and video (15%).
That’s an issue because, according to technology research firm Gartner Inc., unstructured video and image data derived from a “proliferation” of cameras and sensors (and oh does trucking know about that!) is expected to exceed 80% of all internet traffic by 2019.
On top of that, by 2020, Gartner predicts that 95% of video/image content will never be viewed by humans but will have been analyzed by machines.
Then there is the growing issue with low-quality “training data” for AI systems – so-called because it is the data used to “train” machines how to perform human-like tasks.
“Without data, algorithms are useless; like a meal with only forks and spoons but no food,” noted one respondent to CrowdFlower’s survey, illustrating just how critical high-quality “clean data” is to meeting all of the expectations being set for “thinking machines.”
CrowdFlower describes the issue this way: as AI systems increasingly enter the mainstream, their usefulness is often defined by the quality of the training data used.
While a machine can process complex mathematical equations or structured data in milliseconds, training data teaches a machine how to process more abstract data like flagging inappropriate content or distinguishing between objects in images.
Yet while higher quality initial training data improves the accuracy of an algorithm's initial output, ongoing training data is required to constantly improve upon the algorithm's results.
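To make that point concrete, here is a minimal sketch (with entirely hypothetical data, not taken from the survey) of how label quality in training data affects even a very simple model. The same one-dimensional nearest-neighbor classifier is trained twice on identical points – once with clean labels, once with a quarter of the labels flipped – and the mislabeled data alone drags down its accuracy on the same test points.

```python
def nearest_neighbor_predict(points, labels, query):
    """1-nearest-neighbor: return the label of the closest training point."""
    best = min(range(len(points)), key=lambda i: abs(points[i] - query))
    return labels[best]

def accuracy(points, labels, test_points, test_labels):
    """Fraction of test points the classifier labels correctly."""
    hits = sum(
        nearest_neighbor_predict(points, labels, q) == t
        for q, t in zip(test_points, test_labels)
    )
    return hits / len(test_points)

# Two well-separated clusters: class 0 near 1.0, class 1 near 9.0.
train_points = [0.5, 1.0, 1.5, 2.0, 8.0, 8.5, 9.0, 9.5]
clean_labels = [0, 0, 0, 0, 1, 1, 1, 1]
noisy_labels = [0, 1, 0, 0, 1, 0, 1, 1]  # 2 of 8 labels (25%) flipped

test_points = [1.2, 1.8, 8.2, 9.2]
test_labels = [0, 0, 1, 1]

print(accuracy(train_points, clean_labels, test_points, test_labels))  # 1.0
print(accuracy(train_points, noisy_labels, test_points, test_labels))  # 0.75
```

The data here is easy on purpose; with clean labels the classifier is perfect, and the only thing that changes between the two runs is the quality of the training labels.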
CrowdFlower’s poll also found that the “biggest bottleneck” to successfully completing AI projects for over half of the data scientists it polled relates to “getting good quality training data” or “improving the training dataset,” while another 30% stumble when trying to deploy their machine learning model into production.
"There is a tremendous amount of hard work that is needed to make an AI system deliver on its promise and at the core is getting the training data right," noted Robin Bordoli, CEO of CrowdFlower, in the firm’s report.
"Cleaning, labeling and categorizing data isn't sexy or fun, but it's critical,” Bordoli added. “Data scientists know it and that's why they are spending the bulk of their time doing the work they hate. The reality is that algorithms are far from perfect, however, with higher quality training data – created by human intelligence – we can generate business value even with these imperfect algorithms."
But that’s just for starters. The REALLY big worry concerns the “ethical situations” machines will face, for they will respond based on the data programmed into them.
“AI and ethics is an issue that bears close watch in the coming years,” Bordoli stressed.
While the potential of AI replacing human-staffed jobs is an issue according to 42% of respondents, the larger concern in their eyes is the impact of human bias in training data: More than 63% of those surveyed said that they are concerned that human bias and prejudices, such as those revolving around race, religion or demographics, will corrupt the data used to teach AI systems.
Another 42% express skepticism that we can avoid the programming of biases and are concerned about the “impossibility of programming a commonly agreed upon moral code.”
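One way data scientists can probe for that kind of bias before training is a simple audit of the labels themselves. The sketch below (using entirely hypothetical records and made-up group names, not survey data) compares the rate of a “positive” label across demographic groups in a training set; a wide gap between groups is a signal that the human-applied labels may encode prejudice a model would then learn.

```python
from collections import defaultdict

def positive_label_rate_by_group(records):
    """records: list of (group, label) pairs with label 0 or 1.
    Returns a dict mapping each group to its positive-label rate."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for group, label in records:
        counts[group][0] += label
        counts[group][1] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}

# Hypothetical labeled training records: (demographic group, label).
training_records = [
    ("group_a", 1), ("group_a", 1), ("group_a", 1), ("group_a", 0),
    ("group_b", 1), ("group_b", 0), ("group_b", 0), ("group_b", 0),
]

rates = positive_label_rate_by_group(training_records)
print(rates)  # {'group_a': 0.75, 'group_b': 0.25}

# A gap this large between groups warrants a closer look at how the
# labels were assigned before the data is used to train anything.
gap = max(rates.values()) - min(rates.values())
print(gap)  # 0.5
```

A check like this does not prove bias on its own – the gap could reflect the underlying sample – but it flags exactly the kind of skew in training data that the surveyed data scientists say they worry about.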
Here are a few other “ethical” worries regarding data and thinking machines:
- The use of AI and automation in warfare/intelligence, a major concern for nearly half of the data scientists polled (49%)
- The displacement of human workforces by machines (41%)
- Self-driving vehicle safety issues (21%)
Note that last item: the concerns regarding data and self-driving vehicles make questions about “machine moral codes” front-and-center concerns for trucking. And figuring out how the industry comes to grips with them won’t be easy.