Researchers at Carnegie Mellon University (CMU) have developed a new method for training self-driving car systems to track the movement of pedestrians, bicycles and other vehicles more accurately by drawing on much larger amounts of training data.
Most autonomous vehicles navigate using LiDAR sensors, which provide 3D information about the surrounding environment in the form of a point cloud. One method of interpreting this data is a technique known as scene flow, which involves calculating the speed and trajectory of each point in the cloud. Groups of points moving together are interpreted as vehicles, pedestrians or other moving objects.
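To illustrate the idea, here is a minimal Python sketch, not the CMU team's implementation. It assumes each point's flow is already known as a 3D displacement between two LiDAR sweeps. The function names, the 0.1-second sweep interval and the grouping tolerance are all hypothetical choices made for the example. A real system would also take into account how close the points are to one another in space.

```python
import math

def point_speeds(flow, dt):
    """Return the speed of each point, given its displacement over dt seconds."""
    return [math.sqrt(dx * dx + dy * dy + dz * dz) / dt for dx, dy, dz in flow]

def group_by_motion(flow, tolerance=0.1):
    """Greedily group point indices whose flow vectors are nearly identical.

    A point joins the first group whose reference flow vector lies within
    `tolerance` of its own; otherwise it starts a new group.
    """
    groups = []  # each entry: (reference flow vector, list of point indices)
    for index, vector in enumerate(flow):
        for reference, members in groups:
            if math.dist(vector, reference) <= tolerance:
                members.append(index)
                break
        else:
            groups.append((vector, [index]))
    return [members for _, members in groups]

# Hypothetical flow vectors: two points moving together, two nearly static.
flow = [(1.0, 0.0, 0.0), (1.02, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.01, 0.0)]
speeds = point_speeds(flow, dt=0.1)  # assumed 0.1 s between sweeps
groups = group_by_motion(flow)
print(groups)
```

In this toy input, the first two points share almost the same motion and end up in one group, while the two nearly stationary points form another.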
Traditionally, training these systems has required labelled datasets, in which each 3D point is tracked over time. Manually labelling these datasets is laborious and expensive, and as a result there is relatively little real-world training data available. Scene flow training is instead often performed with less effective simulated data, and then fine-tuned with the small amount of labelled real-world data that exists.
The team at CMU took a different approach, using unlabelled data to perform scene flow training. Large amounts of unlabelled data are available, as it is relatively easy to generate by simply mounting a LiDAR sensor on a car and driving around. To make this approach effective, the team had to develop a way for the system to detect its own errors in scene flow.
At each instant, the system attempts to predict the trajectory and speed of each 3D point. In the next instant, it measures the distance between the point’s predicted location and the actual location of the point nearest that predicted location. This distance forms one type of error to be minimized.
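This first error can be sketched in a few lines of Python. The sketch is illustrative only and is not the researchers' code: the function name is hypothetical, the predicted flow is passed in as a plain list rather than produced by a trained network, and averaging the distances over all points is an assumption about how the individual errors are combined.

```python
import math

def nearest_neighbor_error(points_t, flow, points_t1):
    """Mean distance from each predicted location to the nearest actual point.

    points_t  : 3D points at the current instant
    flow      : predicted displacement of each of those points
    points_t1 : 3D points actually observed at the next instant
    """
    total = 0.0
    for point, displacement in zip(points_t, flow):
        predicted = tuple(p + d for p, d in zip(point, displacement))
        # Brute-force search for the observed point nearest the prediction.
        total += min(math.dist(predicted, actual) for actual in points_t1)
    return total / len(points_t)

# Toy scene: every point moves 0.5 m along x between the two instants.
points_t = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
points_t1 = [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)]

good_error = nearest_neighbor_error(points_t, [(0.5, 0.0, 0.0)] * 2, points_t1)
zero_error = nearest_neighbor_error(points_t, [(0.0, 0.0, 0.0)] * 2, points_t1)
print(good_error, zero_error)
```

A flow prediction that matches the true motion gives an error of zero here, while predicting no motion at all leaves each point half a metre from its nearest neighbour. No labels are needed to compute either value, only the two point clouds.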
The system then reverses the process, starting from the predicted point location and working backward to estimate where the point originated. It then measures the distance between this back-projected position and the point's actual origin, and that distance forms the second type of error. The system is trained to reduce both types of error.
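The backward check can be sketched in the same way. This is a simplified illustration under stated assumptions rather than the team's implementation: the function name is hypothetical, and both the forward and backward flows are supplied as plain lists, whereas in a trained system they would presumably come from running the flow predictor in each direction.

```python
import math

def cycle_consistency_error(points_t, forward_flow, backward_flow):
    """Mean distance between each original point and its round-trip position.

    Each point is moved by its forward flow to a predicted location, then
    moved again by the backward flow estimated at that location. A consistent
    pair of flows should bring the point back to where it started.
    """
    total = 0.0
    for point, forward, backward in zip(points_t, forward_flow, backward_flow):
        predicted = tuple(p + f for p, f in zip(point, forward))
        returned = tuple(q + b for q, b in zip(predicted, backward))
        total += math.dist(returned, point)
    return total / len(points_t)

points_t = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)]
forward_flow = [(0.5, 0.0, 0.0), (0.0, 1.0, 0.0)]

# A backward flow that exactly undoes the forward flow.
consistent = cycle_consistency_error(
    points_t, forward_flow, [(-0.5, 0.0, 0.0), (0.0, -1.0, 0.0)]
)
# A backward flow that ignores the motion entirely.
inconsistent = cycle_consistency_error(
    points_t, forward_flow, [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
)
print(consistent, inconsistent)
```

When the backward flow exactly undoes the forward flow, the error is zero. When it does not, the error grows with the size of the mismatch.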
The researchers found that scene flow accuracy was only 25% when the system was trained on synthetic data alone. When the system was then fine-tuned with a small amount of labelled real-world data, accuracy increased to 31%. When they added a large amount of unlabelled data and trained the system using their approach, scene flow accuracy jumped to 46%.
“It turns out that to eliminate both of those errors, the system actually needs to learn to do the right thing, without ever being told what the right thing is,” said David Held, assistant professor in CMU’s Robotics Institute.
“Our method is much more robust than previous methods because we can train on much larger datasets,” commented Himangi Mittal, a research intern working with Held.
The CMU team presented their method at the Computer Vision and Pattern Recognition (CVPR) conference. Their research was supported by the CMU Argo AI Center for Autonomous Vehicle Research, with additional support from a NASA Space Technology Research Fellowship.