Waymo opens up data treasure trove for autonomous vehicles

Waymo has pulled back the curtain on valuable datasets to help researchers better hone self-driving algorithms.

While this is a nice gesture from the team, we suspect the lid will be kept shut on further datasets unless the idea becomes more mainstream. Data is king in the world of autonomous vehicles and this could prove to be a valuable bonanza for researchers and application developers throughout the world.

Waymo has said the datasets are not available for commercial use, though researchers in commercial organizations are free to access the data for their own development purposes.

“When it comes to research in machine learning, having access to data can turn an idea into a real innovation,” the team said in a Medium post.

“This data has the potential to help researchers make advances in 2D and 3D perception, and progress on areas such as domain adaptation, scene understanding and behaviour prediction. We hope that the research community will generate more exciting directions with our data that will not only help to make self-driving vehicles more capable, but also impact other related fields and applications, such as computer vision and robotics.”

When you look at the development of autonomous vehicles, nothing is more valuable than the right data, and those who collect it are usually very protective. Part of the reason for this is the effort which must be exerted to collect it, with companies like Waymo clocking up millions of miles on the road.

This release contains data from 1,000 driving segments, each capturing 20 seconds of continuous driving, corresponding to 200,000 frames at 10 Hz per sensor. Each segment contains sensor data from five high-resolution Waymo lidars and five front-and-side-facing cameras, offering a 360° view, as well as a total of 12 million 3D labels and 1.2 million 2D labels.
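The frame count follows directly from the other figures in that paragraph. A quick check, using only the numbers quoted in the release (the variable names are our own):

```python
# Figures quoted in the release; variable names are illustrative only.
segments = 1_000          # driving segments in the dataset
seconds_per_segment = 20  # seconds of continuous driving per segment
sample_rate_hz = 10       # frames captured per second, per sensor

frames_per_segment = seconds_per_segment * sample_rate_hz
frames_per_sensor = segments * frames_per_segment

print(frames_per_segment)  # 200
print(frames_per_sensor)   # 200000
```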

Such data would allow researchers to train models to track and predict the behaviour of other road users, as well as simulate certain situations to find the most appropriate outcome. The dataset covers various environments, from dense urban to suburban landscapes, captured during day and night, at dawn and dusk, in sunshine and rain.

It is worth noting that while this is the largest release of data for autonomous vehicles, it is not the first. Lyft released data last month, and Argo AI did so the month before.

The more data that is released to researchers, the quicker the autonomous dream can be realised, and the safer the final product will actually be. It does technically lessen the commercial edge of these organizations, but the final goal of getting autonomous vehicles on the road sooner rather than later seems to be more valuable.

Waymo learning from Darwin for autonomous driving

Google subsidiary Waymo has been working alongside its AI cousin DeepMind to develop a technique called ‘Population Based Training’, based on Darwin’s concepts of evolution.

Although we plan on dumbing down the explanation here, we do also hope to remain true to the work Google’s autonomous driving subsidiary Waymo and AI unit DeepMind are doing to advance self-driving algorithms. It’s an incredibly complicated field, but it does seem like the duo is making progress.

“Training an individual neural net has traditionally required weeks of fine-tuning and experimentation, as well as enormous amounts of computational power,” a blog post stated. “Now, Waymo, in a research collaboration with DeepMind, has taken inspiration from Darwin’s insights into evolution to make this training more effective and efficient.”

The easy part of autonomous driving is almost finished. Sensors are almost up to scratch and prices will come down quickly when economies of scale kick in, while the chip giants are also making progress. The trickiest part of the equation is the ‘intelligence’ aspect, the AI components which control all of the decisions.

The simplest way to explain training algorithms is through trial and error. The algorithm performs a task, then grades its performance depending on the outcome. Depending on the ‘grades’, the algorithm adjusts how it performs the task to make a positive outcome more likely.

The challenge which engineers and data scientists face is how much freedom the algorithms are given to adjust with each trial. Too little variance and the fine-tuning takes too long; too much and the results vary wildly. Most of the time, engineers will monitor the tests, manually culling the poorest-performing results.
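To make the trade-off concrete, here is a toy sketch of our own, not anything from Waymo or DeepMind. It uses plain gradient descent as a stand-in for ‘trial and error’, with a single number `x` being nudged towards a made-up target of 3. The `step_size` parameter plays the role of the ‘freedom to adjust’, and all names and values are assumptions chosen for illustration.

```python
def grade(x, target=3.0):
    """The 'grade' for a guess: squared distance from the target (lower is better)."""
    return (x - target) ** 2

def train(step_size, trials=50, start=0.0, target=3.0):
    """Toy trial-and-error loop: repeatedly adjust x in the direction that improves the grade."""
    x = start
    for _ in range(trials):
        gradient = 2 * (x - target)  # slope of the grade at the current guess
        x -= step_size * gradient    # size of the adjustment is set by step_size
    return x

# Hypothetical settings: too cautious, reasonable, and too aggressive.
for step_size in (0.001, 0.1, 1.1):
    x = train(step_size)
    print(f"step_size={step_size}: x={x:.3f}, grade={grade(x):.3g}")
```

With the smallest step the guess has barely moved from its starting point after 50 trials, the middle setting lands very close to the target, and the largest overshoots further on every trial and ends up worse than where it started.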

The new approach from Waymo and DeepMind is an interesting one. Population Based Training starts with multiple different tests, before the poorest performing ones are culled from the population. Out of the ‘survivors’, copies are made with slightly mutated hyperparameters. This process goes on and on until the algorithms become more reliable, resilient and safe.
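The cull-copy-mutate loop described above can be sketched in a few lines. This is a toy illustration under our own assumptions, not Waymo’s or DeepMind’s code: the ‘model’ is a single number `x`, the only hyperparameter is a `step_size`, and the population size, number of rounds and 20% mutation are values picked purely for the example.

```python
import random

TARGET = 3.0  # hypothetical optimum for the toy task

def grade(x):
    """Lower is better: squared distance from the target."""
    return (x - TARGET) ** 2

def train_steps(x, step_size, steps=5):
    """A few gradient-descent updates on the toy task."""
    for _ in range(steps):
        x -= step_size * 2 * (x - TARGET)
    return x

def population_based_training(pop_size=8, rounds=20, seed=0):
    rng = random.Random(seed)
    # Every member starts from the same x but with its own random step_size.
    population = [
        {"x": 0.0, "step_size": 10 ** rng.uniform(-4, 0)}
        for _ in range(pop_size)
    ]
    for _ in range(rounds):
        # Train every member for a short burst, then rank them (best first).
        for member in population:
            member["x"] = train_steps(member["x"], member["step_size"])
        population.sort(key=lambda m: grade(m["x"]))
        survivors = population[: pop_size // 2]  # cull the poorest half
        # Refill the population with copies of the survivors whose
        # hyperparameter is nudged up or down by 20%.
        children = [
            {"x": parent["x"],
             "step_size": parent["step_size"] * rng.choice((0.8, 1.2))}
            for parent in survivors
        ]
        population = survivors + children
    return min(population, key=lambda m: grade(m["x"]))

best = population_based_training()
print(best)
```

The point of the design is that nobody has to pick a good `step_size` by hand: poor settings are discarded each round, and the better ones are copied and varied.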

It might sound like a simple solution, but not many companies are as fortunate as Waymo in having such smarts as DeepMind living in the same corporate family. It’s almost unfair, and we’re quite surprised it’s taken so long for Waymo to cosy up to its smarter cousin.