DRIVE Labs: Pursuing Perfection for Intersection Detection

The WaitNet deep neural network identifies intersections without the help of a map.
by Neda Cvijetic

Editor’s note: This is the second post in the NVIDIA DRIVE Labs series. With this series, we’re taking an engineering-focused look at individual autonomous vehicle challenges and how the NVIDIA DRIVE AV Software team is mastering them. Catch up on our first post, on path perception, here.

MISSION: Intersection Detection Using AI-Based Live Perception

APPROACH: WaitNet Deep Neural Network

Navigating a traffic-light-controlled intersection may seem routine. But when the NVIDIA BB8 autonomous test vehicle first performed that task last year, it had our engineers smiling.

With good reason. Using only our AI-based live perception deep neural net, the vehicle was able to detect, stop at and proceed through the intersection. And it did so with pinpoint accuracy.

Our approach used AI-based scene understanding to perceive and classify the intersection in real time. Rather than seeking to detect and piece together individual features — stop signs, traffic lights, lane markings, etc. — into evidence of an intersection, we accomplished scene-based detection and classification using our WaitNet deep neural network.

WaitNet is named for its mission: detecting conditions in which an autonomous vehicle must come to a stop and wait. It’s a convolutional DNN trained on camera image data to infer various kinds of wait situations — such as intersections, construction zones and toll booths — and to classify them.
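
To make the idea concrete, here is a minimal sketch of what a scene-level wait-condition classifier could look like in PyTorch. NVIDIA hasn't published WaitNet's internals, so the architecture, the WaitNetSketch class and the label set below are purely illustrative; the sketch only shows the general pattern of a convolutional backbone feeding a scene classification head and a distance regression head.

```python
# Illustrative sketch only: NVIDIA has not published WaitNet's architecture.
# It shows the general pattern of scene-level classification from a single
# camera frame, plus a regression head for distance to the stopping point.
import torch
import torch.nn as nn

WAIT_CONDITIONS = ["none", "intersection", "construction_zone", "toll_booth"]

class WaitNetSketch(nn.Module):
    def __init__(self, num_classes: int = len(WAIT_CONDITIONS)):
        super().__init__()
        # Convolutional backbone extracts features from the whole scene.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # One head classifies the wait condition; the other estimates the
        # distance to its stopping point.
        self.classifier = nn.Linear(128, num_classes)
        self.distance = nn.Linear(128, 1)

    def forward(self, image: torch.Tensor):
        features = self.backbone(image).flatten(1)
        return self.classifier(features), self.distance(features)

# Example inference on one (dummy) 3x224x224 camera frame.
logits, dist = WaitNetSketch()(torch.randn(1, 3, 224, 224))
print(WAIT_CONDITIONS[logits.argmax(dim=1).item()], dist.item())
```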

These results can then be used as input to higher-level autonomous vehicle software modules, such as mapping and behavior planning components.
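
As a hedged illustration of that handoff, the record and planner stub below are hypothetical (they are not actual DRIVE Software interfaces); they show how a per-frame detection, with its class, distance and confidence, might be consumed by a behavior planning component.

```python
# Hypothetical interface sketch, not the actual DRIVE Software API.
from dataclasses import dataclass
from enum import Enum, auto

class WaitCondition(Enum):
    INTERSECTION = auto()
    CONSTRUCTION_ZONE = auto()
    TOLL_BOOTH = auto()

@dataclass
class WaitDetection:
    condition: WaitCondition
    distance_m: float   # estimated distance to the stopping point
    confidence: float   # classifier confidence in [0, 1]

def plan_behavior(detection: WaitDetection) -> str:
    """Toy planner: decide whether to begin slowing for an intersection."""
    if detection.condition is WaitCondition.INTERSECTION and detection.distance_m < 150.0:
        return "begin comfortable deceleration"
    return "maintain speed"

print(plan_behavior(WaitDetection(WaitCondition.INTERSECTION, 120.0, 0.97)))
# -> begin comfortable deceleration
```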

Similar to Human Detection

WaitNet’s intersection detection and classification process is similar to how a human detects an intersection. Rather than tallying individual indicators, such as a stop sign, a traffic light or lane markings in unusual locations, our visual system takes in the scene as a whole, perceiving at once both the presence (or absence) of an intersection and its type.

Scene-based intersection detection using our WaitNet DNN (visualized in yellow), with traffic light detection visualized in purple.

The virtue of using AI for scene-based perception? It removes the need to manually determine which visual features are or aren’t relevant to overall intersection perception, as well as the need to hard-code rules for combining them for each intersection type. The complexity of such a case-by-case brute force approach doesn’t scale well — there are just too many kinds of intersections in the world.

Moreover, by not over-relying on individual features, we mitigate potential propagation of a feature-level detection error, which could, for example, turn failing to detect a stop sign into failing to detect the entire intersection. And by not exclusively relying on a map that specifies the intersection location or type, we mitigate vulnerability that may be introduced by an incomplete or erroneous map.

Shipping in Next DRIVE Software Release

WaitNet’s ability to detect an intersection and estimate the distance to it has gone from an internal development project to software that will ship in the weeks ahead as part of the NVIDIA DRIVE Software 9.0 release.

Using the DRIVE Hyperion kit (NVIDIA’s self-driving car sensor and compute platform) and WaitNet-based perception, we can detect most intersections up to 150 meters away. That extended range gives the car the distance it needs to brake comfortably as it approaches the intersection.
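
As a back-of-the-envelope check on why that range matters (the speed and deceleration values below are our assumptions, not figures from the release): the distance needed to stop from speed v at constant deceleration a is v squared divided by 2a, and at highway speed a comfortable stop consumes most of those 150 meters.

```python
# Assumed values for illustration; not figures from the DRIVE release notes.
def stopping_distance_m(speed_mps: float, decel_mps2: float) -> float:
    """Distance to stop from `speed_mps` at constant deceleration: v^2 / (2a)."""
    return speed_mps ** 2 / (2 * decel_mps2)

# From ~108 km/h (30 m/s) at a comfortable ~3 m/s^2 deceleration, the car
# needs the full 150 m of detection range to come to a smooth stop.
print(stopping_distance_m(30.0, 3.0))  # -> 150.0
```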

For close-range intersections, WaitNet accurately pinpoints the location of the intersection stopping point. This is particularly valuable in semi-urban and urban settings, where GPS signal accuracy tends to degrade due to multipath effects.

Additional WaitNet-based features are slated for future software releases, including the ability to detect multiple intersections per image frame, as well as traffic lights and traffic signs.

The ability to use scene and context understanding to navigate intersections via AI-based live perception adds a critical layer of robustness to DRIVE Software, helping us achieve a better, safer self-driving experience.