Introduction to Apollo Perception Module

In the last article, the general data processing chain inside a Cyber Component was introduced in detail. We saw that the customized initialization and data processing of a perception component (for example, the lidar detection component) are implemented in its Init() and Proc() member functions. In this article I would like to make the perception process clear, without going into the algorithm details.

The first thing most readers want to see is where the deep learning models are invoked in perception; in fact there are several levels of indirection before they appear. As in the last article, I will begin from a dag file used to launch perception. Under the /modules/perception/production/dag path there are several dag files, each of which combines different perception components into a module. Each perception component class is a child class of Cyber::Component, and all perception components are defined under the /modules/perception/onboard path. Here I will take dag_streaming_perception_lgsvl.dag as an example, which includes the following perception components (a skeleton sketch of such a component class follows the list):

  • LidarDetectionComponent
  • LidarTrackingComponent
  • RadarDetectionComponent
  • MultiSensorFusionComponent
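
Each of these follows the standard Cyber component skeleton from the last article: derive from cyber::Component with the input message type as the template parameter, and register the class. Roughly, as a sketch with details omitted, using the lidar detection component's input type as an example:

// Sketch of the component skeleton; the real class has many more members.
class LidarDetectionComponent : public cyber::Component<drivers::PointCloud> {
 public:
  bool Init() override;
  bool Proc(const std::shared_ptr<drivers::PointCloud>& message) override;
  // ... config members, writer_, detection pipeline, etc. ...
};

CYBER_REGISTER_COMPONENT(LidarDetectionComponent);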

At the time of writing, the code structure of the perception process has not yet been unified across the different sensors, especially the radar perception part, so I will address them separately.

Init() of a perception component

I. Lidar perception component

First, take a look at the structure of the LidarDetectionComponent class:

[Figure: class structure of LidarDetectionComponent]

From the config section of LidarDetectionComponent in the dag file, we can see the config file path for this component:

config_file_path: "/apollo/modules/perception/production/conf/perception/lidar/velodyne128_detection_conf.pb.txt"

This file contains the following config items:

sensor_name: "velodyne128"
detector_name: "LidarObstacleDetection"
enable_hdmap: true
lidar_query_tf_offset: 0
lidar2novatel_tf2_child_frame_id: "velodyne128"
output_channel_name: "/perception/inner/DetectionObjects"
lidar_detection_conf_dir : "/apollo/modules/perception/pipeline/config"
lidar_detection_conf_file : "lidar_detection_pipeline.pb.txt"

In LidarDetectionComponent::Init(), the component will:

  1. Load the config items above with GetProtoConfig() into the config-related member variables
  2. Create a Writer<LidarFrameMessage> object writer_
  3. Load the configs from lidar_detection_conf_file ("lidar_detection_pipeline.pb.txt") into lidar_detection_config_ with GetProtoFromFile()
  4. Create a LidarObstacleDetection object lidar_detection_pipeline_
  5. Initialize lidar_detection_pipeline_ with lidar_detection_config_ from step 3
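
To make these steps concrete, here is a hypothetical condensed sketch; the member names follow the description above, while the exact proto types and error handling are simplified compared with the real Apollo code:

// Hypothetical condensed sketch of the five Init() steps; not the literal code.
bool LidarDetectionComponent::Init() {
  // Step 1: load the component config (sensor_name, output_channel_name, ...)
  LidarDetectionComponentConfig comp_config;
  if (!GetProtoConfig(&comp_config)) {
    return false;
  }
  // Step 2: create the writer for the inner detection output channel
  writer_ = node_->CreateWriter<LidarFrameMessage>(
      comp_config.output_channel_name());
  // Step 3: load the pipeline config file into lidar_detection_config_
  const std::string pipeline_conf = cyber::common::GetAbsolutePath(
      comp_config.lidar_detection_conf_dir(),
      comp_config.lidar_detection_conf_file());
  if (!cyber::common::GetProtoFromFile(pipeline_conf, &lidar_detection_config_)) {
    return false;
  }
  // Steps 4 and 5: create and initialize the detection pipeline
  lidar_detection_pipeline_.reset(new LidarObstacleDetection());
  return lidar_detection_pipeline_->Init(lidar_detection_config_);
}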

Let's take a look at the file "lidar_detection_pipeline.pb.txt":

pipeline_type: LIDAR_DETECTION

stage_type: POINTCLOUD_PREPROCESSOR
stage_type: POINTCLOUD_DETECTION_PREPROCESSOR
stage_type: MAP_MANAGER
stage_type: CNN_SEGMENTATION
stage_type: POINTCLOUD_DETECTION_POSTPROCESSOR
stage_type: OBJECT_BUILDER
stage_type: OBJECT_FILTER_BANK

.....
stage_config: {
stage_type: CNN_SEGMENTATION
enabled: true

cnnseg_config: {
sensor_name: "velodyne128"
param_file: "/apollo/modules/perception/production/data/perception/lidar/models/cnnseg/cnnseg128_param.conf"
proto_file: "/apollo/modules/perception/production/data/perception/lidar/models/cnnseg/cnnseg128_caffe/deploy.prototxt"
weight_file: "/apollo/modules/perception/production/data/perception/lidar/models/cnnseg/cnnseg128_caffe/deploy.caffemodel"
engine_file: "/apollo/modules/perception/production/data/perception/lidar/models/cnnseg/cnnseg128_caffe/engine.conf"
}
}
.....

It defines 7 stages; these stage objects will be created during the initialization process:

LidarDetectionComponent::Init()
-> LidarDetectionComponent::InitAlgorithmPlugin()
-> lidar_detection_pipeline_->Init(lidar_detection_config_)
-> Pipeline::Initialize(pipeline_config)
-> for loop with Pipeline::CreateStage(stage_type)
-> stage_ptr->Init(stage_config_map_[stage_type])
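
The for loop above is the heart of the pipeline setup. A hypothetical condensed form of it (the real Pipeline class carries more bookkeeping) could look like this:

// Hypothetical sketch of the stage-creation loop; not the literal Apollo code.
bool Pipeline::Initialize(const PipelineConfig& pipeline_config) {
  for (int i = 0; i < pipeline_config.stage_type_size(); ++i) {
    const StageType stage_type = pipeline_config.stage_type(i);
    std::unique_ptr<Stage> stage_ptr = CreateStage(stage_type);  // type -> Stage subclass
    if (stage_ptr == nullptr ||
        !stage_ptr->Init(stage_config_map_[stage_type])) {
      return false;
    }
    stage_ptrs_.push_back(std::move(stage_ptr));  // stages run in config order
  }
  return true;
}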

Each stage has its own class definition and is a child class of Stage. In this example config file, CNNSegmentation is configured as the core detector. Its Init() prepares the model as follows:

  1. Load the configs from its stage_config section in the "lidar_detection_pipeline.pb.txt" file, which includes 4 files:

param_file: "/apollo/modules/perception/production/data/perception/lidar/models/cnnseg/cnnseg128_param.conf"
proto_file: "/apollo/modules/perception/production/data/perception/lidar/models/cnnseg/cnnseg128_caffe/deploy.prototxt"
weight_file: "/apollo/modules/perception/production/data/perception/lidar/models/cnnseg/cnnseg128_caffe/deploy.caffemodel"
engine_file: "/apollo/modules/perception/production/data/perception/lidar/models/cnnseg/cnnseg128_caffe/engine.conf"

  2. From the param_file, assign values to a couple of feature member variables: range_, width_, height_, min_height_, max_height_, output_names, input_names. Here is part of the content of the param_file "cnnseg128_param.conf":
.....
model_type: "RTNet"
.....
ground_detector: "SpatioTemporalGroundDetector"
roi_filter: "HdmapROIFilter"
remove_ground_points: true

network_param { ... }
feature_param { ... }

  3. Create an inference object inference_ according to the model_type config, here RTNet
  4. Initialize the inference object inference_, which creates the network and the input & output blobs with TensorRT APIs
  5. Initialize the feature generator feature_generator_; the features are the input of the inference network

Besides RTNet, Apollo also provides the CaffeNet, PaddleNet, TorchDet, OnnxObstacleDetector and MINet network classes. They all inherit from the Inference base class, and all of them have the core member variables blobs_, input_names_ and output_names_ for the model network input and output, as well as the core member function Infer(). The function CreateInferenceByName works as an inference factory: it creates a corresponding inference object based on the model type config. At the time of writing this article, the other lidar detector classes CenterPoint, MaskPillarsDetection, PointPillarsDetection, NCutSegmentation and the radar detector class ContiArsDetector do not use this function yet.
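
As a minimal sketch of such a factory, assuming a simplified signature (the real CreateInferenceByName takes additional parameters such as the model root path):

// Hypothetical, simplified factory; the real function's signature differs.
Inference* CreateInferenceByName(const std::string& model_type,
                                 const std::string& proto_file,
                                 const std::string& weight_file,
                                 const std::vector<std::string>& outputs,
                                 const std::vector<std::string>& inputs) {
  if (model_type == "RTNet") {
    return new RTNet(proto_file, weight_file, outputs, inputs);
  }
  if (model_type == "CaffeNet") {
    return new CaffeNet(proto_file, weight_file, outputs, inputs);
  }
  if (model_type == "PaddleNet") {
    return new PaddleNet(proto_file, weight_file, outputs, inputs);
  }
  // ... TorchDet, OnnxObstacleDetector, MINet ...
  return nullptr;
}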

II. Radar perception component

Unlike the LidarDetectionComponent class, the RadarDetectionComponent class does not have a pipeline member variable (derived from the Pipeline class) composed of a couple of stages.

[Figure: class structure of RadarDetectionComponent]

In RadarDetectionComponent::Init(), the following 3 member variables are created and initialized:

1. hdmap_input_->Init()
2. radar_preprocessor_->Init()
3. radar_perception_->Init(pipeline_name_) which includes:
detector_->Init()
roi_filter_->Init()
tracker_->Init()

There are two config sections for RadarDetectionComponent in the dag file: "FrontRadarDetection" and "RearRadarDetection". Taking the first one as an example, we can see the config file path for this component:
config_file_path: "/apollo/modules/perception/production/conf/perception/radar/front_radar_component_conf.pb.txt"
This file contains the following config items:

radar_name: "radar_front"
tf_child_frame_id: "radar_front"
radar_forward_distance: 200.0
radar_preprocessor_method: "ContiArsPreprocessor"
radar_perception_method: "RadarObstaclePerception"
radar_pipeline_name: "FrontRadarPipeline"
odometry_channel_name: "/apollo/localization/pose"
output_channel_name: "/perception/inner/PrefusedObjects"

The item radar_pipeline_name: "FrontRadarPipeline" points to the radar detection pipeline config, which lives at /apollo/modules/perception/production/conf/perception/radar/front_radar_pipeline.config. The file looks as follows:

model_configs {
name: "FrontRadarPipeline"
version: "1.0.0"
string_params {
name: "Detector"
value: "ContiArsDetector"
}
string_params {
name: "RoiFilter"
value: "HdmapRadarRoiFilter"
}
string_params {
name: "Tracker"
value: "ContiArsTracker"
}
}
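
These string_params name the concrete plugin classes to use. Presumably radar_perception_->Init(pipeline_name_) reads them and instantiates the plugins via Apollo's registerer-based factories; here is a rough, hypothetical sketch of that idea (the registerer names are assumptions, not the exact code):

// Hypothetical sketch of plugin lookup by config name.
bool RadarObstaclePerception::Init(const std::string& pipeline_name) {
  // Read the Detector / RoiFilter / Tracker string_params of `pipeline_name`
  // from the config (details omitted; values shown for the front pipeline).
  const std::string detector_name = "ContiArsDetector";
  const std::string roi_filter_name = "HdmapRadarRoiFilter";
  const std::string tracker_name = "ContiArsTracker";
  detector_.reset(BaseDetectorRegisterer::GetInstanceByName(detector_name));
  roi_filter_.reset(BaseRoiFilterRegisterer::GetInstanceByName(roi_filter_name));
  tracker_.reset(BaseTrackerRegisterer::GetInstanceByName(tracker_name));
  return detector_ != nullptr && detector_->Init() &&
         roi_filter_ != nullptr && roi_filter_->Init() &&
         tracker_ != nullptr && tracker_->Init();
}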

III. Lidar tracking component and Multi-Sensor fusion component

These two components have the same structure and initialization process as the lidar detection component, so they are not illustrated again here.

Proc() of a perception component

The implementations of the Proc(message) member function for the above perception component classes follow a similar process; basically, they do the following two things (a minimal sketch follows the list):

  1. InternalProc(in_message, out_message)
  2. writer_->Write(out_message)
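
As a minimal sketch, taking the lidar detection component as an example (the message types follow the InternalProc signatures listed further below):

// Hypothetical condensed Proc(); the real code adds latency stats and logging.
bool LidarDetectionComponent::Proc(
    const std::shared_ptr<drivers::PointCloud>& message) {
  auto out_message = std::make_shared<LidarFrameMessage>();
  if (!InternalProc(message, out_message)) {
    return false;
  }
  writer_->Write(out_message);  // publish on output_channel_name
  return true;
}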

As LidarDetectionComponent, LidarTrackingComponent and MultiSensorFusionComponent share the same pipeline and stage structure, their InternalProc() implementations also run a similar process inside:

InternalProc(in_message, out_message)
-> lidar_detection_pipeline_->Process(data_frame)
-> Pipeline::InnerProcess(data_frame)
-> for loop: stage_ptr->Process(frame)
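
A hypothetical condensed form of that loop (assuming each stage carries the enabled flag from its stage_config; the real code differs in details):

// Hypothetical condensed sketch of the per-stage processing loop.
bool Pipeline::InnerProcess(DataFrame* data_frame) {
  for (const auto& stage_ptr : stage_ptrs_) {
    if (stage_ptr->IsEnabled() && !stage_ptr->Process(data_frame)) {
      return false;
    }
  }
  return true;
}
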
In our example case, the lidar detector CNNSegmentation is a stage in the lidar detection pipeline; its Process() executes the following steps:
CNNSegmentation::Process(DataFrame* data_frame)
-> Detect(options, lidar_frame)
-> feature_generator_->Generate(original_cloud_, point2grid_) // generate features
+ inference_->Infer() // model inference
+ GetObjectsFromSppEngine(&frame->segmented_objects) // processing clustering

The other lidar detectors CenterPoint, MaskPillarsDetection and PointPillarsDetection have similar steps in their Process():

CenterPointDetection::Process(DataFrame* data_frame)
-> Detect(options, lidar_frame)
-> DoInference(points_data, num_points, &out_detections, &out_labels, &out_scores)
+ FilterScore(&out_detections, &out_labels, &out_scores, FLAGS_score_threshold, &out_detections_final, &out_labels_final)
+ GetObjects(&frame->segmented_objects, frame->lidar2world_pose, &out_detections_final, &out_labels_final)

This pipeline and stage structure provides a very flexible way to customize the perception process: a user can easily add, delete or replace a stage in the pipeline. RadarDetectionComponent, however, currently implements its InternalProc() differently:
1. radar_preprocessor_->Preprocess(raw_obstacles, preprocessor_options, &corrected_obstacles)
2. radar_perception_->Perceive(corrected_obstacles, options, &radar_objects) which includes:
detector_->Detect(corrected_obstacles, options.detector_options, detect_frame_ptr)
roi_filter_->RoiFilter(options.roi_filter_options, detect_frame_ptr)
tracker_->Track(*detect_frame_ptr, options.track_options, tracker_frame_ptr)
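
Spelled out in code, a hypothetical condensed Perceive() chaining these three calls could look like this (the options and frame types are simplified assumptions):

// Hypothetical sketch of RadarObstaclePerception::Perceive(); simplified.
bool RadarObstaclePerception::Perceive(
    const drivers::ContiRadar& corrected_obstacles,
    const RadarPerceptionOptions& options,
    std::vector<base::ObjectPtr>* objects) {
  base::FramePtr detect_frame_ptr(new base::Frame());
  if (!detector_->Detect(corrected_obstacles, options.detector_options,
                         detect_frame_ptr)) {
    return false;
  }
  if (!roi_filter_->RoiFilter(options.roi_filter_options, detect_frame_ptr)) {
    return false;
  }
  base::FramePtr tracker_frame_ptr(new base::Frame());
  if (!tracker_->Track(*detect_frame_ptr, options.track_options,
                       tracker_frame_ptr)) {
    return false;
  }
  *objects = tracker_frame_ptr->objects;  // tracked objects are the output
  return true;
}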

To understand what the pipeline does and how the data streams through each stage, it's necessary to take a look at the input and output data types. From the InternalProc function interfaces we can see the input and output message types:

  • LidarDetectionComponent: bool InternalProc(const std::shared_ptr<const drivers::PointCloud>& in_message, const std::shared_ptr<LidarFrameMessage>& out_message)
  • LidarTrackingComponent: bool InternalProc(const std::shared_ptr<const LidarFrameMessage>& in_message, const std::shared_ptr<SensorFrameMessage>& out_message)
  • RadarDetectionComponent: bool InternalProc(const std::shared_ptr<ContiRadar>& in_message, std::shared_ptr<SensorFrameMessage> out_message)
  • MultiSensorFusionComponent: bool InternalProc(const std::shared_ptr<SensorFrameMessage const>& in_message, std::shared_ptr<PerceptionObstacles> out_message, std::shared_ptr<SensorFrameMessage> viz_message)

So LidarDetectionComponent takes PointCloud data as input and outputs a LidarFrameMessage, which contains LidarFrame type data:

struct LidarFrame {
std::shared_ptr<base::AttributePointCloud<base::PointF>> cloud;
std::shared_ptr<base::AttributePointCloud<base::PointD>> world_cloud;
double timestamp = 0.0;
std::shared_ptr<base::HdmapStruct> hdmap_struct = nullptr;
std::vector<std::shared_ptr<base::Object>> segmented_objects;
std::vector<std::shared_ptr<base::Object>> tracked_objects;
base::PointIndices roi_indices;
base::PointIndices non_ground_indices;
base::PointIndices secondary_indices;
base::SensorInfo sensor_info;
};

LidarDetectionComponent actually writes its output to segmented_objects in the LidarFrame. LidarTrackingComponent first writes to tracked_objects in the LidarFrame, then transfers the result into another, more general data type, SensorFrameMessage, which holds the output for the next step, fusion. The SensorFrameMessage data type contains Frame type data, and its core is objects, i.e. the object info:
struct alignas(16) Frame {
SensorInfo sensor_info;
double timestamp = 0.0;
std::vector<std::shared_ptr<Object>> objects;
Eigen::Affine3d sensor2world_pose;
// sensor-specific frame supplements
LidarFrameSupplement lidar_frame_supplement;
RadarFrameSupplement radar_frame_supplement;
CameraFrameSupplement camera_frame_supplement;
UltrasonicFrameSupplement ultrasonic_frame_supplement;
};

RadarDetectionComponent is a little different in its input data type. ContiRadar is the raw message received from the Continental radar sensor; it can be found in the modules/common_msgs/sensor_msgs/conti_radar.proto file. Notice that it doesn't contain raw point cloud data like the lidar detection input does, but an obstacle array of ContiRadarObs coming directly from the radar. Checking the core function of the radar detector, ContiArsDetector::Detect(), shows that it essentially converts this obstacle info into the Frame data format by invoking RawObs2Frame().
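
As an illustration of that conversion, here is a heavily condensed, hypothetical sketch; the field names come from conti_radar.proto, while the real RawObs2Frame() also fills in velocity, size, classification and more:

// Hypothetical condensed conversion of ContiRadarObs into base::Object.
void RawObs2Frame(const drivers::ContiRadar& corrected_obstacles,
                  const Eigen::Affine3d& radar2world_pose,
                  base::FramePtr radar_frame) {
  for (const auto& radar_obs : corrected_obstacles.contiobs()) {
    auto object = std::make_shared<base::Object>();
    object->track_id = radar_obs.obstacle_id();
    // Obstacle position is reported in the radar frame; move it to world.
    Eigen::Vector3d local_loc(radar_obs.longitude_dist(),
                              radar_obs.lateral_dist(), 0.0);
    object->center = radar2world_pose * local_loc;
    radar_frame->objects.push_back(object);
  }
}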

The output data type of MultiSensorFusionComponent, PerceptionObstacles, can be found in the perception_obstacle.proto file; it is the final output of the perception module.
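
Any downstream module consumes this output as a regular Cyber message. As a minimal, hypothetical consumer (assuming the conventional "/apollo/perception/obstacles" output channel and the common_msgs proto header path):

// Hypothetical downstream consumer of the final perception output.
#include "cyber/cyber.h"
#include "modules/common_msgs/perception_msgs/perception_obstacle.pb.h"

int main(int argc, char* argv[]) {
  apollo::cyber::Init(argv[0]);
  auto node = apollo::cyber::CreateNode("perception_consumer");
  auto reader = node->CreateReader<apollo::perception::PerceptionObstacles>(
      "/apollo/perception/obstacles",
      [](const std::shared_ptr<apollo::perception::PerceptionObstacles>& msg) {
        AINFO << "received " << msg->perception_obstacle_size() << " obstacles";
      });
  apollo::cyber::WaitForShutdown();
  return 0;
}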