Convolutional neural networks detect objects in images by relying on the convolution operation. Although convolution is effective, it requires input laid out on a regular grid. Unlike images, point clouds are sparse and not evenly spaced; rasterizing a point cloud onto a regular grid leaves an uneven number of points in each cell. Running the same convolution over these cells either loses information in crowded cells or wastes computation on empty ones.
There are currently three ways to represent a point cloud, each with a corresponding feature-learning method:
- Grids: rasterize the points, then apply 2D/3D convolutions as with images.
- Sets: treat the points as an unordered set, as in PointNet, and gather features from each point's nearest neighbors.
- Graphs: turn the unordered point set into a graph and propagate feature information along edges between vertices.
Graphs offer higher query efficiency than sets and stronger feature-extraction ability than grids: constructing the graph takes O(cN) time, and a neighborhood query then takes O(1).
To exploit this, the authors encode the point cloud as a fixed-radius nearest-neighbor graph, use a GNN to learn a feature for each vertex, predict a bounding box from each vertex, and finally merge the per-vertex boxes.
Formally, we define a point cloud composed of N points as the set

P = {p_1, …, p_N}, where p_i = (x_i, s_i),

with x_i ∈ R³ the point's three-dimensional coordinates and s_i ∈ R^k a k-dimensional state vector of point attributes.
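As a concrete illustration, the set P can be held as two NumPy arrays (the sizes N = 5 and k = 4 below are arbitrary choices for this toy sketch):

```python
import numpy as np

# Toy point cloud with N = 5 points and k = 4 attribute dimensions
# (both sizes are arbitrary, purely for illustration).
N, k = 5, 4
rng = np.random.default_rng(0)

x = rng.uniform(-1.0, 1.0, size=(N, 3))  # x_i in R^3: 3-D coordinates
s = rng.normal(size=(N, k))              # s_i in R^k: point attributes

# P = {p_1, ..., p_N} with p_i = (x_i, s_i)
P = list(zip(x, s))
print(len(P), P[0][0].shape, P[0][1].shape)
```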
They construct a graph G = (P, E) by taking P as the vertex set and connecting each point to its neighboring points within a fixed radius r.
This is the fixed-radius near-neighbors problem, which can be solved in O(cN) time.
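One way to reach O(cN) is voxel hashing with cell size r: each point is only compared against points in its own and the 26 adjacent cells, so with bounded point density (c = maximum points per cell) the total work is linear in N. A minimal sketch (real implementations typically use spatial indices such as k-d trees):

```python
import numpy as np
from collections import defaultdict

def fixed_radius_graph(x, r):
    """Return edges (i, j) for all pairs with ||x_i - x_j|| <= r,
    using a voxel hash with cell size r."""
    cells = defaultdict(list)
    keys = np.floor(x / r).astype(int)
    for i, key in enumerate(map(tuple, keys)):
        cells[key].append(i)

    offsets = [(dx, dy, dz) for dx in (-1, 0, 1)
                            for dy in (-1, 0, 1)
                            for dz in (-1, 0, 1)]
    edges = []
    for i, key in enumerate(map(tuple, keys)):
        for off in offsets:  # scan the 3x3x3 block of cells around point i
            for j in cells.get(tuple(np.add(key, off)), []):
                if i != j and np.linalg.norm(x[i] - x[j]) <= r:
                    edges.append((i, j))
    return edges

x = np.array([[0.0, 0.0, 0.0], [0.05, 0.0, 0.0], [1.0, 1.0, 1.0]])
edges = fixed_radius_graph(x, r=0.1)
print(edges)  # only the two close points are connected, in both directions
```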
To preserve the information in the original point cloud, they encode the dense raw points into the initial state value s_i of each vertex. Specifically, for each vertex they search for the original points within radius r₀, extract per-point features with a set-based neural network, transform them with an MLP, and finally apply max pooling over the point dimension to obtain the vertex's initial state s_i.
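A minimal sketch of this initialization step, where a single random-weight ReLU layer (`W1`, `b1`) stands in for the trained per-point network:

```python
import numpy as np

def init_vertex_state(vertex_xyz, raw_xyz, raw_attr, r0, W1, b1):
    """Initial state s_i for one vertex: gather raw points within
    radius r0, embed each (relative offset, attributes) pair with a
    one-layer MLP, then max-pool over the point dimension."""
    dist = np.linalg.norm(raw_xyz - vertex_xyz, axis=1)
    mask = dist <= r0
    feats = np.concatenate([raw_xyz[mask] - vertex_xyz,
                            raw_attr[mask]], axis=1)
    hidden = np.maximum(feats @ W1 + b1, 0.0)  # ReLU layer per point
    return hidden.max(axis=0)                  # max pool -> s_i

rng = np.random.default_rng(1)
raw_xyz = rng.uniform(-1, 1, size=(50, 3))
raw_attr = rng.normal(size=(50, 1))        # e.g. one reflectance value
W1 = rng.normal(size=(4, 16)) * 0.1        # (3 offsets + 1 attr) -> 16 dims
b1 = np.zeros(16)

s_i = init_vertex_state(raw_xyz[0], raw_xyz, raw_attr, r0=0.5, W1=W1, b1=b1)
print(s_i.shape)  # (16,)
```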
The traditional graph neural network iterates each vertex's features along the edges. At the (t+1)-th iteration:

e^t_ij = f^t(v^t_i, v^t_j),  v^{t+1}_i = g^t(ρ({e^t_ij | (i, j) ∈ E}), v^t_i)

where e^t and v^t are the edge and vertex features respectively, f^t(⋅) computes the feature of the edge between two vertices, ρ(⋅) aggregates the features of the edges incident to a vertex into a feature increment for that vertex, and g^t(⋅) combines that increment with the vertex's previous feature to produce its feature after this iteration.
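This generic update can be sketched with `f_t`, `rho`, and `g_t` passed in as plain functions (simplified, non-learned stand-ins for the real networks):

```python
import numpy as np

def gnn_step(v, edges, f_t, rho, g_t):
    """One GNN iteration: compute e_ij = f_t(v_i, v_j) for every edge,
    aggregate the edge features incident to each vertex with rho, and
    update v_i = g_t(increment_i, v_i)."""
    incoming = {i: [] for i in range(len(v))}
    for i, j in edges:
        incoming[i].append(f_t(v[i], v[j]))
    return np.stack([
        g_t(rho(incoming[i]) if incoming[i] else np.zeros_like(v[i]), v[i])
        for i in range(len(v))
    ])

v = np.array([[0.0], [1.0], [4.0]])
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
v1 = gnn_step(
    v, edges,
    f_t=lambda vi, vj: vj - vi,                   # edge feature
    rho=lambda es: np.max(np.stack(es), axis=0),  # max aggregation
    g_t=lambda inc, vi: vi + inc,                 # add increment to vertex
)
print(v1.ravel())  # [1. 4. 1.]
```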
For the edge features, one design is to describe a neighbor's influence on a vertex in terms of the neighbor's position relative to that vertex, rewriting the above equation as:

s^{t+1}_i = g^t(ρ({f^t(x_j − x_i, s^t_j) | (i, j) ∈ E}), s^t_i)
This yields the iterative graph-neural-network model. The article further notes that because the edge features are sensitive to the distances of neighboring points, the authors propose an auto-registration mechanism that predicts a coordinate offset from each vertex's state to automatically compensate for the relative positions, although experiments show its effect is small:

Δx^t_i = h^t(s^t_i),  e^t_ij = f^t(x_j − x_i + Δx^t_i, s^t_j)
Concretely, f^t(⋅), g^t(⋅), and h^t(⋅) can all be modeled by MLPs, and ρ(⋅) by max pooling for robustness:

Δx^t_i = MLP^t_h(s^t_i)
e^t_ij = MLP^t_f([x_j − x_i + Δx^t_i, s^t_j])
s^{t+1}_i = MLP^t_g(Max({e^t_ij | (i, j) ∈ E})) + s^t_i
For the 3D detection task, the network head outputs, for each vertex, an object category, the offset of the bounding-box center, and the box's size and orientation. This is essentially the same head as in conventional anchor-free 3D object detection.
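A sketch of such a per-vertex head follows; the 7-value box parameterization (center offset, size, yaw) and the class count are plausible assumptions for illustration, not necessarily the paper's exact encoding:

```python
import numpy as np

NUM_CLASSES = 3  # assumed number of object classes (illustrative)

def detection_head(s, Wc, Wb):
    """Per-vertex outputs from vertex states s of shape (N, k):
       class logits   (N, NUM_CLASSES)
       box regression (N, 7): center offset (3), size (3), yaw (1)
    Linear layers only, for brevity."""
    return s @ Wc, s @ Wb

rng = np.random.default_rng(3)
N, k = 5, 8
s = rng.normal(size=(N, k))
Wc = rng.normal(size=(k, NUM_CLASSES)) * 0.1
Wb = rng.normal(size=(k, 7)) * 0.1

cls_logits, box_reg = detection_head(s, Wc, Wb)
print(cls_logits.shape, box_reg.shape)  # (5, 3) (5, 7)
```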
- Weijing Shi, Ragunathan Rajkumar. Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud. CVPR 2020.