Efficient Continuous Group Convolutions for Local SE(3) Equivariance in 3D Point Clouds

TU Wien
global vs local rotation.

We introduce an SE(3) equivariant convolution operator for local continuous SE(3) equivariant feature extraction without the computational overhead of standard group convolutions.

Figure panels: Ground Truth, Standard Convolutions, Global SE(3) Equivariance, Efficient SE(3) Equivariant Convolutions.

Abstract

Extending the translation equivariance property of convolutional neural networks to larger symmetry groups has been shown to reduce sample complexity and enable more discriminative feature learning. Further, exploiting additional symmetries facilitates greater weight sharing than standard convolutions, leading to an enhanced network expressivity without an increase in parameter count. However, extending the equivariant properties of a convolution layer comes at a computational cost. In particular, for 3D data, expanding equivariance to the SE(3) group (rotation and translation) results in a 6D convolution operation, which is not tractable for larger data samples such as 3D scene scans. While efforts have been made to develop efficient SE(3) equivariant networks, existing approaches rely on discretization or only introduce global rotation equivariance. This limits their applicability to point clouds representing a scene composed of multiple objects.

This work presents an efficient, continuous, and local SE(3) equivariant convolution layer for point cloud processing based on general group convolution and local reference frames. Our experiments show that our approach achieves competitive or superior performance across a range of datasets and tasks, including object classification and semantic segmentation, with negligible computational overhead.

Method

Group Convolutions - Lifting to the Group: transforming input to group elements

Instead of 3D positions, the inputs to the convolution operator are group elements, which allows the operator to detect patterns transformed by group actions (e.g. rotations).


Given

\[\text{SE(3)} = \mathbb{R}^3 \rtimes \text{SO(3)},\]

the \(SE(3)\) group convolution for 3D point clouds can be written as

\[ \int_{\mathbb{R}^3} \int_{\text{SO(3)}} f(\text{t, R'})k(\text{R}^{-1}(\text{t} - \text{x}), \text{R}^{-1}\text{R'}) d\text{t} d\mu(\text{R'}). \]

In addition to relative 3D positions, relative rotations are also used as input to the kernel resulting in a 6D convolution.
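The two kernel arguments can be checked numerically. The following is a minimal NumPy sketch, assuming rotations are stored as 3x3 matrices; the helper names `random_rotation` and `kernel_inputs` are illustrative and not taken from the paper. It shows that the relative position and relative rotation stay unchanged when the same rigid transform is applied to both group elements, which is the property the group convolution relies on.

```python
import numpy as np

def random_rotation(rng):
    """Draw a rotation matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))      # fix the column signs of the factorization
    if np.linalg.det(q) < 0:         # enforce det = +1 (proper rotation)
        q[:, 0] = -q[:, 0]
    return q

def kernel_inputs(x, R, t, R_prime):
    """Relative pose of the group element (t, R') as seen from (x, R)."""
    rel_pos = R.T @ (t - x)          # R^{-1}(t - x)
    rel_rot = R.T @ R_prime          # R^{-1} R'
    return rel_pos, rel_rot

rng = np.random.default_rng(0)
x, t = rng.normal(size=3), rng.normal(size=3)
R, R_prime = random_rotation(rng), random_rotation(rng)

# Apply one global SE(3) transform (b, Q) to both group elements.
Q, b = random_rotation(rng), rng.normal(size=3)
before = kernel_inputs(x, R, t, R_prime)
after = kernel_inputs(Q @ x + b, Q @ R, Q @ t + b, Q @ R_prime)

pos_unchanged = np.allclose(before[0], after[0])
rot_unchanged = np.allclose(before[1], after[1])
```

Both flags evaluate to true because the transform cancels inside each argument: the translation drops out of the difference, and the rotation Q is undone by the inverse of QR.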

Defining a Grid on \(\text{SO(3)}\)

Solving the group convolution requires defining a grid on \(\text{SO(3)}\), which is not straightforward. Previous work has addressed this by discretizing the \(\text{SO(3)}\) group, for example, using platonic solids. To stay in the continuous space, a random grid can be constructed, such as through Monte Carlo sampling, \[\sum_{j} \frac{1}{\lvert H'_j \rvert}\sum_{(\text{t, R'})\in H'_j} f(\text{t, R'})k(\text{R}^{-1}(\text{t} - \text{x}), \text{R}^{-1}\text{R'}).\]

Yet the approximation quality of the integral over \(\text{SO(3)}\) depends on the number of samples, i.e. the number of \(\text{SO(3)}\) group elements sampled per point, \(|H'_j|\). The memory footprint increases linearly with \(|H'_j|\), while the number of computations increases quadratically.
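The scaling can be made concrete with a simple count. This is only an illustrative accounting under our own assumptions (one feature vector stored per point and sampled rotation, and one kernel evaluation per pair of output and input rotation for every neighbor); the function `footprint` and its parameters are hypothetical and do not describe the actual implementation.

```python
def footprint(num_points, num_neighbors, num_rotations, channels):
    """Rough storage and kernel-evaluation counts for a sampled SO(3) grid."""
    # One feature vector per (point, sampled rotation): linear in num_rotations.
    stored_features = num_points * num_rotations * channels
    # Every output rotation at a point is paired with every input rotation
    # of every neighbor: quadratic in num_rotations.
    kernel_evals = num_points * num_rotations * num_neighbors * num_rotations
    return stored_features, kernel_evals

mem_1, ops_1 = footprint(num_points=1000, num_neighbors=16, num_rotations=1, channels=32)
mem_4, ops_4 = footprint(num_points=1000, num_neighbors=16, num_rotations=4, channels=32)

memory_ratio = mem_4 / mem_1      # grows with the number of rotations
compute_ratio = ops_4 / ops_1     # grows with the square of the number of rotations
```

Under these assumptions, going from 1 to 4 sampled rotations multiplies the stored features by 4 and the kernel evaluations by 16.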


Using a random grid results in a trade-off between computational efficiency and how precisely the equivariance property holds. An efficient grid on SE(3) that allows for exact equivariance with a finite number of rotation elements is therefore crucial to make continuous group convolutions practical for point-based networks.

Efficient \(\text{SE(3)}\) Group Convolutions

To achieve exact equivariance with tractable computational load, we propose a carefully constructed grid \(\mathcal{F}(x_j) \subset \text{SE(3)}\) specific to each point \(x_j \in \mathbb{R}^3\),

\[\sum_{j} \frac{1}{\lvert \mathcal{F}(x_j) \rvert}\sum_{(\text{t, R'})\in \mathcal{F}(x_j)} f(\text{t, R'})k(\text{R}^{-1}(\text{t} - \text{x}), \text{R}^{-1}\text{R'}).\]

We show that if \(\mathcal{F}(x_j)\) is equivariant to \(\text{SE(3)}\), so is our 3D convolution as defined above. \(\mathcal{F}(x_j)\) is called a Frame and consists of only 4 elements for the \(\text{SE(3)}\) group; it can be constructed with local PCA. Further, we propose to perform a stochastic approximation during training by only sampling a subset of the elements of \(\mathcal{F}(x_j)\) for input and output domains of the feature maps; randomly sampling only 1 element will maintain the memory consumption and computations equal to the model with standard convolutional layers.
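A frame of this kind can be sketched with local PCA as follows. This is a minimal NumPy illustration, not the reference implementation: it assumes the rotation part comes from the eigenvectors of the neighborhood covariance, that the four elements are the sign flips of those axes with determinant +1, and that the translation part is the point itself. The names `local_frame`, `same_rotations`, `rot_z`, and `rot_x` are hypothetical.

```python
import itertools
import numpy as np

def local_frame(x_j, neighbors):
    """Return 4 SE(3) elements (x_j, R) built from the PCA axes of the neighborhood."""
    centered = neighbors - neighbors.mean(axis=0)
    cov = centered.T @ centered / len(neighbors)
    _, axes = np.linalg.eigh(cov)              # columns are the principal axes
    frame = []
    for signs in itertools.product([1.0, -1.0], repeat=3):
        R = axes * np.array(signs)             # flip the sign of each axis
        if np.linalg.det(R) > 0:               # keep proper rotations only
            frame.append((x_j, R))
    return frame

def same_rotations(rots_a, rots_b):
    """True if every rotation in rots_a also appears in rots_b."""
    return all(any(np.allclose(A, B, atol=1e-8) for B in rots_b) for A in rots_a)

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

rng = np.random.default_rng(0)
# Anisotropic neighborhood so that the principal axes are well separated.
neighbors = rng.normal(size=(32, 3)) * np.array([3.0, 2.0, 1.0])
x_j = neighbors[0]
frame = local_frame(x_j, neighbors)

# Transform the whole neighborhood rigidly and rebuild the frame.
Q, b = rot_z(0.7) @ rot_x(-1.2), np.array([0.5, -1.0, 2.0])
frame_moved = local_frame(Q @ x_j + b, neighbors @ Q.T + b)

# Equivariance: the new frame equals the old frame moved by (b, Q).
is_equivariant = same_rotations([Q @ R for _, R in frame],
                                [R for _, R in frame_moved])

# Stochastic approximation: draw a single frame element for a training step.
_, R_sample = frame[rng.integers(len(frame))]
```

The sketch assumes distinct covariance eigenvalues; degenerate neighborhoods (e.g. perfectly planar or isotropic ones) leave the PCA axes ambiguous and would need separate handling.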


Results

Our method achieves competitive or superior performance across a range of datasets and tasks, including object classification and semantic segmentation, with negligible computational overhead.

Human Body Parts Segmentation
Memory & Computational Footprint

Using only one sample to approximate the integral over \(\text{SO(3)}\) results in approximately the same memory consumption and frames per second (FPS) as the non-\(\text{SO(3)}\)-equivariant version of our model. This shows that our method introduces the equivariance property without extra costs, demonstrating the efficiency of our proposed model.

Semantic Segmentation on ScanNet20

Since our surroundings have a notion of an up orientation, we fix the z-axis and conduct our experiments for \(SO(2)\). We sample only one orientation from the frame for all experiments, which does not pose additional memory or computational burden on the model. This is a crucial property for processing such large point clouds, for which the other methods cannot tractably run reasonably sized networks.
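One way to picture the \(SO(2)\) setting is a frame built from PCA on the horizontal coordinates only. The sketch below is an assumption on our part, not a description of the released code: it takes the eigenvectors of the 2x2 covariance of the xy-projection, keeps the sign flips with determinant +1 (two elements under this construction), and embeds them as rotations about the z-axis. The names `so2_frame` and `rot_z` are hypothetical.

```python
import numpy as np

def so2_frame(neighbors):
    """Rotations about z built from PCA of the xy-projection of the neighborhood."""
    xy = neighbors[:, :2] - neighbors[:, :2].mean(axis=0)
    cov = xy.T @ xy / len(xy)
    _, axes = np.linalg.eigh(cov)              # 2x2 matrix of principal axes
    frame = []
    for sx in (1.0, -1.0):
        for sy in (1.0, -1.0):
            R2 = axes * np.array([sx, sy])     # flip the sign of each axis
            if np.linalg.det(R2) > 0:          # keep proper 2D rotations only
                R = np.eye(3)
                R[:2, :2] = R2                 # embed as a rotation about z
                frame.append(R)
    return frame

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

rng = np.random.default_rng(0)
neighbors = rng.normal(size=(64, 3)) * np.array([3.0, 1.0, 0.5])
frame = so2_frame(neighbors)

# Rotating the scene about the up axis rotates the frame accordingly.
Q = rot_z(0.9)
frame_rotated = so2_frame(neighbors @ Q.T)
is_equivariant = all(
    any(np.allclose(Q @ R, R_new, atol=1e-8) for R_new in frame_rotated)
    for R in frame
)

# A single orientation is drawn per point, as in the experiments described above.
R_sample = frame[rng.integers(len(frame))]
```

Every element leaves the z-axis untouched, so the up direction of the scan is preserved while rotations around it are handled by the frame.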

BibTeX


      @article{weijler2025roteq,
        title = {Efficient Continuous Group Convolutions for Local SE(3) Equivariance in 3D Point Clouds},
        author = {Weijler, L. and Hermosilla, P.},
        journal = {International Conference on 3D Vision (3DV)},
        year = {2025},
      }