Skip to content

Apoorv-1009/ObitoNet

Repository files navigation

ObitoNet, ArXiv

Overview

This repository contains the code and approaches for ObitoNet, a multimodal masked point cloud reconstruction model using a Transformer based approach. We effectively fuse image features and point cloud features with a Cross-Attention mechanism. Our approach leverages Vision Transformers (ViT) to extract rich semantic features to create input tokens. A Point Cloud tokenizer generates point cloud tokens utilizing Farthest Point Sampling (FPS) and K-Nearest Neighbors (KNN) and captures local geometric details. These multimodal features are combined using a learnable Cross-Attention decoder module, that reconstructs high-fidelity point clouds.

A link to the video showcasing ObitoNet can be found here.
A supporting repository can be found here

image

Components

  1. Point Cloud Encoder: Processes 3D point cloud data into meaningful tokens using Transformers.
  2. Cross Attention Decoder: Combines encoded data for accurate reconstruction and representation.
  3. Image Encoder: Planned component for processing 2D image data in conjunction with point clouds.

Install

PyTorch = 1.7.0 < 1.11.0; python = 3.10; CUDA = 11.8;

# Chamfer Distance
cd ./extensions/chamfer_dist
python setup.py install --user

# PointNet++
pip install "git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"

# GPU kNN
pip install --upgrade https://github.com/unlimblue/KNN_CUDA/releases/download/0.2/KNN_CUDA-0.2-py3-none-any.whl

Training

To train ObitoNet, run the following command. There's are a recommended training order. To learn more about training order and other experiments, please refer to our paper.

CUDA_VISIBLE_DEVICES=<GPU> python main.py --config configs/config.yaml --exp_name <output_file_name>

Visualization

To visualize the Point Clouds run:

python visualization.py --test --config configs/config.yaml--exp_name <name>

Training (Nexus Cluster - UMD)

To run the terminal interactively:

bash-4.4$ srun --gres=gpu:rtxa5000:4 --account=class --partition=class --qos high -t 1-00:00:00 --mem-per-cpu=64gb --pty bash -i

Dataset Prep

The model was developed using a modified version Tanks and Temples dataset (link). We are hosting our modified dataset generated for this particular model here. This uses sampling to generate a set of fixed size point clouds to train the model. The process is documented in the Dataloader.ipynb For uniform downsampling, different k_values for different data-subsets were used, as some values were giving us samples less than 16k (less than fps output) k = 100 (for all) k = 20 (for caterpillar)

About

Multimodal High-Resolution Point Cloud Reconstruction using a MAE model using Cross-Attention to fuse image tokens and point cloud tokens

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors