
Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention (ICLR2025)

Project Webpage: https://weitaikang.github.io/Intent3D-webpage/

Paper: https://arxiv.org/abs/2405.18295

Figure 1: We introduce 3D intention grounding (right), a new task for detecting the object of interest in a 3D scene with a 3D bounding box, guided by a human intention expressed in text (e.g., "I want something to support my back to relieve the pressure"). In contrast, existing 3D visual grounding (left) relies on human reasoning and references for detection. The illustration highlights that observation and reasoning are performed manually by humans on the left, but automated by AI on the right.

0. Installation

  • (1) Install the environment from the environment.yml file:
    conda env create -f environment.yml --name Intent3D
    
    • or you can install manually:
      conda create -n Intent3D python=3.7
      conda activate Intent3D
      conda install pytorch==1.9.0 torchvision==0.10.0 cudatoolkit=11.1 -c pytorch -c nvidia
      pip install numpy ipython psutil traitlets transformers termcolor ipdb scipy tensorboardX h5py wandb plyfile tabulate
      
  • (2) Install spacy (version 3.3.0, to match the model below) for text parsing:
    pip install spacy==3.3.0
    pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.3.0/en_core_web_sm-3.3.0.tar.gz
    
  • (3) Compile PointNet++:
    cd ~/Intent3D
    sh init.sh
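
  • (4) Optionally, sanity-check the environment. This is a minimal sketch assuming the pinned versions above:
    # expect torch 1.9.0 with CUDA available (True)
    python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
    # the en_core_web_sm 3.3.0 model should load without error
    python -c "import spacy; spacy.load('en_core_web_sm')"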
    

1. Data preparation

The final required files are as follows:

├── [DATA_ROOT]
│   ├── [1] train_v3scans.pkl         # packaged ScanNet training set
│   ├── [2] val_v3scans.pkl           # packaged ScanNet validation set
│   ├── [3] intention_sentence/       # Intent3D annotations
│   ├── [4] group_free_pred_bboxes/   # detected boxes
│   ├── [5] gf_detector_l6o256.pth    # PointNet++ checkpoint
  • [1][2] Prepare the ScanNet point cloud data:
    • 1) Download ScanNet v2 data. Follow the ScanNet instructions to apply for dataset permission; you will then receive the official download script download-scannet.py. Use the following commands to download the necessary files:
      python2 download-scannet.py -o [SCANNET_PATH] --type _vh_clean_2.ply
      python2 download-scannet.py -o [SCANNET_PATH] --type _vh_clean_2.labels.ply
      python2 download-scannet.py -o [SCANNET_PATH] --type .aggregation.json
      python2 download-scannet.py -o [SCANNET_PATH] --type _vh_clean_2.0.010000.segs.json
      python2 download-scannet.py -o [SCANNET_PATH] --type .txt
      
      where [SCANNET_PATH] is the output folder. The ScanNet dataset structure should look like this:
      ├── [SCANNET_PATH]
      │   ├── scans
      │   │   ├── scene0000_00
      │   │   │   ├── scene0000_00.txt
      │   │   │   ├── scene0000_00.aggregation.json
      │   │   │   ├── scene0000_00_vh_clean_2.ply
      │   │   │   ├── scene0000_00_vh_clean_2.labels.ply
      │   │   │   ├── scene0000_00_vh_clean_2.0.010000.segs.json
      │   │   ├── scene.......
      
    • 2) Package the above files into two .pkl files (train_v3scans.pkl and val_v3scans.pkl):
      python Pack_scan_files.py --scannet_data [SCANNET_PATH] --data_root [DATA_ROOT]
      
  • [3] intention_sentence: Download the Intent3D annotations from HERE.
  • [4] group_free_pred_bboxes: Download the object detector's outputs.
  • [5] gf_detector_l6o256.pth: Download the PointNet++ checkpoint. A quick file check is sketched below.
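
Once all five items are in place, a quick shell check can confirm the layout above. This is a minimal sketch; DATA_ROOT below is a placeholder for your own path:

    DATA_ROOT=/path/to/data_root  # placeholder: set to your [DATA_ROOT]
    for f in train_v3scans.pkl val_v3scans.pkl gf_detector_l6o256.pth; do
        [ -e "$DATA_ROOT/$f" ] || echo "missing file: $f"
    done
    for d in intention_sentence group_free_pred_bboxes; do
        [ -d "$DATA_ROOT/$d" ] || echo "missing directory: $d"
    done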

2. Training

  • Please specify the paths for --data_root, --log_dir, --tensorboard, and --pp_checkpoint in scripts/train_scanintend.sh first (a sketch of these arguments follows below). By default we train with a batch size of 24 on four A6000 GPUs.
    sh scripts/train_scanintend.sh
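
    For reference, a minimal sketch of how those four arguments might look inside scripts/train_scanintend.sh. All paths are placeholders, and the surrounding launcher command is omitted since it depends on your setup:

    --data_root /path/to/data_root \
    --log_dir logs/intent3d \
    --tensorboard logs/intent3d/tensorboard \
    --pp_checkpoint /path/to/data_root/gf_detector_l6o256.pth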
    

3. Evaluation

  • To evaluate on the validation set, please specify the paths for --data_root, --log_dir, --tensorboard, --pp_checkpoint, and --checkpoint_path in scripts/test_scanintend.sh first (sketched after this list).
    sh scripts/test_scanintend.sh
    
  • Change --eval_val to --eval_test for evaluation on the test set.
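
Likewise, a minimal sketch of the arguments inside scripts/test_scanintend.sh. The paths are placeholders and the checkpoint filename is hypothetical:

    --data_root /path/to/data_root \
    --log_dir logs/intent3d_eval \
    --tensorboard logs/intent3d_eval/tensorboard \
    --pp_checkpoint /path/to/data_root/gf_detector_l6o256.pth \
    --checkpoint_path logs/intent3d/ckpt_best.pth \
    --eval_val   # swap for --eval_test to evaluate on the test set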

4. Benchmark

(Benchmark results figure; see the paper for the full comparison.)

5. Acknowledgements

We are grateful for BUTD-DETR and EDA, two prior works on 3D visual grounding. Our codebase is inherited from EDA, which in turn builds on BUTD-DETR. If you run into issues that seem unrelated to the 3D intention grounding task itself, checking the corresponding parts of EDA or BUTD-DETR may be helpful. 🤔

6. Citation

If you find our work useful in your research, please consider citing:

@article{kang2024intent3d,
  title={Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention},
  author={Kang, Weitai and Qu, Mengxue and Kini, Jyoti and Wei, Yunchao and Shah, Mubarak and Yan, Yan},
  journal={arXiv preprint arXiv:2405.18295},
  year={2024}
}

7. Contact

If you have any questions about this project, please feel free to contact Weitai Kang: k13711752197[AT]gmail.com
