Project Webpage: https://weitaikang.github.io/Intent3D-webpage/
Paper: https://arxiv.org/abs/2405.18295
We introduce 3D intention grounding (right), a new task for detecting the object of interest in a 3D scene with a 3D bounding box, guided by a human intention expressed in text (e.g., "I want something to support my back to relieve the pressure"). In contrast, existing 3D visual grounding (left) relies on human reasoning and references for detection. The illustration highlights that observation and reasoning are performed manually by humans (left) but automated by AI (right).

- (1) Install the environment with the `environment.yml` file:

  ```
  conda env create -f environment.yml --name Intent3D
  ```
- or you can install manually:

  ```
  conda create -n Intent3D python=3.7
  conda activate Intent3D
  conda install pytorch==1.9.0 torchvision==0.10.0 cudatoolkit=11.1 -c pytorch -c nvidia
  pip install numpy ipython psutil traitlets transformers termcolor ipdb scipy tensorboardX h5py wandb plyfile tabulate
  ```
- (2) Install spaCy for text parsing (a quick parsing check is sketched after this list):

  ```
  pip install spacy  # 3.3.0
  pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.3.0/en_core_web_sm-3.3.0.tar.gz
  ```
- (3) Compile PointNet++:

  ```
  cd ~/Intent3D
  sh init.sh
  ```
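As a quick check that spaCy and the `en_core_web_sm` model are installed correctly, the sketch below parses an example intention sentence and prints verb-object pairs. It is only an illustration of the parsing dependency; the actual parsing logic used by the codebase may differ.

```python
# Minimal spaCy sanity check (illustrative only, not part of the Intent3D codebase).
# Parses an example intention sentence and prints (verb, direct object) pairs.
import spacy

nlp = spacy.load("en_core_web_sm")  # the 3.3.0 model installed above
doc = nlp("I want something to support my back to relieve the pressure.")

for token in doc:
    if token.dep_ == "dobj" and token.head.pos_ == "VERB":
        print(f"verb-object pair: ({token.head.lemma_}, {token.lemma_})")
```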
The final required files are as follows:

```
├── [DATA_ROOT]
│   ├── [1] train_v3scans.pkl        # Packaged ScanNet training set
│   ├── [2] val_v3scans.pkl          # Packaged ScanNet validation set
│   ├── [3] intention_sentence/      # Intent3D annotations
│   ├── [4] group_free_pred_bboxes/  # detected boxes
│   ├── [5] gf_detector_l6o256.pth   # PointNet++ checkpoint
```
- [1] [2] Prepare the ScanNet point cloud data
  - 1) Download ScanNet v2 data. Follow the ScanNet instructions to apply for dataset permission, and you will get the official download script `download-scannet.py`. Then use the following commands to download the necessary files:

    ```
    python2 download-scannet.py -o [SCANNET_PATH] --type _vh_clean_2.ply
    python2 download-scannet.py -o [SCANNET_PATH] --type _vh_clean_2.labels.ply
    python2 download-scannet.py -o [SCANNET_PATH] --type .aggregation.json
    python2 download-scannet.py -o [SCANNET_PATH] --type _vh_clean_2.0.010000.segs.json
    python2 download-scannet.py -o [SCANNET_PATH] --type .txt
    ```

    where `[SCANNET_PATH]` is the output folder. The ScanNet dataset structure should look like below:

    ```
    ├── [SCANNET_PATH]
    │   ├── scans
    │   │   ├── scene0000_00
    │   │   │   ├── scene0000_00.txt
    │   │   │   ├── scene0000_00.aggregation.json
    │   │   │   ├── scene0000_00_vh_clean_2.ply
    │   │   │   ├── scene0000_00_vh_clean_2.labels.ply
    │   │   │   ├── scene0000_00_vh_clean_2.0.010000.segs.json
    │   │   ├── scene.......
    ```
  - 2) Package the above files into two .pkl files (`train_v3scans.pkl` and `val_v3scans.pkl`); a quick load check is sketched at the end of this data section:

    ```
    python Pack_scan_files.py --scannet_data [SCANNET_PATH] --data_root [DATA_ROOT]
    ```
- [3] intention_sentence: Download Intent3D annotations from HERE.
- [4] group_free_pred_bboxes: Download object detector's outputs.
- [5] gf_detector_l6o256.pth: Download PointNet++ checkpoint.
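As a quick sanity check that `[DATA_ROOT]` is complete before training, the hypothetical sketch below (not part of the codebase) verifies that the five entries above exist and that the packed ScanNet .pkl files can be loaded. The internal layout of the packed objects is defined by `Pack_scan_files.py`, so the printed summary is only indicative.

```python
# Hypothetical sanity check (not part of the Intent3D codebase): verify that [DATA_ROOT]
# contains the five required entries and that the packed .pkl files can be loaded.
import os
import pickle

data_root = "[DATA_ROOT]"  # replace with your actual data root
required = [
    "train_v3scans.pkl",
    "val_v3scans.pkl",
    "intention_sentence",
    "group_free_pred_bboxes",
    "gf_detector_l6o256.pth",
]

for name in required:
    path = os.path.join(data_root, name)
    print(f"{name}: {'found' if os.path.exists(path) else 'MISSING'}")

for name in ("train_v3scans.pkl", "val_v3scans.pkl"):
    with open(os.path.join(data_root, name), "rb") as f:
        scans = pickle.load(f)
    print(f"{name}: loaded {type(scans).__name__} with {len(scans)} entries")
```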
- Please specify the paths of `--data_root`, `--log_dir`, `--tensorboard`, and `--pp_checkpoint` in the `train_scanintend` script first. We use four A6000 GPUs for training with a batch size of 24 by default.

  ```
  sh scripts/train_scanintend.sh
  ```
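Before launching the four-GPU run, you may want to confirm that all GPUs are visible to PyTorch. The snippet below is a generic check and makes no assumption about how the training script itself distributes work.

```python
# Generic check that the expected GPUs are visible to PyTorch before training.
import torch

print("CUDA available:", torch.cuda.is_available())
print("visible GPUs:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"  [{i}] {torch.cuda.get_device_name(i)}")
```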
- To evaluate on the validation set, please specify the paths of `--data_root`, `--log_dir`, `--tensorboard`, `--pp_checkpoint`, and `checkpoint_path` in the `test_scanintend` script first (a quick checkpoint-loading check is sketched after this list).

  ```
  sh scripts/test_scanintend.sh
  ```
- Change `--eval_val` to `--eval_test` for evaluation on the test set.
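If evaluation fails to start, a first step is to confirm that the file you point `checkpoint_path` at is a readable PyTorch checkpoint. The sketch below only inspects the file; `[CHECKPOINT_PATH]` is a placeholder, and the key names inside depend on how the training script saves its state, so the printed keys are not guaranteed names.

```python
# Hypothetical check that the checkpoint referenced by checkpoint_path is readable.
# The contents depend on how the training script saves its state.
import torch

ckpt = torch.load("[CHECKPOINT_PATH]", map_location="cpu")  # replace with your checkpoint path
print("checkpoint object type:", type(ckpt).__name__)
if isinstance(ckpt, dict):
    print("top-level keys:", list(ckpt.keys()))
```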
We are grateful to BUTD-DETR and EDA, two prior works on 3D visual grounding. Our codebase is inherited from EDA, which in turn is inherited from BUTD-DETR. If you come across issues that seem unrelated to the 3D intention grounding task itself, checking the corresponding parts of EDA or BUTD-DETR may be helpful. 🤔
If you find our work useful in your research, please consider citing:
@article{kang2024intent3d,
title={Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention},
author={Kang, Weitai and Qu, Mengxue and Kini, Jyoti and Wei, Yunchao and Shah, Mubarak and Yan, Yan},
journal={arXiv preprint arXiv:2405.18295},
year={2024}
}
If you have any questions about this project, please feel free to contact Weitai Kang: k13711752197[AT]gmail.com