Project Website, Paper (arXiv)
This repository contains the code for the paper "One Map to Find Them All: Real-time Open-Vocabulary Mapping for Zero-shot Multi-Object Navigation". We provide a dockerized environment to run the code, or you can run it locally.
In summary, we open-source:
- The OneMap mapping and navigation code
- The evaluation code for single- and multi-object navigation
- The multi-object navigation dataset and benchmark
- The multi-object navigation dataset generation code, such that you can generate your own datasets
The capability to efficiently search for objects in complex environments is fundamental for many real-world robot applications. Recent advances in open-vocabulary vision models have resulted in semantically informed object navigation methods that allow a robot to search for an arbitrary object without prior training. However, these zero-shot methods have so far treated the environment as unknown for each consecutive query. In this paper, we introduce a new benchmark for zero-shot multi-object navigation, allowing the robot to leverage information gathered from previous searches to more efficiently find new objects. To address this problem, we build a reusable open-vocabulary feature map tailored for real-time object search. We further propose a probabilistic-semantic map update that mitigates common sources of errors in semantic feature extraction and leverage this semantic uncertainty for informed multi-object exploration. We evaluate our method on a set of object navigation tasks both in simulation and on a real robot, running in real-time on a Jetson Orin AGX. We demonstrate that it outperforms existing state-of-the-art approaches on both single- and multi-object navigation tasks.
You will need to have Docker installed on your system. Follow the official instructions to install it. You will also need to have the nvidia-container-toolkit installed and configured as the Docker runtime on your system.
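A quick way to verify this setup before building anything is to check that Docker reports the NVIDIA runtime and can see the GPU (the CUDA image tag below is only an example; any recent CUDA base image works):
# check that the nvidia runtime is registered with Docker
docker info | grep -i runtimes
# run a throwaway CUDA container to confirm GPU access
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi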
# https
git clone https://github.com/KTH-RPL/OneMap.git
# or ssh
git clone git@github.com:KTH-RPL/OneMap.git
cd OneMap/
The docker image build process will build habitat-sim and download model weights. You can choose to let the container download the habitat scenes during build, or, if you have them already downloaded, you can set HM3D=LOCAL and provide the absolute HM3D_PATH to the versioned_data directory on your machine in the .env file in the root of the repository.
If you want the container to download the scenes for you, set HM3D=FULL in the .env file and provide your Matterport credentials. You can get access to Matterport for free here. You will not need to provide a HM3D_PATH in that case.
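For reference, a minimal .env for the local-scenes case might look like the sketch below (the path is a placeholder for your own versioned_data directory; for the download case you would instead set HM3D=FULL and fill in the Matterport credential variables expected by the repository's .env template):
HM3D=LOCAL
HM3D_PATH=/absolute/path/to/habitat/versioned_data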
Having configured the .env file, you can build the docker image in the root of the repository with:
docker compose build
The build will take a while as habitat-sim is built from source. You can launch the docker container with:
bash run_docker.sh
and open a new terminal in the container with:
docker exec -it onemap-onemap-1 bash
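The container name can differ on your machine, since Docker Compose derives it from the project directory and service names; if the command above fails, look up the actual name first with:
docker ps --format '{{.Names}}'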
Alternatively, to set up and run the code locally without Docker, clone the repository:
# https
git clone https://github.com/KTH-RPL/OneMap.git
# or ssh
git clone git@github.com:KTH-RPL/OneMap.git
cd OneMap/
python3 -m pip install gdown torch torchvision torchaudio meson
python3 -m pip install -r requirements.txt
Manually install a newer timm version (quote the requirement so the shell does not interpret > as a redirect):
python3 -m pip install --upgrade "timm>=1.0.7"
YOLOv7:
git clone https://github.com/WongKinYiu/yolov7
Build planning utilities:
python3 -m pip install ./planning_cpp/
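As a quick sanity check that the planning utilities built and installed correctly (assuming the package is importable under the name planning_cpp, matching the directory name), you can try:
python3 -c "import planning_cpp; print('planning_cpp OK')"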
mkdir -p weights/
SED extracted weights:
gdown 1D_RE4lvA-CiwrP75wsL8Iu1a6NrtrP9T -O weights/clip.pth
YOLOv7 and MobileSAM weights:
wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-e6e.pt -O weights/yolov7-e6e.pt
wget https://github.com/ChaoningZhang/MobileSAM/raw/refs/heads/master/weights/mobile_sam.pt -O weights/mobile_sam.pt
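After these downloads, the weights/ directory should contain clip.pth, yolov7-e6e.pt and mobile_sam.pt; a quick check:
ls -lh weights/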
You can run the code on an example, visualized with rerun.io. You will need to have rerun installed on the host for visualization. Ensure the Docker container is running and that you are in the container as described in the Docker setup. Then launch the rerun viewer on the host (not inside the container) with:
rerun
and launch the example in the container with:
python3 habitat_test.py --config config/mon/base_conf_sim.yaml
If you installed locally, open the rerun viewer and run the example from the root of the repository with:
rerun
python3 habitat_test.py --config config/mon/base_conf_sim.yaml
You can reproduce the evaluation results from the paper for single- and multi-object navigation.
python3 eval_habitat.py --config config/mon/eval_conf.yaml
This will run the evaluation and save the results in the results/ directory. You can read the results with:
python3 read_results.py --config config/mon/eval_conf.yaml
python3 eval_habitat_multi.py --config config/mon/eval_multi_conf.yaml
This will run the evaluation and save the results in the results_multi/ directory. You can read the results with:
python3 read_results_multi.py --config config/mon/eval_multi_conf.yaml
While we provide the generated dataset for the evaluation of multi-object navigation, we also release the code to generate datasets with varying parameters. You can generate a dataset with:
python3 eval/dataset_utils/gen_multiobject_dataset.py
and change the parameters, such as the number of objects per episode, in the corresponding file.
If you use this code in your research, please cite our paper:
@misc{busch2024mapallrealtimeopenvocabulary,
  title={One Map to Find Them All: Real-time Open-Vocabulary Mapping for Zero-shot Multi-Object Navigation},
  author={Finn Lukas Busch and Timon Homberger and Jesús Ortega-Peimbert and Quantao Yang and Olov Andersson},
  year={2024},
  eprint={2409.11764},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2409.11764},
}