This repository contains the implementation of the paper:
LANISTR: Multimodal Learning from Structured and Unstructured Data [Paper]
Sayna Ebrahimi, Sercan Arik, Yihe Dong, Tomas Pfister.
Multimodal large-scale pretraining has shown impressive performance for unstructured data such as language and images. However, a prevalent real-world scenario involves structured data types, such as tabular and time-series data, along with unstructured data; such scenarios have been understudied. To bridge this gap, we propose LANISTR, an attention-based framework to learn from LANguage, Image, and STRuctured data. The core of LANISTR's methodology is rooted in *masking-based* training applied across both unimodal and multimodal levels. In particular, we introduce a new similarity-based multimodal masking loss that enables it to learn cross-modal relations from large-scale multimodal data with missing modalities. On two real-world datasets, MIMIC-IV (from healthcare) and Amazon Product Review (from retail), LANISTR demonstrates remarkable improvements, 6.6% (in AUROC) and 14% (in accuracy) when fine-tuned with 0.1% and 0.01% of labeled data, respectively, compared to the state-of-the-art alternatives. Notably, these improvements are observed even with a very high ratio of samples (35.7% and 99.8%, respectively) not containing all modalities, underlining the robustness of LANISTR to the practical missing-modality challenge.
conda create -n lanistr python=3.8 -y
conda activate lanistr
pip install -e .
- For the MIMIC-IV v2.2 dataset, please follow the instructions to obtain access. Note that, prior to requesting access to MIMIC, you must become a credentialed user on PhysioNet, where the MIMIC data is hosted (see instructions).
- Once you obtain access, log into your PhysioNet account and visit the MIMIC-IV project page, the MIMIC-CXR project page, and the MIMIC-ED project page. Find the "Files" section in each project description and download the data after signing the user agreement. Place all the data under the `physionet.org` directory as shown below.
- Next, follow these instructions to extract and preprocess the data into the `./lanistr/data/MIMIC-IV-V2.2/` directory. Note that two sub-directories will be generated in `./lanistr/data/MIMIC-IV-V2.2/`, named `root` and `in-hospital-mortality`. This path (`./lanistr/data/MIMIC-IV-V2.2/`) is referred to as `preprocessed_data_dir` in the config files in our code.
- Download `discretizer_config.json` and place it in `./lanistr/data/MIMIC-IV-V2.2/discretizer_config.json`.
- For the Amazon Product Review dataset, please use the provided download script to download the data from the original servers upon agreeing to their user agreements. Make sure you are in the `./lanistr/` subdirectory.
bash scripts/download_amazon.sh
Once downloaded, the data directory in the lanistr code should look as below for `./lanistr/data/APR2018`:
lanistr/
    configs/
    datasets/
    model/
    Figures/
    scripts/
    utils/
    ......
    data/
    |-- MIMIC-IV-V2.2
    |   |-- discretizer_config.json
    |   |-- in-hospital-mortality
    |   |-- physionet.org
    |       |-- files
    |       |   |-- mimic-cxr
    |       |   |-- mimic-cxr-jpg
    |       |   |-- mimic-iv-ed
    |       |   |-- mimic-iv-note
    |       |   |-- mimiciv
    |       |-- robots.txt
    |-- APR2018
    |   |-- All_Beauty
    |   |   |-- images
    |   |   |-- All_Beauty.json.gz
    |   |   |-- meta_All_Beauty.json.gz
    |   |-- AMAZON_FASHION
    |   |   |-- images
    |   |   |-- AMAZON_FASHION.json.gz
    |   |   |-- meta_AMAZON_FASHION.json.gz
    |   |-- Office_Products
    |       |-- images
    |       |-- Office_Products.json.gz
    |       |-- meta_Office_Products.json.gz
    ......
Experiments on the MIMIC-IV dataset start by first pretraining the model, followed by finetuning on the labeled portion of the data. For both steps we used 8x A100 GPUs with 40GB memory each. For evaluation, we only use a single GPU. The following scripts run the pretraining, fine-tuning, and evaluation experiments. Please make sure to run them from the `./lanistr/` subdirectory.
bash scripts/mimic_pretrain.sh
bash scripts/mimic_finetune.sh
bash scripts/mimic_eval.sh
Experiments on the APR2018 dataset start by first pretraining the model on the Office category. We finetuned the pretrained checkpoint on two distinct categories, `All_Beauty` and `AMAZON_FASHION`. For all experiments we used 8x A100 GPUs with 40GB memory each. For evaluation, we only use a single GPU. The following scripts run the pretraining, fine-tuning, and evaluation experiments. Please make sure to run them from the `./lanistr/` subdirectory.
bash scripts/amazon_pretrain.sh
For the All_Beauty and AMAZON_FASHION categories, use the following finetuning and evaluation scripts, respectively.
bash scripts/amazon_finetune_beauty.sh
bash scripts/amazon_finetune_fashion.sh
bash scripts/amazon_eval_beauty.sh
bash scripts/amazon_eval_fashion.sh
Prepare your data: Organize your data into a CSV file where each column represents a different modality (image filenames, text, time-series file paths, and tabular features). Ensure categorical and numerical tabular features are in separate columns; a hypothetical example is sketched after these steps.
Adapt the dataset class: Modify the dataset class located at `./dataset/amazon/load_data.py` to correctly read and iterate through your CSV data. This may involve adjusting column names, data types, and loading logic.
Choose a finetuning config file: Select one of the provided finetuning configuration files (e.g., `./lanistr/configs/mimic_finetune.yaml`).
Set finetune_initialize_from to random: In the chosen config file, locate the `finetune_initialize_from` parameter and set its value to `random`. This will initialize LANISTR's architecture with random weights, except for the image encoder (initialized from a ViT pretrained on ImageNet), the text encoder (pretrained BERT), and the multimodal fusion encoder (pretrained FLAVA on ImageNet). The time series and TabNet encoders will still be initialized randomly.
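For illustration, here is a minimal, hypothetical sketch of such a CSV layout and a dataset class adapted to it. The column names (`image_path`, `review_text`, `timeseries_path`, and the categorical/numerical feature columns) and the `CustomMultimodalDataset` class are assumptions for this example, not identifiers from the repository; adapt them to your own data and to the interface expected by `./dataset/amazon/load_data.py`.

```python
# Hypothetical sketch: a CSV with one column per modality and a matching
# PyTorch dataset. Column names and class names are illustrative only.
import pandas as pd
from torch.utils.data import Dataset


class CustomMultimodalDataset(Dataset):
    """Reads a CSV where each row holds one sample's modalities."""

    def __init__(self, csv_path: str):
        self.df = pd.read_csv(csv_path)
        # Assumed column layout -- adjust to your own CSV.
        self.image_col = "image_path"          # path to an image file
        self.text_col = "review_text"          # raw text
        self.time_col = "timeseries_path"      # path to a time-series file
        self.cat_cols = ["store_id", "brand"]  # categorical tabular features
        self.num_cols = ["price", "rating"]    # numerical tabular features

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        return {
            "image": row[self.image_col],
            "text": row[self.text_col],
            "timeseries": row[self.time_col],
            "categorical": row[self.cat_cols].tolist(),
            "numerical": row[self.num_cols].astype(float).tolist(),
        }
```

In practice, image loading and transforms, text tokenization, time-series parsing, and tabular encoding would happen inside `__getitem__` (or in collate functions), mirroring how `load_data.py` prepares each modality.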
In all configuration files, you can enable or disable individual modalities using the following parameters:
# modalities
image: true # Set to false if you don't have image data
text: true # Set to false if you don't have text data
time: true # Set to false if you don't have time series data
tab: true # Set to false if you don't have tabular data
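As a rough illustration of how such flags can drive which modality branches get built (this is not the repository's actual loading code), the snippet below reads a config with PyYAML; the file name `my_finetune.yaml` is a placeholder and the flat key layout is an assumption.

```python
# Minimal sketch, assuming the config is plain YAML containing the modality
# flags shown above; "my_finetune.yaml" is a hypothetical file name.
import yaml

with open("my_finetune.yaml") as f:
    cfg = yaml.safe_load(f)

# Keep only the modalities whose flag is set to true.
active_modalities = [m for m in ("image", "text", "time", "tab") if cfg.get(m)]
print(f"Building encoders for: {active_modalities}")
```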
We used `pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113`.
If you find our code or paper helpful, please kindly cite LANISTR:

```BibTeX
@article{ebrahimi2023lanistr,
  title={LANISTR: Multimodal Learning from Structured and Unstructured Data},
  author={Ebrahimi, Sayna and Arik, Sercan O and Dong, Yihe and Pfister, Tomas},
  journal={arXiv preprint arXiv:2305.16556},
  year={2023}
}
```
See CONTRIBUTING.md
for details.
Apache 2.0; see LICENSE
for details.
This repo borrows some code from MedFUse, pytorch_tabnet, Multivariate Time Series Transformer Framework, and SimSiam. Thanks for their great work.
This is not an officially supported Google product.