Scientific papers typically organize content into visual groups such as text blocks or lines, and the text within each group usually shares the same semantics. We explore different approaches for injecting this group structure into text classifiers, and build models that improve the accuracy or efficiency of scientific text classification.
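As a rough illustration of what "injecting group structure" can mean, the sketch below flattens a page's token groups into one sequence and places a marker token between consecutive groups, so a downstream token classifier can see where one block or line ends and the next begins. This is only one conceivable approach written for illustration: the function name `flatten_with_markers` and the `[GRP]` marker are hypothetical and are not taken from the `vila` library.

```python
from typing import List, Tuple

# Hypothetical marker token for illustration; not part of the vila API.
GROUP_MARKER = "[GRP]"


def flatten_with_markers(
    groups: List[List[str]], marker: str = GROUP_MARKER
) -> Tuple[List[str], List[int]]:
    """Flatten token groups into one sequence with a marker between groups.

    Returns the flattened tokens and, for each token, the index of the group
    it came from (-1 for inserted marker tokens).
    """
    tokens: List[str] = []
    group_ids: List[int] = []
    for idx, group in enumerate(groups):
        if idx > 0:
            # Signal a visual group boundary before every group but the first.
            tokens.append(marker)
            group_ids.append(-1)
        tokens.extend(group)
        group_ids.extend([idx] * len(group))
    return tokens, group_ids
```

For example, calling `flatten_with_markers([["Deep", "Learning"], ["Abstract"]])` yields a sequence in which the title tokens and the section heading are separated by a single marker, and the second return value lets predictions be mapped back to their groups.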
After cloning the GitHub repo, you can either install the `vila` library or install only the dependencies:
git clone git@github.com:allenai/VILA.git
cd VILA
conda create -n vila python=3.6
conda activate vila # Activate the new environment before installing
pip install -e . # Option 1: install the `vila` library
pip install -r requirements.txt # Option 2: only install the dependencies
We tested the code and trained the models using Python≥3.6, PyTorch==1.7.1, and transformers==4.4.2.
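To see how a local environment compares with the tested versions, a small check like the following may help. It is a sketch, not part of the repository: the helper names `installed_version`, `compare_to_tested`, and `report_environment` are made up for this example, and it assumes each package exposes a `__version__` attribute, which `torch` and `transformers` do.

```python
import importlib
import sys
from typing import Dict, Optional

# Versions the README reports as tested (module name -> version).
TESTED = {"torch": "1.7.1", "transformers": "4.4.2"}


def installed_version(module_name: str) -> Optional[str]:
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def compare_to_tested(
    installed: Dict[str, Optional[str]], tested: Dict[str, str] = TESTED
) -> Dict[str, str]:
    """Label each tested package as matching, differing, or missing."""
    report = {}
    for name, wanted in tested.items():
        found = installed.get(name)
        if found is None:
            report[name] = "missing"
        elif str(found).split("+")[0] == wanted:
            # Ignore local build suffixes such as "+cu110".
            report[name] = "matches tested version"
        else:
            report[name] = "differs from tested version"
    return report


def report_environment() -> None:
    """Print the Python version check and the per-package comparison."""
    python_ok = sys.version_info >= (3, 6)
    print("Python >= 3.6:", "yes" if python_ok else "no")
    installed = {name: installed_version(name) for name in TESTED}
    for name, status in compare_to_tested(installed).items():
        print(name, installed[name], status)
```

Calling `report_environment()` inside the activated environment prints one line per package. A differing version does not necessarily mean the code will fail; it only means that combination was not the one reported as tested.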
@article{Shen2021IncorporatingVL,
title={Incorporating Visual Layout Structures for Scientific Text Classification},
author={Zejiang Shen and Kyle Lo and Lucy Lu Wang and Bailey Kuehl and Daniel S. Weld and Doug Downey},
journal={ArXiv},
year={2021},
volume={abs/2106.00676},
url={https://arxiv.org/abs/2106.00676}
}