Building:
Coverage:
- convert tex to html
- parse html to sentences
- GUI used to visualize and label sentences
Install LaTeXML, following its official installation instructions.
Install the Python dependencies: pip3 install -r requirements.txt
Create a file named meta.json inside the tex folder to indicate the entry point. It has the following format:
{
"tex_filename": "main.tex"
}
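As a minimal sketch, the meta.json file could also be generated with a few lines of Python instead of being written by hand. The helper name write_meta is hypothetical and is not part of this project; only the "tex_filename" key comes from the format above.

```python
import json
from pathlib import Path


def write_meta(tex_folder, entry_point="main.tex"):
    """Write meta.json into an existing tex folder (hypothetical helper)."""
    meta_path = Path(tex_folder) / "meta.json"
    # The only key described in this README is "tex_filename".
    content = json.dumps({"tex_filename": entry_point}, indent=2) + "\n"
    meta_path.write_text(content, encoding="utf-8")
    return meta_path
```

For example, write_meta("my_paper", "main.tex") would create my_paper/meta.json, assuming the folder my_paper already exists.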
Put the unzipped tex folder under data/tex_files.
Then go to the project's root directory and run python3 convert.py
Once the conversion completes, the output html files are placed in data/html_files
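Before running the conversion, it may help to check that every folder under data/tex_files has a usable meta.json. The sketch below is not part of this project, and find_problems is a hypothetical name. It assumes one sub-folder per document and only checks the "tex_filename" key described above.

```python
import json
from pathlib import Path


def find_problems(tex_root="data/tex_files"):
    """Return a list of problems found in the tex folders (hypothetical check)."""
    root = Path(tex_root)
    if not root.is_dir():
        return [f"{tex_root}: directory not found"]
    problems = []
    # Assumption: each sub-folder of tex_root holds one unzipped document.
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        meta_path = folder / "meta.json"
        if not meta_path.is_file():
            problems.append(f"{folder.name}: missing meta.json")
            continue
        try:
            meta = json.loads(meta_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            problems.append(f"{folder.name}: meta.json is not valid JSON")
            continue
        entry = meta.get("tex_filename") if isinstance(meta, dict) else None
        if not entry:
            problems.append(f"{folder.name}: meta.json has no tex_filename")
        elif not (folder / entry).is_file():
            problems.append(f"{folder.name}: entry point {entry} not found")
    return problems


if __name__ == "__main__":
    for problem in find_problems():
        print(problem)
```

An empty result means every folder names an entry point that exists. It does not guarantee that the conversion itself will succeed.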
Go to the project's root directory and run ./run_tools.sh
The labeling tool will then open automatically in your default browser (Chrome recommended).
Search finds all sentences in the chosen document that contain the chosen symbol.
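The idea behind the search can be illustrated with a small sketch. It assumes the sentences of a document are available as plain strings and uses simple substring matching. The tool itself may match symbols differently (for example on the math markup), and search_sentences is a hypothetical name.

```python
def search_sentences(sentences, symbol):
    """Return (index, sentence) pairs for sentences that contain the symbol."""
    if not symbol:
        # An empty query would match every sentence, so return nothing instead.
        return []
    return [(i, s) for i, s in enumerate(sentences) if symbol in s]
```

For instance, searching for "x" in ["Let x be real.", "Then y = 2."] would return only the first sentence, together with its index 0.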
In the search results, you can edit the labels as strings. There are three ways to save the changes to a json file:
- "Save to overall" saves to the overall json file, which holds many documents.
- "Save to separate" saves to a separate file named data/outputs/[document_name]_[symbol_expression].json
- "Save to both" does both at the same time.
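The three save options can be sketched roughly as follows. This is an illustration under assumptions and not the tool's actual code: the function name save_labels, the overall file name overall.json, and the nested layout {document_name: {symbol_expression: labels}} are all hypothetical. Only the separate file's naming pattern comes from the description above.

```python
import json
from pathlib import Path


def save_labels(labels, document_name, symbol_expression, mode,
                out_dir="data/outputs", overall_name="overall.json"):
    """Save labels in "overall", "separate", or "both" mode (hypothetical sketch)."""
    if mode not in ("overall", "separate", "both"):
        raise ValueError(f"unknown mode: {mode}")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    if mode in ("separate", "both"):
        # Naming pattern from the README: [document_name]_[symbol_expression].json
        path = out / f"{document_name}_{symbol_expression}.json"
        path.write_text(json.dumps(labels, indent=2), encoding="utf-8")
        written.append(path)
    if mode in ("overall", "both"):
        # Assumed layout: one file holding every document, keyed by name.
        path = out / overall_name
        if path.is_file():
            overall = json.loads(path.read_text(encoding="utf-8"))
        else:
            overall = {}
        overall.setdefault(document_name, {})[symbol_expression] = labels
        path.write_text(json.dumps(overall, indent=2), encoding="utf-8")
        written.append(path)
    return written
```

Note that a symbol expression containing characters such as "/" or "\" would not be a safe file name as written. How the tool handles that case is not covered here.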