Add tests for text spotting code #500

Merged
merged 30 commits into from
Sep 12, 2024
Commits
90f7844
test install dptext_detr
rwood-97 Sep 4, 2024
8e00488
use detectron2 tests to set up install detectron2
rwood-97 Sep 4, 2024
3ecc6d8
fix -e
rwood-97 Sep 4, 2024
25be652
only test for python 3.9
rwood-97 Sep 4, 2024
4b74728
add initial tests for all 3 pipelines
rwood-97 Sep 4, 2024
02f0e7f
fix installs
rwood-97 Sep 4, 2024
92e579a
fix path to sample files
rwood-97 Sep 4, 2024
06255aa
fix installation issues (hopefully)
rwood-97 Sep 4, 2024
cb5a964
fix
rwood-97 Sep 4, 2024
a991783
force clang for install
rwood-97 Sep 4, 2024
e026d1b
try force reinstall
rwood-97 Sep 4, 2024
1a021bf
typo
rwood-97 Sep 4, 2024
7f7a590
test install deepsolo first
rwood-97 Sep 5, 2024
0893724
try excluding adelaidet cache
rwood-97 Sep 5, 2024
12b1729
force numpy version to <2.0.0
rwood-97 Sep 5, 2024
38bc844
force numpy to <2.0.0
rwood-97 Sep 5, 2024
fe999cf
force numpy to numpy==1.26.4
rwood-97 Sep 5, 2024
324d0fc
remove caching
rwood-97 Sep 5, 2024
58b9d52
Merge branch '436-geopandas' into test_text_spotting
rwood-97 Sep 5, 2024
1e5b493
add tests for 3 runners
rwood-97 Sep 5, 2024
4f26fac
print paths (for debugging)
rwood-97 Sep 6, 2024
b14ac62
turn on printing
rwood-97 Sep 6, 2024
45865c8
use cloned path if running with GH actions
rwood-97 Sep 6, 2024
52b6c97
add to DeepSolo and MapText files too
rwood-97 Sep 6, 2024
a9f54d9
speed up testing
rwood-97 Sep 6, 2024
0e2bc76
fix tmp path factory
rwood-97 Sep 6, 2024
d89eeaf
force reinstall for deepsolo/maptextpiepline
rwood-97 Sep 9, 2024
714f472
mock model response
rwood-97 Sep 9, 2024
b2033ec
Merge branch 'main' into test_text_spotting
rwood-97 Sep 12, 2024
ea06538
add docs, update changelog
rwood-97 Sep 12, 2024
89 changes: 89 additions & 0 deletions .github/workflows/mr_ci_text_spotting.yml
@@ -0,0 +1,89 @@
---
name: Unit Tests - Text Spotting

on: [push]

# Run the text spotting unit tests with GitHub Actions for quick feedback.
jobs:

  macos_tests:
    runs-on: macos-latest
    # run on PRs, or commits to facebookresearch (not internal)
    strategy:
      fail-fast: false
      matrix:
        torch: ["1.13.1", "2.2.2"]
        include:
          - torch: "1.13.1"
            torchvision: "0.14.1"
          - torch: "2.2.2"
            torchvision: "0.17.2"

    env:
      # point datasets to ~/.torch so it's cached by CI
      DETECTRON2_DATASETS: ~/.torch/datasets
    steps:
      - name: Checkout
        uses: actions/checkout@v2

      - name: Set up Python 3.9
        uses: actions/setup-python@v2
        with:
          python-version: 3.9

      - name: Update pip
        run: |
          python -m ensurepip
          python -m pip install --upgrade pip

      - name: Install dependencies
        run: |
          python -m pip install -U pip
          python -m pip install wheel ninja opencv-python-headless onnx pytest-xdist
          python -m pip install numpy==1.26.4
          python -m pip install torch==${{matrix.torch}} torchvision==${{matrix.torchvision}} -f https://download.pytorch.org/whl/torch_stable.html
          # install from github to get latest; install iopath first since fvcore depends on it
          python -m pip install -U 'git+https://github.com/facebookresearch/iopath'
          python -m pip install -U 'git+https://github.com/facebookresearch/fvcore'
          wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
          python collect_env.py

      - name: Build and install
        run: |
          CC=clang CXX=clang++ python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
          python -m detectron2.utils.collect_env
          python -m pip install ".[dev]"

      - name: Install DPText-DETR
        run: |
          git clone https://github.com/maps-as-data/DPText-DETR.git
          python -m pip install 'git+https://github.com/maps-as-data/DPText-DETR.git' # Install DPText-DETR
          python -m pip install numpy==1.26.4
          wget https://huggingface.co/rwood-97/DPText_DETR_ArT_R_50_poly/resolve/main/art_final.pth

      - name: Run DPText-DETR unittests
        run: |
          python -m pytest test_text_spotting/test_dptext_runner.py


      - name: Install DeepSolo
        run: |
          git clone https://github.com/maps-as-data/DeepSolo.git
          python -m pip install 'git+https://github.com/maps-as-data/DeepSolo.git' --force-reinstall --no-deps # Install DeepSolo
          python -m pip install numpy==1.26.4
          wget https://huggingface.co/rwood-97/DeepSolo_ic15_res50/resolve/main/ic15_res50_finetune_synth-tt-mlt-13-15-textocr.pth

      - name: Run DeepSolo unittests
        run: |
          python -m pytest test_text_spotting/test_deepsolo_runner.py

      - name: Install MapTextPipeline
        run: |
          git clone https://github.com/maps-as-data/MapTextPipeline.git
          python -m pip install 'git+https://github.com/maps-as-data/MapTextPipeline.git' --force-reinstall --no-deps # Install MapTextPipeline
          python -m pip install "numpy<2.0.0"
          wget https://huggingface.co/rwood-97/MapTextPipeline_rumsey/resolve/main/rumsey-finetune.pth

      - name: Run MapTextPipeline unittests
        run: |
          python -m pytest test_text_spotting/test_maptext_runner.py
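
The workflow's `matrix` declares only a `torch` axis, and the `include` entries attach a matching `torchvision` version to each `torch` value, so the job runs twice rather than four times. The sketch below is a simplified Python model of that pairing, not GitHub's implementation; the function name `expand_matrix` is hypothetical, and the model assumes each `include` entry extends every combination whose original axis values it agrees with, or else becomes a new combination.

```python
from itertools import product


def expand_matrix(axes, include):
    """Simplified model of a GitHub Actions matrix with `include` entries (hypothetical helper)."""
    keys = list(axes)
    # Start from the cross product of the declared axes.
    combos = [dict(zip(keys, values)) for values in product(*(axes[k] for k in keys))]
    for extra in include:
        matched = False
        for combo in combos:
            # An include entry extends a combination when it agrees with it
            # on every original axis key that the entry mentions.
            if all(combo[k] == v for k, v in extra.items() if k in axes):
                combo.update(extra)
                matched = True
        if not matched:
            # Otherwise the entry becomes a combination of its own.
            combos.append(dict(extra))
    return combos


jobs = expand_matrix(
    {"torch": ["1.13.1", "2.2.2"]},
    [
        {"torch": "1.13.1", "torchvision": "0.14.1"},
        {"torch": "2.2.2", "torchvision": "0.17.2"},
    ],
)
for job in jobs:
    print(job)
```

Under this model, each job sees one `torch` version together with the `torchvision` version listed for it, which is what the `torch==${{matrix.torch}} torchvision==${{matrix.torchvision}}` install line relies on.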
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -24,6 +24,7 @@ _ADD NEW CHANGES HERE_
- All file loading methods now support `pathlib.Path` and `gpd.GeoDataFrame` objects as input ([#495](https://github.com/maps-as-data/MapReader/pull/495))
- Loading of dataframes from GeoJSON files now supported in many file loading methods (e.g. `add_metadata`, `Annotator.__init__`, `AnnotationsLoader.load`, etc.) ([#495](https://github.com/maps-as-data/MapReader/pull/495))
- `load_frames.py` added to `mapreader.utils`. This has functions for loading from various file formats (e.g. CSV, Excel, GeoJSON, etc.) and converting to GeoDataFrames ([#495](https://github.com/maps-as-data/MapReader/pull/495))
- Added tests for text spotting code ([#500](https://github.com/maps-as-data/MapReader/pull/500))

### Changed

@@ -1,9 +1,9 @@
Running tests
=============

- To run the tests for MapReader, you will need to have installed the **dev dependencies** as described above.
+ To run the tests for MapReader, you will need to have installed the **dev dependencies** (as described :doc:`here </getting-started/installation-instructions/2-install-mapreader>`).

- Also, if you have followed the "Install from PyPI" instructions, you will need to clone the MapReader repository to access the tests. i.e.:
+ .. note:: If you have followed the "Install from PyPI" instructions, you will also need to clone the MapReader repository to access the tests, i.e.:

.. code-block:: bash

@@ -18,3 +18,44 @@ You can then run the tests using from the root of the MapReader directory using
python -m pytest -v

If all tests pass, this means that MapReader has been installed and is working as expected.

Testing text spotting
---------------------

The tests for the text spotting code are kept separate from the main tests because of dependency conflicts.

You can only run the text spotting tests for the framework (DPText-DETR, DeepSolo or MapTextPipeline) that you have installed.

For DPText-DETR, use the following commands:

.. code-block:: bash

    cd path/to/MapReader # change this to your path, e.g. cd ~/MapReader
    conda activate mapreader
    export ADET_PATH=path/to/DPText-DETR # change this to the path where you have saved the DPText-DETR repository
    wget https://huggingface.co/rwood-97/DPText_DETR_ArT_R_50_poly/resolve/main/art_final.pth # download the model weights
    python -m pytest -v test_text_spotting/test_dptext_runner.py


For DeepSolo:

.. code-block:: bash

    cd path/to/MapReader # change this to your path, e.g. cd ~/MapReader
    conda activate mapreader
    export ADET_PATH=path/to/DeepSolo # change this to the path where you have saved the DeepSolo repository
    wget https://huggingface.co/rwood-97/DeepSolo_ic15_res50/resolve/main/ic15_res50_finetune_synth-tt-mlt-13-15-textocr.pth # download the model weights
    python -m pytest -v test_text_spotting/test_deepsolo_runner.py

For MapTextPipeline:

.. code-block:: bash

    cd path/to/MapReader # change this to your path, e.g. cd ~/MapReader
    conda activate mapreader
    export ADET_PATH=path/to/MapTextPipeline # change this to the path where you have saved the MapTextPipeline repository
    wget https://huggingface.co/rwood-97/MapTextPipeline_rumsey/resolve/main/rumsey-finetune.pth # download the model weights
    python -m pytest -v test_text_spotting/test_maptext_runner.py


If all tests pass, this means that the text spotting framework has been installed and is working as expected.
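
The test modules read `ADET_PATH` at import time, so if the variable is not exported, `pathlib.Path(os.getenv("ADET_PATH"))` fails with a `TypeError` before any test runs. The sketch below shows one way such a lookup could be guarded with a clearer error. The helper name `resolve_adet_path` and its `clone_dir` and `env` parameters are assumptions made for illustration and are not part of MapReader.

```python
import os
import pathlib


def resolve_adet_path(clone_dir="./DeepSolo/", env=None):
    """Return the framework repository path, or raise a readable error (hypothetical helper)."""
    env = os.environ if env is None else env
    # On GitHub Actions the workflow clones the repository into the working directory.
    if env.get("GITHUB_ACTIONS") == "true":
        return pathlib.Path(clone_dir).resolve()
    adet_path = env.get("ADET_PATH")
    if not adet_path:
        # Without this check, pathlib.Path(None) would raise a TypeError instead.
        raise RuntimeError(
            "ADET_PATH is not set; export it before running the text spotting tests"
        )
    return pathlib.Path(adet_path).resolve()


# Example call with an explicit mapping instead of the real environment.
print(resolve_adet_path(env={"ADET_PATH": "path/to/DeepSolo"}))
```

Passing the environment in as a mapping keeps the helper easy to exercise without modifying `os.environ`.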
228 changes: 228 additions & 0 deletions test_text_spotting/test_deepsolo_runner.py
@@ -0,0 +1,228 @@
from __future__ import annotations

import os
import pathlib
import pickle

import adet
import geopandas as gpd
import pandas as pd
import pytest
from detectron2.engine import DefaultPredictor
from detectron2.structures.instances import Instances

from mapreader import DeepSoloRunner
from mapreader.load import MapImages

print(adet.__version__)

# use cloned DeepSolo path if running in github actions
ADET_PATH = (
    pathlib.Path("./DeepSolo/").resolve()
    if os.getenv("GITHUB_ACTIONS") == "true"
    else pathlib.Path(os.getenv("ADET_PATH")).resolve()
)


@pytest.fixture
def sample_dir():
    return pathlib.Path(__file__).resolve().parent.parent / "tests" / "sample_files"


@pytest.fixture
def init_dataframes(sample_dir, tmp_path):
    """Initializes MapImages object (with metadata from csv and patches) and creates parent and patch dataframes.
    Returns
    -------
    tuple
        path to parent and patch dataframes
    """
    maps = MapImages(f"{sample_dir}/mapreader_text.png")
    maps.add_metadata(f"{sample_dir}/mapreader_text_metadata.csv")
    maps.patchify_all(patch_size=800, path_save=tmp_path)
    maps.check_georeferencing()
    parent_df, patch_df = maps.convert_images()
    return parent_df, patch_df


@pytest.fixture(scope="function")
def mock_response(monkeypatch, sample_dir):
    def mock_pred(self, *args, **kwargs):
        with open(f"{sample_dir}/patch-0-0-800-40-deepsolo-pred.pkl", "rb") as f:
            outputs = pickle.load(f)
        return outputs

    monkeypatch.setattr(DefaultPredictor, "__call__", mock_pred)


@pytest.fixture
def init_runner(init_dataframes):
    parent_df, patch_df = init_dataframes
    runner = DeepSoloRunner(
        patch_df,
        parent_df=parent_df,
        cfg_file=f"{ADET_PATH}/configs/R_50/IC15/finetune_150k_tt_mlt_13_15_textocr.yaml",
    )
    return runner


@pytest.fixture
def runner_run_all(init_runner, mock_response):
    runner = init_runner
    _ = runner.run_all()
    return runner


def test_deepsolo_init(init_dataframes):
    parent_df, patch_df = init_dataframes
    runner = DeepSoloRunner(
        patch_df,
        parent_df=parent_df,
        cfg_file=f"{ADET_PATH}/configs/R_50/IC15/finetune_150k_tt_mlt_13_15_textocr.yaml",
    )
    assert isinstance(runner, DeepSoloRunner)
    assert isinstance(runner.predictor, DefaultPredictor)
    assert isinstance(runner.parent_df.iloc[0]["coordinates"], tuple)
    assert isinstance(runner.patch_df.iloc[0]["coordinates"], tuple)


def test_deepsolo_init_str(init_dataframes, tmp_path):
    parent_df, patch_df = init_dataframes
    parent_df = parent_df.to_csv(f"{tmp_path}/parent_df.csv")
    patch_df = patch_df.to_csv(f"{tmp_path}/patch_df.csv")
    runner = DeepSoloRunner(
        f"{tmp_path}/patch_df.csv",
        parent_df=f"{tmp_path}/parent_df.csv",
        cfg_file=f"{ADET_PATH}/configs/R_50/IC15/finetune_150k_tt_mlt_13_15_textocr.yaml",
    )
    assert isinstance(runner, DeepSoloRunner)
    assert isinstance(runner.predictor, DefaultPredictor)
    assert isinstance(runner.parent_df.iloc[0]["coordinates"], tuple)
    assert isinstance(runner.patch_df.iloc[0]["coordinates"], tuple)


def test_deepsolo_init_pathlib(init_dataframes, tmp_path):
    parent_df, patch_df = init_dataframes
    parent_df = parent_df.to_csv(f"{tmp_path}/parent_df.csv")
    patch_df = patch_df.to_csv(f"{tmp_path}/patch_df.csv")
    runner = DeepSoloRunner(
        pathlib.Path(f"{tmp_path}/patch_df.csv"),
        parent_df=pathlib.Path(f"{tmp_path}/parent_df.csv"),
        cfg_file=f"{ADET_PATH}/configs/R_50/IC15/finetune_150k_tt_mlt_13_15_textocr.yaml",
    )
    assert isinstance(runner, DeepSoloRunner)
    assert isinstance(runner.predictor, DefaultPredictor)
    assert isinstance(runner.parent_df.iloc[0]["coordinates"], tuple)
    assert isinstance(runner.patch_df.iloc[0]["coordinates"], tuple)


def test_deepsolo_init_tsv(init_dataframes, tmp_path):
    parent_df, patch_df = init_dataframes
    parent_df = parent_df.to_csv(f"{tmp_path}/parent_df.tsv", sep="\t")
    patch_df = patch_df.to_csv(f"{tmp_path}/patch_df.tsv", sep="\t")
    runner = DeepSoloRunner(
        f"{tmp_path}/patch_df.tsv",
        parent_df=f"{tmp_path}/parent_df.tsv",
        delimiter="\t",
        cfg_file=f"{ADET_PATH}/configs/R_50/IC15/finetune_150k_tt_mlt_13_15_textocr.yaml",
    )
    assert isinstance(runner, DeepSoloRunner)
    assert isinstance(runner.predictor, DefaultPredictor)
    assert isinstance(runner.parent_df.iloc[0]["coordinates"], tuple)
    assert isinstance(runner.patch_df.iloc[0]["coordinates"], tuple)


def test_deepsolo_run_all(init_runner, mock_response):
    runner = init_runner
    # dict
    out = runner.run_all()
    assert isinstance(out, dict)
    assert "patch-0-0-800-40-#mapreader_text.png#.png" in out.keys()
    assert isinstance(out["patch-0-0-800-40-#mapreader_text.png#.png"], list)
    # dataframe
    out = runner._dict_to_dataframe(runner.patch_predictions, geo=False, parent=False)
    assert isinstance(out, pd.DataFrame)
    assert set(out.columns) == set(["image_id", "geometry", "text", "score"])
    assert "patch-0-0-800-40-#mapreader_text.png#.png" in out["image_id"].values


def test_deepsolo_convert_to_parent(runner_run_all, mock_response):
    runner = runner_run_all
    # dict
    out = runner.convert_to_parent_pixel_bounds()
    assert isinstance(out, dict)
    assert "mapreader_text.png" in out.keys()
    assert isinstance(out["mapreader_text.png"], list)
    # dataframe
    out = runner._dict_to_dataframe(runner.parent_predictions, geo=False, parent=True)
    assert isinstance(out, pd.DataFrame)
    assert set(out.columns) == set(
        ["image_id", "patch_id", "geometry", "text", "score"]
    )
    assert "mapreader_text.png" in out["image_id"].values


def test_deepsolo_convert_to_parent_coords(runner_run_all, mock_response):
    runner = runner_run_all
    # dict
    out = runner.convert_to_coords()
    assert isinstance(out, dict)
    assert "mapreader_text.png" in out.keys()
    assert isinstance(out["mapreader_text.png"], list)
    # dataframe
    out = runner._dict_to_dataframe(runner.geo_predictions, geo=True, parent=True)
    assert isinstance(out, gpd.GeoDataFrame)
    assert set(out.columns) == set(
        ["image_id", "patch_id", "geometry", "crs", "text", "score"]
    )
    assert "mapreader_text.png" in out["image_id"].values
    assert out.crs == runner.parent_df.crs


def test_deepsolo_deduplicate(sample_dir, tmp_path, mock_response):
    maps = MapImages(f"{sample_dir}/mapreader_text.png")
    maps.add_metadata(f"{sample_dir}/mapreader_text_metadata.csv")
    maps.patchify_all(patch_size=800, path_save=tmp_path, overlap=0.5)
    maps.check_georeferencing()
    parent_df, patch_df = maps.convert_images()
    runner = DeepSoloRunner(
        patch_df,
        parent_df=parent_df,
        cfg_file=f"{ADET_PATH}/configs/R_50/IC15/finetune_150k_tt_mlt_13_15_textocr.yaml",
    )
    _ = runner.run_all()
    out = runner.convert_to_parent_pixel_bounds(deduplicate=False)
    len_before = len(out["mapreader_text.png"])
    runner.parent_predictions = {}
    out_07 = runner.convert_to_parent_pixel_bounds(deduplicate=True)
    len_07 = len(out_07["mapreader_text.png"])
    print(len_before, len_07)
    assert len_before >= len_07
    runner.parent_predictions = {}
    out_05 = runner.convert_to_parent_pixel_bounds(deduplicate=True, min_ioa=0.5)
    len_05 = len(out_05["mapreader_text.png"])
    print(len_before, len_05)
    assert len_before >= len_05
    assert len_07 >= len_05


def test_deepsolo_run_on_image(init_runner, mock_response):
    runner = init_runner
    out = runner.run_on_image(
        runner.patch_df.iloc[0]["image_path"], return_outputs=True
    )
    assert isinstance(out, dict)
    assert "instances" in out.keys()
    assert isinstance(out["instances"], Instances)


def test_deepsolo_save_to_geojson(runner_run_all, tmp_path, mock_response):
    runner = runner_run_all
    _ = runner.convert_to_coords()
    runner.save_to_geojson(f"{tmp_path}/text.geojson")
    assert os.path.exists(f"{tmp_path}/text.geojson")
    gdf = gpd.read_file(f"{tmp_path}/text.geojson")
    assert isinstance(gdf, gpd.GeoDataFrame)
    assert set(gdf.columns) == set(
        ["image_id", "patch_id", "geometry", "crs", "text", "score"]
    )