Minor fixes/address review comments
rwood-97 committed Dec 19, 2024
1 parent aaf8f7c commit 5ca580a
Showing 6 changed files with 142 additions and 94 deletions.
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -19,7 +19,7 @@ _Add new changes here_

## Added

- Added ablity to save and reload text predictions ([#536](https://github.com/maps-as-data/MapReader/pull/536)
- Added ability to save and reload text predictions ([#536](https://github.com/maps-as-data/MapReader/pull/536))
- Added minimal dataclasses for text predictions ([#536](https://github.com/maps-as-data/MapReader/pull/536))

## [v1.6.1](https://github.com/Living-with-machines/MapReader/releases/tag/v1.6.1) (2024-11-18)
24 changes: 12 additions & 12 deletions docs/source/using-mapreader/step-by-step-guide/6-spot-text.rst
@@ -248,7 +248,7 @@ As above, use the ``border_color``, ``text_color`` and ``figsize`` arguments to
Geo-reference
-------------

If you maps are georeferenced in your ``parent_df``, you can also convert the pixel bounds to georeferenced coordinates using the ``convert_to_coords`` method:
If your maps are georeferenced in your ``parent_df``, you can also convert the pixel coordinates to georeferenced coordinates using the ``convert_to_coords`` method:

.. code-block:: python
@@ -281,11 +281,11 @@ Refer to the `geopandas explore documentation <https://geopandas.org/en/stable/d
Saving
------

You can save your georeferenced predictions to a geojson file for loading into GIS software using the ``save_to_geojson`` method:
You can save your georeferenced predictions to a geojson file for loading into GIS software using the ``to_geojson`` method:

.. code-block:: python
my_runner.save_to_geojson("text_preds.geojson")
my_runner.to_geojson("text_preds.geojson")
This will save the predictions to a geojson file, with each text prediction as a separate feature.

@@ -294,19 +294,19 @@ If instead you would like to save just the centroid of this polygon, you can set

.. code-block:: python
my_runner.save_to_geojson("text_preds.geojson", centroid=True)
my_runner.to_geojson("text_preds.geojson", centroid=True)
This will save the centroid of the bounding box as the geometry column and create a "polygon" column containing the original polygon.

At any point, you can also save your patch, parent and georeferenced predictions to CSV files using the ``save_to_csv`` method:
At any point, you can also save your patch, parent and georeferenced predictions to CSV files using the ``to_csv`` method:

.. code-block:: python
my_runner.save_to_csv("my_preds/")
my_runner.to_csv("my_preds/")
This will create a folder called "my_preds" and save the patch, parent and georeferenced predictions to CSV files within it.

As above, you can use the ``centroid`` argument to save the centroid of the bounding box instead of the full polygon.
As above, you can use the ``centroid=True`` argument to save the centroid of the bounding box instead of the full polygon.


Loading
@@ -322,7 +322,7 @@ The ``load_geo_predictions`` method is used to load georeferenced predictions fr
my_runner.load_geo_predictions("text_preds.geojson")
Loading this fill will populate the patch, parent and georeferenced predictions in the runner.
Loading this will populate the patch, parent and georeferenced predictions in the runner.

The ``load_patch_predictions`` method is used to load patch predictions from a CSV file or pandas DataFrame.
To load a CSV file, you can use:
@@ -337,8 +337,8 @@ Or, to load a pandas DataFrame, you can use:
my_runner.load_patch_predictions(patch_preds_df)
This will populate the patch and parent predictions in the runner but not the georeferenced predictions (incase you do not have georefencing information).
If you do want to convert these to georeferenced predictions, you can use the ``convert_to_coords`` method as shown above.
This will populate the patch and parent predictions in the runner but not the georeferenced predictions (in case you do not have georeferencing information).
If you do want to convert your text predictions from pixel coordinates to geospatial coordinates, you can use the ``convert_to_coords`` method as shown above.


Search predictions
@@ -409,11 +409,11 @@ You can also pass in a dictionary of ``style_kwargs`` to customize the appearanc
Save search results
~~~~~~~~~~~~~~~~~~~

If your maps are georeferenced, you can also save your search results using the ``save_search_results_to_geojson`` method:
If your maps are georeferenced, you can also save your search results using the ``search_results_to_geojson`` method:

.. code-block:: python
my_runner.save_search_results_to_geojson("search_results.geojson")
my_runner.search_results_to_geojson("search_results.geojson")
This will save the search results to a geojson file, with each search result as a separate feature which can be loaded into GIS software for further analysis/exploration.

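The documentation above says that ``centroid=True`` stores the centroid as the geometry and keeps the original polygon in a "polygon" column. The sketch below illustrates that idea on a single GeoJSON feature in plain Python, assuming a Polygon geometry whose first ring is the exterior ring. `feature_to_centroid`, `polygon_centroid` and `ring_to_wkt` are hypothetical helpers, not MapReader functions. Unlike `search_results_to_geojson` in the diff below, which reprojects to EPSG:27700 before taking the centroid, this computes a planar area-weighted (shoelace) centroid in whatever coordinates it receives, so it is exact for pixel coordinates but only approximate for longitude/latitude.

```python
# Hypothetical helpers for illustration only; not part of MapReader.


def polygon_centroid(ring):
    """Area-weighted centroid of a closed ring [(x, y), ...] via the shoelace formula."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("Degenerate polygon: zero area")
    # Centroid = sum / (6 * area) = sum / (3 * area2)
    return cx / (3 * area2), cy / (3 * area2)


def ring_to_wkt(ring):
    """Serialise a closed ring as a WKT POLYGON string (exterior ring only)."""
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON (({coords}))"


def feature_to_centroid(feature):
    """Copy a GeoJSON Polygon feature, swapping in a centroid Point geometry.

    The original exterior ring is kept as WKT in a "polygon" property.
    """
    ring = [tuple(pt) for pt in feature["geometry"]["coordinates"][0]]
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # close the ring so both helpers see a closed polygon
    cx, cy = polygon_centroid(ring)
    properties = dict(feature.get("properties") or {})  # copy, leave input untouched
    properties["polygon"] = ring_to_wkt(ring)
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [cx, cy]},
        "properties": properties,
    }


# Example: a 4 x 2 pixel box around a text label
box = {
    "type": "Feature",
    "geometry": {"type": "Polygon", "coordinates": [[[0, 0], [4, 0], [4, 2], [0, 2], [0, 0]]]},
    "properties": {"text": "MAP"},
}
print(feature_to_centroid(box)["geometry"])  # → {'type': 'Point', 'coordinates': [2.0, 1.0]}
```

Keeping the polygon as WKT text means the file still has exactly one geometry column, which is what most GIS software expects from a GeoJSON layer.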
114 changes: 81 additions & 33 deletions mapreader/spot_text/runner_base.py
@@ -480,6 +480,27 @@ def save_to_geojson(
self,
path_save: str | pathlib.Path,
centroid: bool = False,
) -> None:
"""
Save the georeferenced predictions to a GeoJSON file.
Parameters
----------
path_save : str | pathlib.Path
Path to save the GeoJSON file.
centroid : bool, optional
Whether to convert the polygons to centroids, by default False.
NOTE: The original polygon will still be saved as a separate column.
"""
print(
"[WARNING] This method is deprecated and will soon be removed. Use `to_geojson` instead."
)
self.to_geojson(path_save, centroid)

def to_geojson(
self,
path_save: str | pathlib.Path,
centroid: bool = False,
) -> None:
"""Save the georeferenced predictions to a GeoJSON file.
@@ -506,7 +527,7 @@ def save_to_geojson(

geo_df.to_file(path_save, driver="GeoJSON", engine="pyogrio")

def save_to_csv(
def to_csv(
self,
path_save: str | pathlib.Path,
centroid: bool = False,
@@ -858,7 +879,7 @@ def _post_process(self, image_id, ctrl_pnts, scores, recs, bd_pnts):
PatchPrediction(pixel_geometry=polygon, score=score, text=text)
)

def search_preds(
def search_predictions(
self, search_text: str, ignore_case: bool = True, return_dataframe: bool = False
) -> dict | pd.DataFrame:
"""Search the predictions for specific text. Accepts regex.
@@ -1044,36 +1065,63 @@ def explore_search_results(
style_kwds=style_kwargs,
)

def save_search_results_to_geojson(
self,
path_save: str | pathlib.Path,
centroid: bool = False,
) -> None:
"""Convert the search results to georeferenced search results and save them to a GeoJSON file.
Parameters
----------
path_save : str | pathlib.Path
The path to save the GeoJSON file.
centroid : bool, optional
Whether to save the centroid of the polygons as the geometry column, by default False.
Note: The original polygon will stil be saved as a separate column.
Raises
------
ValueError
If no search results are found.
"""
if self.search_results == {}:
raise ValueError("[ERROR] No results to save!")

geo_search_results = self._get_geo_search_results()
geo_df = self._dict_to_dataframe(geo_search_results)

if centroid:
geo_df["polygon"] = geo_df["geometry"].to_wkt()
geo_df["geometry"] = (
geo_df["geometry"].to_crs("27700").centroid.to_crs(geo_df.crs)
)
def save_search_results_to_geojson(
self,
path_save: str | pathlib.Path,
centroid: bool = False,
) -> None:
"""Convert the search results to georeferenced search results and save them to a GeoJSON file.
Parameters
----------
path_save : str | pathlib.Path
The path to save the GeoJSON file.
centroid : bool, optional
Whether to save the centroid of the polygons as the geometry column, by default False.
Note: The original polygon will still be saved as a separate column.
Raises
------
ValueError
If no search results are found.
"""
print(
"[WARNING] This method is deprecated and will soon be removed. Use `search_results_to_geojson` instead."
)
self.search_results_to_geojson(path_save, centroid)


def search_results_to_geojson(
self,
path_save: str | pathlib.Path,
centroid: bool = False,
) -> None:
"""Convert the search results to georeferenced search results and save them to a GeoJSON file.
Parameters
----------
path_save : str | pathlib.Path
The path to save the GeoJSON file.
centroid : bool, optional
Whether to save the centroid of the polygons as the geometry column, by default False.
Note: The original polygon will still be saved as a separate column.
Raises
------
ValueError
If no search results are found.
"""
if self.search_results == {}:
raise ValueError("[ERROR] No results to save!")

geo_search_results = self._get_geo_search_results()
geo_df = self._dict_to_dataframe(geo_search_results)

if centroid:
geo_df["polygon"] = geo_df["geometry"].to_wkt()
geo_df["geometry"] = (
geo_df["geometry"].to_crs("27700").centroid.to_crs(geo_df.crs)
)

geo_df.to_file(path_save, driver="GeoJSON", engine="pyogrio")
geo_df.to_file(path_save, driver="GeoJSON", engine="pyogrio")
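The deprecated `save_to_geojson` and `save_search_results_to_geojson` wrappers above announce themselves with `print`. A possible alternative, sketched here and not what this commit does, is `warnings.warn` with `DeprecationWarning`: callers can filter it, test suites can promote it to an error (for example `python -W error::DeprecationWarning`), and `stacklevel=2` attributes the warning to the calling line rather than to the wrapper. `deprecated_alias` and `ExampleRunner` are hypothetical names used only for illustration.

```python
import warnings


def deprecated_alias(new_name):
    """Build a method that emits a DeprecationWarning and forwards to self.<new_name>."""

    def method(self, *args, **kwargs):
        warnings.warn(
            f"This method is deprecated and will soon be removed. Use `{new_name}` instead.",
            DeprecationWarning,
            stacklevel=2,  # report the caller's line, not this wrapper
        )
        return getattr(self, new_name)(*args, **kwargs)

    method.__doc__ = f"Deprecated alias for :meth:`{new_name}`."
    return method


class ExampleRunner:
    """Hypothetical stand-in for a runner class; not the MapReader runner."""

    def __init__(self):
        self.saved = []

    def to_geojson(self, path_save, centroid=False):
        # Placeholder for the real writer: just record the call
        self.saved.append((str(path_save), centroid))

    # Old name kept working, but warns on every use
    save_to_geojson = deprecated_alias("to_geojson")
```

With pytest, `pytest.warns(DeprecationWarning)` can then check that an old name still works and warns.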
50 changes: 25 additions & 25 deletions tests/test_text_spotting/test_deepsolo_runner.py
@@ -44,7 +44,7 @@ def init_dataframes(sample_dir, tmp_path):
"""
maps = MapImages(f"{sample_dir}/mapreader_text.png")
maps.add_metadata(f"{sample_dir}/mapreader_text_metadata.csv")
maps.patchify_all(patch_size=800, path_save=tmp_path)
maps.patchify_all(patch_size=800, path_save=tmp_path)
maps.check_georeferencing()
assert maps.georeferenced
parent_df, patch_df = maps.convert_images()
@@ -279,7 +279,7 @@ def test_deepsolo_convert_to_parent_coords(runner_run_all, mock_response):
def test_deepsolo_deduplicate(sample_dir, tmp_path, mock_response):
maps = MapImages(f"{sample_dir}/mapreader_text.png")
maps.add_metadata(f"{sample_dir}/mapreader_text_metadata.csv")
maps.patchify_all(patch_size=800, path_save=tmp_path, overlap=0.5)
maps.patchify_all(patch_size=800, path_save=tmp_path, overlap=0.5)
maps.check_georeferencing()
parent_df, patch_df = maps.convert_images()
runner = DeepSoloRunner(
@@ -313,10 +313,10 @@ def test_deepsolo_run_on_image(init_runner, mock_response):
assert isinstance(out["instances"], Instances)


def test_deepsolo_save_to_geojson(runner_run_all, tmp_path, mock_response):
def test_deepsolo_to_geojson(runner_run_all, tmp_path, mock_response):
runner = runner_run_all
_ = runner.convert_to_coords()
runner.save_to_geojson(f"{tmp_path}/text.geojson")
runner.to_geojson(f"{tmp_path}/text.geojson")
assert os.path.exists(f"{tmp_path}/text.geojson")
gdf = gpd.read_file(f"{tmp_path}/text.geojson")
assert isinstance(gdf, gpd.GeoDataFrame)
@@ -325,10 +325,10 @@ def test_deepsolo_save_to_geojson(runner_run_all, tmp_path, mock_response):
)


def test_deepsolo_save_to_geojson_centroid(runner_run_all, tmp_path, mock_response):
def test_deepsolo_to_geojson_centroid(runner_run_all, tmp_path, mock_response):
runner = runner_run_all
_ = runner.convert_to_coords()
runner.save_to_geojson(f"{tmp_path}/text_centroid.geojson", centroid=True)
runner.to_geojson(f"{tmp_path}/text_centroid.geojson", centroid=True)
assert os.path.exists(f"{tmp_path}/text_centroid.geojson")
gdf_centroid = gpd.read_file(f"{tmp_path}/text_centroid.geojson")
assert isinstance(gdf_centroid, gpd.GeoDataFrame)
@@ -349,7 +349,7 @@ def test_deepsolo_save_to_geojson_centroid(runner_run_all, tmp_path, mock_respon
def test_deepsolo_load_geo_predictions(runner_run_all, tmp_path):
runner = runner_run_all
_ = runner.convert_to_coords()
runner.save_to_geojson(f"{tmp_path}/text.geojson")
runner.to_geojson(f"{tmp_path}/text.geojson")
runner.geo_predictions = {}
runner.load_geo_predictions(f"{tmp_path}/text.geojson")
assert len(runner.geo_predictions)
@@ -364,54 +364,54 @@ def test_deepsolo_load_geo_predictions_errors(runner_run_all, tmp_path):
runner.load_geo_predictions("fakefile.csv")


def test_deepsolo_save_to_csv_polygon(runner_run_all, tmp_path, mock_response):
def test_deepsolo_to_csv_polygon(runner_run_all, tmp_path, mock_response):
runner = runner_run_all
# patch
runner.save_to_csv(tmp_path)
runner.to_csv(tmp_path)
assert os.path.exists(f"{tmp_path}/patch_predictions.csv")
# parent
_ = runner.convert_to_parent_pixel_bounds()
runner.save_to_csv(tmp_path)
runner.to_csv(tmp_path)
assert os.path.exists(f"{tmp_path}/patch_predictions.csv")
assert os.path.exists(f"{tmp_path}/parent_predictions.csv")
# geo
_ = runner.convert_to_coords()
runner.save_to_csv(tmp_path)
runner.to_csv(tmp_path)
assert os.path.exists(f"{tmp_path}/patch_predictions.csv")
assert os.path.exists(f"{tmp_path}/parent_predictions.csv")
assert os.path.exists(f"{tmp_path}/geo_predictions.csv")


def test_deepsolo_save_to_csv_centroid(runner_run_all, tmp_path, mock_response):
def test_deepsolo_to_csv_centroid(runner_run_all, tmp_path, mock_response):
runner = runner_run_all
# patch
runner.save_to_csv(tmp_path, centroid=True)
runner.to_csv(tmp_path, centroid=True)
assert os.path.exists(f"{tmp_path}/patch_predictions.csv")
# parent
_ = runner.convert_to_parent_pixel_bounds()
runner.save_to_csv(tmp_path, centroid=True)
runner.to_csv(tmp_path, centroid=True)
assert os.path.exists(f"{tmp_path}/patch_predictions.csv")
assert os.path.exists(f"{tmp_path}/parent_predictions.csv")
# geo
_ = runner.convert_to_coords()
runner.save_to_csv(tmp_path, centroid=True)
runner.to_csv(tmp_path, centroid=True)
assert os.path.exists(f"{tmp_path}/patch_predictions.csv")
assert os.path.exists(f"{tmp_path}/parent_predictions.csv")
assert os.path.exists(f"{tmp_path}/geo_predictions.csv")


def test_deepsolo_save_to_csv_errors(runner_run_all, tmp_path, mock_response):
def test_deepsolo_to_csv_errors(runner_run_all, tmp_path, mock_response):
runner = runner_run_all
runner.patch_predictions = {}
with pytest.raises(ValueError, match="No patch predictions found"):
runner.save_to_csv(tmp_path)
runner.to_csv(tmp_path)


def test_deepsolo_load_patch_predictions(runner_run_all, tmp_path):
runner = runner_run_all
_ = runner.convert_to_coords()
assert len(runner.geo_predictions) # this will be empty after reloading
runner.save_to_csv(tmp_path)
runner.to_csv(tmp_path)
runner.load_patch_predictions(f"{tmp_path}/patch_predictions.csv")
assert len(runner.patch_predictions)
assert len(runner.geo_predictions) == 0
@@ -451,7 +451,7 @@ def test_deepsolo_load_patch_predictions_centroid(runner_run_all, tmp_path):
runner = runner_run_all
_ = runner.convert_to_coords()
assert len(runner.geo_predictions)
runner.save_to_csv(tmp_path, centroid=True)
runner.to_csv(tmp_path, centroid=True)
runner.load_patch_predictions(f"{tmp_path}/patch_predictions.csv")
assert len(runner.patch_predictions)
assert len(runner.geo_predictions) == 0
@@ -498,12 +498,12 @@ def test_deepsolo_search_preds_errors(runner_run_all, mock_response):
runner.search_predictions("maps", ignore_case=True)


def test_deepsolo_save_search_results(runner_run_all, tmp_path, mock_response):
def test_deepsolo_search_results(runner_run_all, tmp_path, mock_response):
runner = runner_run_all
_ = runner.convert_to_parent_pixel_bounds()
out = runner.search_predictions("map", ignore_case=True)
assert isinstance(out, dict)
runner.save_search_results_to_geojson(f"{tmp_path}/search_results.geojson")
runner.search_results_to_geojson(f"{tmp_path}/search_results.geojson")
assert os.path.exists(f"{tmp_path}/search_results.geojson")
gdf = gpd.read_file(f"{tmp_path}/search_results.geojson")
assert isinstance(gdf, gpd.GeoDataFrame)
@@ -513,12 +513,12 @@ def test_deepsolo_save_search_results(runner_run_all, tmp_path, mock_response):
assert "mapreader_text.png" in gdf["image_id"].values


def test_deepsolo_save_search_results_centroid(runner_run_all, tmp_path, mock_response):
def test_deepsolo_search_results_centroid(runner_run_all, tmp_path, mock_response):
runner = runner_run_all
_ = runner.convert_to_parent_pixel_bounds()
out = runner.search_predictions("map", ignore_case=True)
assert isinstance(out, dict)
runner.save_search_results_to_geojson(
runner.search_results_to_geojson(
f"{tmp_path}/search_results_centroid.geojson", centroid=True
)
assert os.path.exists(f"{tmp_path}/search_results_centroid.geojson")
@@ -539,7 +539,7 @@ def test_deepsolo_save_search_results_centroid(runner_run_all, tmp_path, mock_re
assert "mapreader_text.png" in gdf["image_id"].values


def test_deepsolo_save_search_results_errors(runner_run_all, tmp_path, mock_response):
def test_deepsolo_search_results_errors(runner_run_all, tmp_path, mock_response):
runner = runner_run_all
with pytest.raises(ValueError, match="No results to save"):
runner.save_search_results_to_geojson(f"{tmp_path}/test.geojson")
runner.search_results_to_geojson(f"{tmp_path}/test.geojson")