maps-as-data · rwood-97 · Dec 19, 2024 · Nov 21, 2024 · Nov 21, 2024 · Nov 22, 2024
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -17,6 +17,11 @@ The following table shows which versions of MapReader are compatible with which
 
 _Add new changes here_
 
+## Added
+
+- Added ability to save and reload text predictions ([#536](https://github.com/maps-as-data/MapReader/pull/536))
+- Added minimal dataclasses for text predictions ([#536](https://github.com/maps-as-data/MapReader/pull/536))
+
 ## [v1.6.1](https://github.com/Living-with-machines/MapReader/releases/tag/v1.6.1) (2024-11-18)
 
 ### Added

diff --git a/docs/source/using-mapreader/step-by-step-guide/6-spot-text.rst b/docs/source/using-mapreader/step-by-step-guide/6-spot-text.rst
@@ -223,7 +223,7 @@ You can do this by setting the ``deduplicate`` argument and passing a ``min_ioa`
 
 This will help resolve any issues with predictions being cut-off at the edges of patches since the overlap should help find the full piece of text.
 
-Again, to view the predictions, you can use the ``show`` method.
+Again, to view the predictions, you can use the ``show_predictions`` method.
 You should pass a parent image ID as the ``image_id`` argument:
 
 .. code-block:: python
@@ -244,16 +244,11 @@ As above, use the ``border_color``, ``text_color`` and ``figsize`` arguments to
         figsize = (20, 20),
     )
 
-You can save your predictions to a csv file using the pandas ``to_csv`` method:
-
-.. code-block:: python
-
-    parent_preds_df.to_csv("text_preds.csv")
 
 Geo-reference
 -------------
 
-If you maps are georeferenced in your ``parent_df``, you can also convert the pixel bounds to georeferenced coordinates using the ``convert_to_coords`` method:
+If your maps are georeferenced in your ``parent_df``, you can also convert the pixel coordinates to georeferenced coordinates using the ``convert_to_coords`` method:
 
 .. code-block:: python
 
@@ -282,14 +277,70 @@ Or, if your maps are taken from a tilelayer, you can specify the URL of the tile
 You can also pass in a dictionary of ``style_kwargs`` to customize the appearance of the map.
 Refer to the `geopandas explore documentation <https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.explore.html>`__ for more information on the available options.
 
-Again, you can save your georeferenced predictions to a csv file (as shown above), or, you can save them to a geojson file for loading into GIS software:
+
+Saving
+------
+
+You can save your georeferenced predictions to a geojson file for loading into GIS software using the ``to_geojson`` method:
 
 .. code-block:: python
 
-    my_runner.save_to_geojson("text_preds.geojson")
+    my_runner.to_geojson("text_preds.geojson")
 
 This will save the predictions to a geojson file, with each text prediction as a separate feature.
 
+By default, the geometry column will contain the polygon representing the bounding box of your text.
+If instead you would like to save just the centroid of this polygon, you can set the ``centroid`` argument:
+
+.. code-block:: python
+
+    my_runner.to_geojson("text_preds.geojson", centroid=True)
+
+This will save the centroid of the bounding box as the geometry column and create a "polygon" column containing the original polygon.
+
+At any point, you can also save your patch, parent and georeferenced predictions to CSV files using the ``to_csv`` method:
+
+.. code-block:: python
+
+    my_runner.to_csv("my_preds/")
+
+This will create a folder called "my_preds" and save the patch, parent and georeferenced predictions to CSV files within it.
+
+As above, you can use the ``centroid=True`` argument to save the centroid of the bounding box instead of the full polygon.
+
+
+Loading
+-------
+
+If you have saved your predictions and want to reload them into a runner, you can use either the ``load_geo_predictions`` or the ``load_patch_predictions`` method.
+
+.. note:: These methods will overwrite any existing predictions in the runner. So if you want to keep your existing predictions, you should save them to a file first!
+
+The ``load_geo_predictions`` method is used to load georeferenced predictions from a geojson file:
+
+.. code-block:: python
+
+    my_runner.load_geo_predictions("text_preds.geojson")
+
+Loading this will populate the patch, parent and georeferenced predictions in the runner.
+
+The ``load_patch_predictions`` method is used to load patch predictions from a CSV file or pandas DataFrame.
+To load a CSV file, you can use:
+
+.. code-block:: python
+
+    my_runner.load_patch_predictions("my_preds/patch_preds.csv")
+
+Or, to load a pandas DataFrame, you can use:
+
+.. code-block:: python
+
+    my_runner.load_patch_predictions(patch_preds_df)
+
+This will populate the patch and parent predictions in the runner but not the georeferenced predictions (in case you do not have georeferencing information).
+If you do want to convert your text predictions from pixel coordinates to geospatial coordinates, you can use the ``convert_to_coords`` method as shown above.
+
+
 Search predictions
 ------------------
 
@@ -358,14 +409,16 @@ You can also pass in a dictionary of ``style_kwargs`` to customize the appearanc
 Save search results
 ~~~~~~~~~~~~~~~~~~~
 
-If your maps are georeferenced, you can also save your search results using the ``save_search_results_to_geojson`` method:
+If your maps are georeferenced, you can also save your search results using the ``search_results_to_geojson`` method:
 
 .. code-block:: python
 
-    my_runner.save_search_results_to_geojson("search_results.geojson")
+    my_runner.search_results_to_geojson("search_results.geojson")
 
-This will save the search results to a geojson file, with each search result as a separate feature.
+This will save the search results to a geojson file, with each search result as a separate feature. The file can then be loaded into GIS software for further analysis/exploration.
 
-These can then be loaded into GIS software for further analysis/exploration.
+If, however, your maps are not georeferenced, you will need to save the search results to a csv file using the pandas ``to_csv`` method:
+
+.. code-block:: python
 
-If your maps are not georeferenced, you can save the search results to a csv file using the pandas ``to_csv`` method (as shown above).
+    search_results_df.to_csv("search_results.csv")
diff --git a/mapreader/spot_text/dataclasses.py b/mapreader/spot_text/dataclasses.py
@@ -0,0 +1,30 @@
+from __future__ import annotations
+
+from dataclasses import dataclass
+
+from shapely.geometry import Polygon
+
+
+@dataclass(frozen=True)
+class PatchPrediction:
+    pixel_geometry: Polygon
+    score: float
+    text: str | None = None
+
+
+@dataclass(frozen=True)
+class ParentPrediction:
+    pixel_geometry: Polygon
+    score: float
+    patch_id: str
+    text: str | None = None
+
+
+@dataclass(frozen=True)
+class GeoPrediction:
+    pixel_geometry: Polygon
+    score: float
+    patch_id: str
+    geometry: Polygon
+    crs: str
+    text: str | None = None
diff --git a/mapreader/spot_text/deepsolo_runner.py b/mapreader/spot_text/deepsolo_runner.py
@@ -20,10 +20,10 @@
 import torch
 from deepsolo.config import get_cfg
 
-from .rec_runner_base import RecRunner
+from .runner_base import DetRecRunner
 
 
-class DeepSoloRunner(RecRunner):
+class DeepSoloRunner(DetRecRunner):
     def __init__(
         self,
         patch_df: pd.DataFrame | gpd.GeoDataFrame | str | pathlib.Path,

diff --git a/mapreader/spot_text/dptext_detr_runner.py b/mapreader/spot_text/dptext_detr_runner.py
@@ -21,10 +21,11 @@
 from dptext_detr.config import get_cfg
 from shapely import MultiPolygon, Polygon
 
-from .runner_base import Runner
+from .dataclasses import PatchPrediction
+from .runner_base import DetRunner
 
 
-class DPTextDETRRunner(Runner):
+class DPTextDETRRunner(DetRunner):
     def __init__(
         self,
         patch_df: pd.DataFrame | gpd.GeoDataFrame | str | pathlib.Path,
@@ -71,7 +72,7 @@ def __init__(
         # setup the predictor
         self.predictor = DefaultPredictor(cfg)
 
-    def get_patch_predictions(
+    def _get_patch_predictions(
         self,
         outputs: dict,
         return_dataframe: bool = False,
@@ -107,7 +108,7 @@ def get_patch_predictions(
         self._deduplicate(image_id, min_ioa=min_ioa)
 
         if return_dataframe:
-            return self._dict_to_dataframe(self.patch_predictions, geo=False)
+            return self._dict_to_dataframe(self.patch_predictions)
         return self.patch_predictions
 
     def _post_process(self, image_id, scores, pred_classes, bd_pnts):
@@ -122,59 +123,6 @@ def _post_process(self, image_id, scores, pred_classes, bd_pnts):
 
             score = f"{score:.2f}"
 
-            self.patch_predictions[image_id].append([polygon, score])
-
-    @staticmethod
-    def _dict_to_dataframe(
-        preds: dict,
-        geo: bool = False,
-        parent: bool = False,
-    ) -> pd.DataFrame:
-        """Convert the predictions dictionary to a pandas DataFrame.
-
-        Parameters
-        ----------
-        preds : dict
-            A dictionary of predictions.
-        geo : bool, optional
-            Whether the dictionary is georeferenced coords (or pixel bounds), by default True
-        parent : bool, optional
-            Whether the dictionary is at the parent level, by default False
-
-        Returns
-        -------
-        pd.DataFrame
-            A pandas DataFrame containing the predictions.
-        """
-        if geo:
-            columns = ["geometry", "crs", "score"]
-        else:
-            columns = ["geometry", "score"]
-
-        if parent:
-            columns.append("patch_id")
-
-        preds_df = pd.concat(
-            pd.DataFrame(
-                preds[k],
-                index=np.full(len(preds[k]), k),
-                columns=columns,
+            self.patch_predictions[image_id].append(
+                PatchPrediction(pixel_geometry=polygon, score=score)
             )
-            for k in preds.keys()
-        )
-
-        if geo:
-            # get the crs (should be the same for all)
-            if not preds_df["crs"].nunique() == 1:
-                raise ValueError("[ERROR] Multiple crs found in the predictions.")
-            crs = preds_df["crs"].unique()[0]
-
-            preds_df = gpd.GeoDataFrame(
-                preds_df,
-                geometry="geometry",
-                crs=crs,
-            )
-
-        preds_df.index.name = "image_id"
-        preds_df.reset_index(inplace=True)
-        return preds_df
diff --git a/mapreader/spot_text/maptext_runner.py b/mapreader/spot_text/maptext_runner.py
@@ -20,10 +20,10 @@
 import torch
 from maptextpipeline.config import get_cfg
 
-from .rec_runner_base import RecRunner
+from .runner_base import DetRecRunner
 
 
-class MapTextRunner(RecRunner):
+class MapTextRunner(DetRecRunner):
     def __init__(
         self,
         patch_df: pd.DataFrame | gpd.GeoDataFrame | str | pathlib.Path,