Commit

stats: document which stats to index
ntarocco committed Feb 24, 2025
1 parent 7e44d96 commit e2db046
Showing 1 changed file with 30 additions and 10 deletions.
40 changes: 30 additions & 10 deletions docs/develop/howtos/backup_search_indices.md
@@ -1,5 +1,4 @@
# Back up search indices

Starting with InvenioRDM v12, not all search indices can be recreated from the database
anymore.
In order to not lose any data when something goes wrong with the search indices, they
@@ -12,9 +11,38 @@ should be backed up regularly.
(total, from harvesters and users), which produced approximately **10 GB** of
usage statistics data.

## Indices to back up
All search indices can be re-created from the database except for the `statistics` indices.
These indices are not stored in the database, and if lost, there is potentially no way to
recover this data. This means that view and download statistics may be permanently lost.

**Types of statistics indices**

Unless configured otherwise, there are two main classes of statistics indices:

- **Raw events**: Each individual view or download hit.
- **Aggregated events**: Aggregated statistics per record or file.

Each of these indices is created on a **monthly basis** for both views and downloads.
For more details, see the [related documentation](../../reference/statistics.md).

**Examples of index naming**

- `<prefix>-events-stats-file-download-2023-08`: Stores raw file download events for August 2023.
- `<prefix>-stats-record-view-2023-08`: Stores aggregated record views for August 2023.

Additionally, there is a `<prefix>-stats-bookmarks` index that tracks the latest aggregation created.
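
The naming scheme above can be sketched as a small shell helper that sorts an index name into one of the described classes. This is only an illustration, not part of InvenioRDM: the function name `classify_index` is made up, and the prefix `my-site` is an assumed example value that you would replace with your instance's prefix.

```shell
#!/usr/bin/env bash
# Assumed example prefix; replace it with the index prefix of your instance.
PREFIX="my-site"

# Hypothetical helper: print the class of a statistics index based on its name.
classify_index() {
  case "$1" in
    "${PREFIX}-events-stats-"*) echo "raw" ;;        # individual view/download events
    "${PREFIX}-stats-bookmarks") echo "bookmarks" ;; # latest aggregation marker
    "${PREFIX}-stats-"*) echo "aggregated" ;;        # per-record/per-file aggregations
    *) echo "other" ;;                               # not a statistics index
  esac
}

classify_index "my-site-events-stats-file-download-2023-08"  # prints: raw
classify_index "my-site-stats-record-view-2023-08"           # prints: aggregated
classify_index "my-site-stats-bookmarks"                     # prints: bookmarks
```

The bookmarks case is checked before the generic `-stats-` case, because the bookmarks index name would otherwise match the aggregated pattern.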

**Backup recommendations**

The most critical indices to back up are those matching the pattern:

`<prefix>-events-stats*`

If lost, **aggregated indices** and **bookmarks** can be re-created from the raw event data.
However, if you have enough backup space, it is recommended to back up all statistics indices.
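
As a rough sketch of this recommendation, the following filter reads index names from standard input and prints only those that should be backed up. The function name `select_backup_indices`, its `critical`/`all` modes, and the prefix `my-site` are assumptions made for this example; the sample index names are made up as well.

```shell
#!/usr/bin/env bash
# Assumed example prefix; replace it with the index prefix of your instance.
PREFIX="my-site"

# Hypothetical helper: read index names (one per line) from stdin.
#   critical (default): only raw event indices (<prefix>-events-stats*)
#   all:                raw events, aggregations and bookmarks
select_backup_indices() {
  local mode="${1:-critical}"
  if [ "$mode" = "all" ]; then
    grep -E "^${PREFIX}-(events-)?stats" || true
  else
    grep -E "^${PREFIX}-events-stats" || true
  fi
}

# Sample input; on a live cluster the names could instead come from the
# cat indices API, e.g.: curl -s "http://localhost:9200/_cat/indices?h=index"
printf '%s\n' \
  "my-site-events-stats-record-view-2023-08" \
  "my-site-stats-record-view-2023-08" \
  "my-site-stats-bookmarks" \
  "my-site-some-other-index" \
  | select_backup_indices critical
```

With the sample input, the `critical` mode prints only `my-site-events-stats-record-view-2023-08`, while `all` also prints the aggregated and bookmarks indices.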

## Non-exhaustive list of backup methods
There are several tools and approaches to pick from when it comes to preserving your
indexed data in Elasticsearch or OpenSearch.
Here is a small selection:
@@ -32,19 +60,15 @@ Here is a small selection:
Of course, there are more possibilities out there (e.g. [Curator](https://github.com/elastic/curator)
for Elasticsearch), but this should give you an idea of where to start.


## Elasticdump

For the sake of brevity, this guide only deals with `elasticdump` as it is a very simple
tool to use and works with both Elasticsearch and OpenSearch.

!!! info "Make sure to back up all relevant indices"
    Please note that the given example only deals with backing up and restoring one single index.
    In your instance, you should make sure to back up *all* relevant indices regularly!


### Backup

All it takes to back up a search index are two commands, one for saving the mappings
and one for saving the data.
With the following commands, you'll create the files `stats-record-view-2023-04.mappings.json`
@@ -91,9 +115,7 @@ $ curl -X DELETE http://localhost:9200/my-site-stats-record-view-2023-04

And just like that, the record view statistics for all of April 2023 are gone.


### Restore

Luckily, the restore is just as simple as the backup was.
All you need to do is to restore the mappings first, and the data afterwards.
The commands are nearly identical to the backup, just with the values for `--input` and
@@ -125,9 +147,7 @@ Tue, 04 Apr 2023 17:01:40 GMT | dump complete

Now, the index is fully restored including the mappings as well as the data!


### Example scripts

Backing up each index manually can quickly become tedious.
Luckily, both [ES](https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-indices.html) and [OS](https://opensearch.org/docs/latest/api-reference/cat/cat-indices/) provide an API endpoint for listing the available indices.
The following basic `bash` script uses this endpoint to find all `stats` indices and back them up: