Additional initializer #23

Open · wants to merge 14 commits into main


Andrey170170 (Collaborator) commented Feb 4, 2025

Added new initializers: fathom_net, EoL, and Lila.

Small fixes:

  • Added `MATERIAL_CITATION` filtering to the GBIF initializer (issue described here)
  • Formatting updates
  • Consolidated dependencies in pyproject.toml
  • Added descriptions of the downloaded data format
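A minimal sketch of what the `MATERIAL_CITATION` filter does, assuming the GBIF occurrence data is read as tab-separated rows with a `basisOfRecord` column (GBIF's field for the kind of record). The function name `filter_gbif_rows` and the sample rows are hypothetical and are not taken from this PR's code:

```python
import csv
import io

# Hypothetical: basisOfRecord values to drop from GBIF occurrence rows.
EXCLUDED_BASIS = {"MATERIAL_CITATION"}


def filter_gbif_rows(rows):
    """Yield only occurrence rows whose basisOfRecord is not excluded."""
    for row in rows:
        if row.get("basisOfRecord") not in EXCLUDED_BASIS:
            yield row


# Illustrative tab-separated sample, not real GBIF data.
sample = io.StringIO(
    "gbifID\tbasisOfRecord\n"
    "1\tHUMAN_OBSERVATION\n"
    "2\tMATERIAL_CITATION\n"
    "3\tPRESERVED_SPECIMEN\n"
)
reader = csv.DictReader(sample, delimiter="\t")
kept = list(filter_gbif_rows(reader))
print([r["gbifID"] for r in kept])  # ['1', '3']
```

Filtering lazily with a generator keeps memory flat when the occurrence file is large.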

Andrey170170 and others added 10 commits August 7, 2024 21:04
Small fixes
Also adjusted tools to use `source_id` instead of `gbif_id`
Added a `tool_name_override` option for Tools, to allow custom tools
Used `verification_scheme` instead of hard-coded column names in parts of the runner
plus some minor fixes
Updated README: added a `how to access data` section.

Updated pyproject.toml: listed dependencies directly in this file instead of linking to `requirements.txt`.
Scheduled filtering and scheduling jobs are now distinguished from completed ones.

Adjusted the scripts' logic to match this change.
egrace479 added the documentation and enhancement labels Feb 4, 2025
Extracted initializers into a class structure
Rewrote the file that calls the initializers to use a dict mapping initializer types to initializers.
Added `initializer_type` to the mandatory config fields
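A minimal sketch of the pattern these commits describe (a base initializer class, child initializers, and a dict keyed by the mandatory `initializer_type` config field). All class, function, and key names here are hypothetical stand-ins, not the repository's actual code:

```python
from abc import ABC, abstractmethod


class BaseInitializer(ABC):
    """Hypothetical base: holds shared config; children implement run()."""

    def __init__(self, config: dict):
        self.config = config

    @abstractmethod
    def run(self) -> str:
        """Build the source-specific table of records to download."""


class GBIFInitializer(BaseInitializer):
    def run(self) -> str:
        return "gbif"  # placeholder for GBIF-specific logic


class EoLInitializer(BaseInitializer):
    def run(self) -> str:
        return "eol"  # placeholder for EoL-specific logic


# Hypothetical mapping from the `initializer_type` value to a class.
INITIALIZERS = {
    "gbif": GBIFInitializer,
    "eol": EoLInitializer,
}


def get_initializer(config: dict) -> BaseInitializer:
    """Pick and construct the initializer named by the config."""
    if "initializer_type" not in config:
        raise ValueError("config is missing mandatory field 'initializer_type'")
    initializer_type = config["initializer_type"]
    if initializer_type not in INITIALIZERS:
        raise ValueError(f"unknown initializer_type: {initializer_type!r}")
    return INITIALIZERS[initializer_type](config)


initializer = get_initializer({"initializer_type": "eol"})
print(initializer.run())  # eol
```

With a dict lookup, adding a new source only requires a new subclass and one registry entry, and a missing or misspelled `initializer_type` fails early with a clear error.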
egrace479 (Member) commented:

Please add a description of the base initializer, how child initializers inherit from it, and considerations for writing a custom child initializer. Use the existing ones as examples, e.g., filters that could be applied (GBIF excluding MATERIAL_CITATION). This would also be a good place to note the importance of understanding the metadata coming from the source before creating a child initializer, and to not rely on source IDs being persistent (and to check their uniqueness if relying on them to map to additional metadata). Consider EOL content IDs: they may be unique, but it's the page ID that maps to the taxon information.

As discussed, put this description into a README in the initializer/ folder and link to it from the root repo README.
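The ID caveat above can be illustrated with a small sketch: check that the per-image ID is actually unique before relying on it, and join taxon information through the page ID rather than the content ID. The row layout, `assert_unique`, and the sample values are hypothetical and only loosely modeled on EOL's content/page distinction:

```python
from collections import Counter


def assert_unique(ids, name):
    """Raise ValueError if any identifier appears more than once."""
    dupes = [i for i, n in Counter(ids).items() if n > 1]
    if dupes:
        raise ValueError(f"{name} is not unique; duplicates: {sorted(dupes)}")


# Hypothetical media rows: content_id identifies one image,
# page_id identifies the taxon page that image belongs to.
media = [
    {"content_id": 101, "page_id": 7},
    {"content_id": 102, "page_id": 7},
    {"content_id": 103, "page_id": 9},
]
# Hypothetical taxon lookup keyed by page_id (placeholder names).
taxa = {7: "Taxon A", 9: "Taxon B"}

# Verify the per-image ID really is unique before using it as a key.
assert_unique([m["content_id"] for m in media], "content_id")

# Attach taxon info via page_id; content_id carries no taxon meaning.
labeled = [{**m, "taxon": taxa[m["page_id"]]} for m in media]
print([m["taxon"] for m in labeled])  # ['Taxon A', 'Taxon A', 'Taxon B']
```

Note that `page_id` is deliberately not unique across images here, which is why it cannot serve as the per-image key even though it is the correct join key for taxa.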

Andrey170170 added 2 commits February 9, 2025 23:47
Added README.md to initializer.
Made small code quality adjustments to initializers
Added docstrings to `base_initializer.py`