

Improving metadata creation for datasets and inPort #16

Open
eeholmes opened this issue Jan 29, 2025 · 11 comments

@eeholmes

eeholmes commented Jan 29, 2025

Google Doc for Notes. Add info and what you know here.

SMART GOAL: Come up with some specific, doable (i.e., small) things we can do to improve metadata creation, especially with regard to inPort. These could be documentation, write-ups of how different groups approach metadata creation, scripts, packages, or interviews with people who might help us (NCEI, inPort).

Eli, Craig, Alice, Molly, Kourtney, Marylou, Dawn, Lynn, Carissa, Ana

Craig

  • cataloging data sets
  • importing schemas into GitHub
  • table definitions
  • inPort is out of date
  • tech solutions: assign owners to every dataset
  • create a GitHub Action to generate XML
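Craig's GitHub Action idea could start from a small script that turns a plain metadata record into XML. Below is a minimal Python sketch using only the standard library; the field names and the `record_to_xml` helper are hypothetical and do NOT follow the actual inPort or NCEI schema. In practice the record might be loaded from a YAML file kept in the repo (e.g. with PyYAML's `yaml.safe_load`), and a GitHub Actions workflow could run the script on each push; both of those are assumptions, not something the group has settled on.

```python
import xml.etree.ElementTree as ET

# Hypothetical dataset record; field names are placeholders,
# NOT the real inPort or NCEI metadata schema.
record = {
    "title": "Example trawl survey dataset",
    "owner": "Jane Doe",
    "abstract": "Placeholder description of the dataset.",
    "keywords": ["trawl", "biological data"],
}


def record_to_xml(rec: dict) -> str:
    """Convert a flat metadata record (hypothetical format) into an XML string."""
    root = ET.Element("metadata")
    for key, value in rec.items():
        if isinstance(value, list):
            # Lists become a container element with one <item> child per entry
            container = ET.SubElement(root, key)
            for item in value:
                ET.SubElement(container, "item").text = str(item)
        else:
            # Scalars become a single element holding the value as text
            ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(record_to_xml(record))
```

Keeping the metadata in one plain record and generating XML from it means the same source could, in principle, feed more than one output format.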

Eli

  • yaml for

Alice

  • archiving of water column data NCEI
  • not going into inPort
  • TBs to NCEI on hard drive; CruisePack
  • metadata is a copy of what is created in R

Molly
*

eeholmes converted this from a draft issue on Jan 29, 2025
eeholmes self-assigned this on Jan 29, 2025
@Kourtney-Burger

Kourtney-Burger commented Jan 30, 2025

Sharing my inPort experience and data archive goals

  • currently archiving passive acoustic monitoring (PAM) metadata to multiple destinations and would like to combine efforts where applicable (archiving to NCEI, inPort, and two PAM-specific archives, PACM and Tethys)
  • recently learned about inPort and met with the SWFSC inPort librarian, who shared some training videos with me
  • I'm currently automating things where I can, but I still have a data archive repo for every destination (NCEI has great tools for PAM data, like PACE and Passive Packer)

I've spoken with some PAM teams from other centers, and I think we are all in the same place with inPort, so it would be great to develop something with this group that I can share with my larger PAM team (and that works for any other data type)!

@eeholmes

eeholmes commented Feb 4, 2025

On Feb 4, Kourtney, Eli, Alice and Matt met.

  • Kourtney demoed Passive Packer and PACE, which are tools to create the package (data + metadata files) that NCEI uses to add data to the PAM archive
  • Alice demoed CruisePack, which is similar
  • Kourtney and Alice will meet so Kourtney can show Alice how to automate batching data entries with CruisePack

We outlined some questions to get more info on

  • How do other types of NMFS data get into NCEI? Are there similar 'packers' and tools to help create the needed metadata files?
  • Does data archive in NCEI automatically get added to inPort? Does data that is part of an on-going survey (so not 'new') get a new inPort entry?

More notes in our Google Doc: https://docs.google.com/document/d/1mZ42TqSfOfvpYFctABao749wzSX6-JcZHRdslf0hTxs/edit?usp=sharing

@alicebeittel

alicebeittel commented Feb 11, 2025

Hi All,

I won't be able to make co-working today, but I did chat with our Life History Team about their use of InPort and ERDDAP. ERDDAP is what they use as the primary backup to publish trawl biological data. InPort seems to hold the metadata and a complete copy of the historical biological data. ERDDAP has the biological data published by survey each year, but InPort has one CSV file containing all of the biological data, year after year. So there is currently redundancy between what is on InPort and what is on ERDDAP. Both are updated separately by emailing the data to two different people, who then publish it to the websites (someone from SWFSC IT and someone from ERDDAP).

As far as we know, they are not linked to NCEI, and the data are submitted by emailing CSV files.

On InPort, the CSV files seem to be linked to some type of Google storage (storage.googleapis.com), but only IT staff have editing access to the Google folder to upload or change datasets.

@eeholmes @Kourtney-Burger

@Kourtney-Burger

Hi Alice, we are keeping notes in the shared Google Doc for today's meeting. Do you know who at SWFSC was helping with the InPort upload?

@alicebeittel

alicebeittel commented Feb 11, 2025

Awesome, I'll be able to take a look at those later today. Thanh Vu from IT is the SWFSC POC for the life history group for InPort upload.

@alicebeittel

On Feb 13, @Kourtney-Burger and @alicebeittel met, and Kourtney shared her process for creating package profiles in batch for passive acoustic data using Passive Packer to send to NCEI. Her code and process are on GitHub, and she has an awesome Quarto page! There are many similarities to CruisePack (the NCEI packager for water column acoustic data), and I'll be looking at how to adapt the code for CruisePack and our fisheries surveys. The first steps will be (1) writing some code to compile our survey metadata from the various R scripts used to make our survey report and from folder names on our server, and (2) downloading SQLiteStudio to open up the backend of the CruisePack executable and see how CruisePack organizes the metadata.

@Kourtney-Burger

Excited to see how you can adapt this method! Hopefully it saves some time for future you!

@eeholmes

Did you get any info on how they create the metadata? Do they create separate metadata for ERDDAP versus inPort, or are they able to use the same XML (or similar) file?

@eeholmes

@eeholmes test how long to get notification

@alicebeittel

@eeholmes I learned that the metadata doesn't really change year to year for the InPort FRD trawl database; it is the same general cruise description. The descriptions were revised recently (it sounds like the metadata hadn't been updated for some time and needed some revisions). What IS updated each year is the set of CSV files containing the actual trawl biological data. Each year the new data are appended to the CSV, and a new CSV is submitted to InPort. The Life History group also sends the same trawl biological data to ERDDAP. They don't submit metadata with the ERDDAP submission (it sounds like the metadata is already stored on the site and doesn't change year to year). So the same biological data is stored in two places. To answer your question: yes, the metadata format is different for ERDDAP vs. InPort, but since it doesn't change year to year, they are not creating separate metadata each year for submission.

Trawl Specimen Data ERDDAP: https://coastwatch.pfeg.noaa.gov/erddap/tabledap/FRDCPSTrawlLHSpecimen.html
Haul Catch Data ERDDAP: https://coastwatch.pfeg.noaa.gov/erddap/tabledap/FRDCPSTrawlLHHaulCatch.html
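For illustration, the yearly "append the new data and resubmit the whole CSV" step described above could be sketched in Python as follows. This is a minimal sketch assuming both files share one identical header row; it is not the Life History group's actual code, and `append_year` is a hypothetical name.

```python
import csv
import io


def append_year(cumulative_csv: str, new_year_csv: str) -> str:
    """Append one year's rows to a cumulative CSV (both passed as text).

    Assumes both inputs share the same header row; raises ValueError otherwise.
    """
    old_rows = list(csv.reader(io.StringIO(cumulative_csv)))
    new_rows = list(csv.reader(io.StringIO(new_year_csv)))
    if not old_rows or not new_rows or old_rows[0] != new_rows[0]:
        raise ValueError("Missing or mismatched header rows")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Keep the cumulative file intact and skip the new file's header row
    writer.writerows(old_rows + new_rows[1:])
    return out.getvalue()


if __name__ == "__main__":
    cumulative = "year,species\n2023,sardine\n"
    new_year = "year,species\n2024,anchovy\n"
    print(append_year(cumulative, new_year), end="")
```

With the example inputs, the result is the header followed by the 2023 row and then the 2024 row. Checking the header first is a cheap guard against silently mixing columns if the file layout changes between years.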

@eeholmes

Feb 18
Kourtney met with SWFSC Data Governance. Meeting notes

Matt, Alice, Eli and Kourtney met with Michael Liddel and Scott Sauri (inPort owner) to discuss improving the NCEI-to-inPort metadata creation process. Notes
