The iSamples flat file data format is backed by a frictionlessdata.io schema. The schema is human-readable and is located in the iSamples GitHub. From an end-user perspective, the workflow for adding records to iSamples is quite simple:
- Clone the template repository from the iSamples GitHub.
- Add rows to the simple_isamples.csv file. (Note that if you are using one of the other file formats, you'll commit that file instead.)
- Push the changes to GitHub. As part of this push, a workflow is triggered that will generate a harvestable sitemap from the repository.
- Once the GitHub workflow is complete, you can look at the gh-pages branch of the repository to see the sitemap.
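If you would rather add rows with a script than by hand, the "add rows" step can be sketched as below. This is only an illustration using Python's standard csv module: the helper name append_record and the column names in the usage note are hypothetical, and the real columns are whatever the header of your copy of simple_isamples.csv contains.

```python
import csv
import io
from pathlib import Path


def append_record(csv_path, record):
    """Append one record to an existing CSV, following the file's own header order.

    Hypothetical helper for illustration; it is not part of the iSamples tooling.
    """
    path = Path(csv_path)
    text = path.read_text(encoding="utf-8")
    # Read the header so the new row uses the same column order as the file.
    header = next(csv.reader(io.StringIO(text, newline="")))
    unknown = set(record) - set(header)
    if unknown:
        raise ValueError(f"columns not in header: {sorted(unknown)}")
    with path.open("a", newline="", encoding="utf-8") as handle:
        # Make sure the new row starts on its own line.
        if not text.endswith("\n"):
            handle.write("\n")
        writer = csv.DictWriter(
            handle, fieldnames=header, restval="", lineterminator="\n"
        )
        writer.writerow(record)
```

For example, `append_record("simple_isamples.csv", {"id": "B", "label": "second"})` would add one row, assuming the file has columns named id and label. Columns you leave out are written as empty cells.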
Since we are built on top of frictionlessdata, technically we support any file format that they support. However, the formats that we have tested in the iSamples repository are:
- csv
- tsv
- xlsx
- xls
It ought to be a relatively trivial process to add new formats, so just ask if the one you want isn't already done.
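As a quick pre-flight check before committing, you could test whether a file's extension is one of the tested formats listed above. This is a minimal sketch; the names TESTED_FORMATS and is_tested_format are made up for this example and are not part of the iSamples code.

```python
from pathlib import Path

# The four formats the text lists as tested in the iSamples repository.
TESTED_FORMATS = {"csv", "tsv", "xlsx", "xls"}


def is_tested_format(filename):
    """Return True if the file extension is one of the tested formats."""
    extension = Path(filename).suffix.lower().lstrip(".")
    return extension in TESTED_FORMATS
```

A file that fails this check may still work, since frictionlessdata supports more formats than these four. It just hasn't been tested here.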
The template repository has a GitHub workflow that:
- Checks out the iSamples in a Box repository
- Installs the Python environment
- Runs the validate command on isb_things.py
- If the file format passes validation, runs the sitemap command on isb_things.py, and publishes output to a new sitemaps directory.
- Takes that sitemaps directory and publishes that to the gh-pages branch on the target repository.
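The validate-then-publish ordering of these steps can be sketched as a small function. This is not the actual workflow definition or the isb_things.py interface: validate and build_sitemap are hypothetical callables standing in for the two commands, and run_pipeline is a name invented for this example.

```python
from typing import Callable, List


def run_pipeline(
    data_file: str,
    validate: Callable[[str], List[str]],
    build_sitemap: Callable[[str, str], None],
    output_dir: str = "sitemaps",
) -> bool:
    """Build the sitemap only when validation reports no errors.

    `validate` returns a list of error strings (empty when the file is valid);
    `build_sitemap` writes its output into `output_dir`. Both are stand-ins.
    """
    errors = validate(data_file)
    if errors:
        # Mirror the workflow: surface the errors and stop before the sitemap step.
        for error in errors:
            print(error)
        return False
    build_sitemap(data_file, output_dir)
    return True
```

The point of the sketch is the gate: the sitemap step never runs on a file that failed validation, so the gh-pages branch is only updated from valid data.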
If a file is pushed that doesn't pass validation, the workflow is aborted and the sitemap isn't generated. The errors will be available on GitHub for inspection. Each error has a code and a message that identifies where the problem is, for example:
code           message
-------------  ----------------------------------------------------------------------
missing-label  There is a missing label in the header's field "label" at position "2"
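To show the kind of problem this error describes, here is a simplified imitation of the check using only the standard library. It is not frictionlessdata's implementation, and it assumes that missing-label means the schema expects a field at a position where the header has no label at all. The function name check_header and the field name id are hypothetical; label is taken from the example message above.

```python
import csv
import io


def check_header(csv_text, expected_fields):
    """Report expected fields that have no corresponding label in the header row.

    Simplified stand-in for the real validation; returns (code, message) tuples.
    """
    header = next(csv.reader(io.StringIO(csv_text, newline="")), [])
    errors = []
    for position, name in enumerate(expected_fields, start=1):
        # The header is too short to contain a label at this position.
        if position > len(header):
            message = (
                "There is a missing label in the header's "
                f'field "{name}" at position "{position}"'
            )
            errors.append(("missing-label", message))
    return errors
```

With expected fields `["id", "label"]` and a file whose header is just `id`, this reports one missing-label error for the field label at position 2.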