Anaconda prompt

Open the Start menu and find the Anaconda prompt under the Anaconda3 folder.

Type in cd C:\Users\Zhouy\Documents\GitHub\dcat-metadata

This command changes from the base directory to the working directory, that is, the dcat-metadata folder you cloned from the GitHub repository.
In this example the path is C:\Users\Zhouy\Documents\GitHub\dcat-metadata; replace it with your own path.

The command cd (Change Directory) enables you to navigate to another folder.

If the dcat-metadata folder is located on a different drive, switch to that drive first. For example, type E: to switch to the E drive, then use cd to navigate to the folder.

Some changes need to be made manually to the Python script before it is run. In the dcat-metadata folder, open harvest.py and scroll down to the section commented as Manual items to change! Remember to save the script after editing.

directory - the file path of the dcat-metadata folder

Note that if you are using macOS or Linux, use the directory line under "MAC or Linux" in the section below (written with forward slashes) instead of the Windows one.

######################################

### Manual items to change!

## names of the main directory containing folders named "jsons" and "reports"
## Windows:
directory = r'C:\Users\Zhouy\Documents\GitHub\dcat-metadata'
## MAC or Linux:
## directory = r'D:/Library RA/GitHub/dcat-metadata-master'

## csv file containing portal list
portalFile = 'arcPortals.csv'

## list of metadata fields from the DCAT json schema for open data portals desired in the final report
fieldnames = ['Title', 'Alternative Title', 'Description', 'Language', 'Creator', 'Publisher', 'Genre',
              'Subject', 'Keyword', 'Date Issued', 'Temporal Coverage', 'Date Range', 'Solr Year', 'Spatial Coverage',
              'Bounding Box', 'Type', 'Geometry Type', 'Format', 'Information', 'Download', 'MapServer', 
              'FeatureServer', 'ImageServer', 'Identifier', 'Provenance', 'Code', 'Is Part Of', 'Status',
              'Accrual Method', 'Date Accessioned', 'Rights', 'Access Rights', 'Suppressed', 'Child']

## list of fields to use for the deletedItems report
delFieldsReport = ['identifier', 'landingPage', 'portalName']

## list of fields to use for the portal status report
statusFieldsReport = ['portalName', 'total', 'new_items', 'deleted_items']
#######################################
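
Before running the script, it can help to confirm that the directory setting points to the right place. The sketch below is not part of harvest.py: the helper name check_setup is hypothetical, and it assumes the portal list CSV sits directly inside the main directory next to the jsons and reports folders.

```python
import os


def check_setup(directory, portal_file='arcPortals.csv'):
    """Return a list of problems found in the working directory.

    Hypothetical helper for illustration; harvest.py does not define it.
    """
    if not os.path.isdir(directory):
        return [f'directory not found: {directory}']

    problems = []
    # The settings block says the main directory contains these two folders.
    for folder in ('jsons', 'reports'):
        if not os.path.isdir(os.path.join(directory, folder)):
            problems.append(f'missing folder: {folder}')

    # Assumption: the portal list is stored in the main directory itself.
    if not os.path.isfile(os.path.join(directory, portal_file)):
        problems.append(f'missing file: {portal_file}')
    return problems


if __name__ == '__main__':
    # Replace this example path with your own dcat-metadata folder.
    print(check_setup(r'C:\Users\Zhouy\Documents\GitHub\dcat-metadata'))
```

An empty list means the folder layout looks as expected; any message in the list names what to fix before typing python harvest.py.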

In the Anaconda prompt, type in python harvest.py to execute the script

The script prints each portalName and URL after it loops through the older JSON files.
If you see the notification There is no comparison json for ***, it means that the portal is a new data portal added this month.
When the prompt shows the file path again, the script has run successfully!
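
The comparison between the older and the newly downloaded JSON files can be pictured as a set difference on dataset identifiers. The sketch below is an illustration under assumptions, not the code in harvest.py: the function name compare_harvests is hypothetical, the sample records are made up, and each record is assumed to carry an identifier field as listed in delFieldsReport.

```python
def compare_harvests(old_datasets, new_datasets):
    """Split identifiers into new and deleted items.

    Hypothetical helper; each argument is a list of dicts with an
    'identifier' key (an assumption based on the report field lists).
    """
    old_ids = {item['identifier'] for item in old_datasets}
    new_ids = {item['identifier'] for item in new_datasets}
    return {
        'total': len(new_ids),
        'new_items': sorted(new_ids - old_ids),      # only in the new harvest
        'deleted_items': sorted(old_ids - new_ids),  # only in the older harvest
    }


# Made-up records for illustration only.
older = [{'identifier': 'a'}, {'identifier': 'b'}]
newer = [{'identifier': 'b'}, {'identifier': 'c'}]
print(compare_harvests(older, newer))
# prints {'total': 2, 'new_items': ['c'], 'deleted_items': ['a']}
```

Under this sketch, a portal with no older JSON file would be compared against an empty list, so every item would count as new.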

Go to the reports folder and sort the files by the Date modified column in descending order. The three CSV reports that were just generated will appear at the top.

Tip: Make sure the file size isn’t 0 KB. A 0 KB file is empty, which suggests the script did not write the report correctly.

As before, go to the jsons folder and check that the JSON files were downloaded.
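
The manual checks on the reports and jsons folders can also be scripted. This is a minimal sketch, assuming the reports end in .csv and the downloads end in .json; the helper names newest_files and empty_files are hypothetical and do not come from harvest.py.

```python
import os


def newest_files(folder, extension, count=3):
    """Return (name, size_in_bytes) for the most recently modified files."""
    paths = [
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.lower().endswith(extension)
    ]
    # Same ordering as sorting by "Date modified", newest first.
    paths.sort(key=os.path.getmtime, reverse=True)
    return [(os.path.basename(p), os.path.getsize(p)) for p in paths[:count]]


def empty_files(files):
    """Return the names of files whose size is 0 bytes."""
    return [name for name, size in files if size == 0]
```

For example, empty_files(newest_files(os.path.join(directory, 'reports'), '.csv')) returns the names of any 0 KB reports among the three newest files, and the same call with the jsons folder and '.json' checks the downloads.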
