Anaconda prompt
- Open the Start menu and find the Anaconda Prompt under the Anaconda3 folder.
- Type in
cd C:\Users\Zhouy\Documents\GitHub\dcat-metadata
This command changes from the base directory to the working directory where you put the dcat-metadata folder cloned from the GitHub repository. For me, it's C:\Users\Zhouy\Documents\GitHub\dcat-metadata; replace it with your own path. The command cd (Change Directory) lets you navigate to another folder.
If the dcat-metadata folder is located on a different drive, switch to that drive first. For example, type
E:
to switch to the E drive, then run the cd command.
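Before running anything, it can help to confirm that the prompt is in the right folder. A minimal Python sketch, assuming the cloned folder holds harvest.py together with the jsons and reports folders named in the script's settings; the helper name `looks_like_project_dir` is hypothetical and not part of the repository:

```python
import os


def looks_like_project_dir(path):
    """Return True if `path` holds harvest.py plus the jsons and reports folders."""
    # Hypothetical check: the expected names come from this guide, not from harvest.py.
    has_script = os.path.isfile(os.path.join(path, "harvest.py"))
    has_folders = all(
        os.path.isdir(os.path.join(path, name)) for name in ("jsons", "reports")
    )
    return has_script and has_folders


if __name__ == "__main__":
    cwd = os.getcwd()
    if looks_like_project_dir(cwd):
        print("In the dcat-metadata folder:", cwd)
    else:
        print("Not the dcat-metadata folder yet:", cwd)
```

Run from the prompt, it reports whether the current folder looks like the dcat-metadata folder.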
- Navigate to the dcat-metadata folder. A few settings in the Python script must be changed manually: open harvest.py and scroll down to the section commented as Manual items to change! Remember to save the script after editing.
- directory: the file path of the dcat-metadata folder on your computer.
Note that if you are using Mac or Linux, use the commented-out "MAC or Linux" form of directory below instead of the Windows one (remove its ## and comment out the Windows line).
    ######################################
    ### Manual items to change!

    ## name of the main directory containing folders named "jsons" and "reports"
    ## Windows:
    directory = r'C:\Users\Zhouy\Documents\GitHub\dcat-metadata'
    ## MAC or Linux:
    ## directory = r'D:/Library RA/GitHub/dcat-metadata-master'

    ## csv file containing portal list
    portalFile = 'arcPortals.csv'

    ## list of metadata fields from the DCAT json schema for open data portals desired in the final report
    fieldnames = ['Title', 'Alternative Title', 'Description', 'Language', 'Creator',
                  'Publisher', 'Genre', 'Subject', 'Keyword', 'Date Issued',
                  'Temporal Coverage', 'Date Range', 'Solr Year', 'Spatial Coverage',
                  'Bounding Box', 'Type', 'Geometry Type', 'Format', 'Information',
                  'Download', 'MapServer', 'FeatureServer', 'ImageServer', 'Identifier',
                  'Provenance', 'Code', 'Is Part Of', 'Status', 'Accrual Method',
                  'Date Accessioned', 'Rights', 'Access Rights', 'Suppressed', 'Child']

    ## list of fields to use for the deletedItems report
    delFieldsReport = ['identifier', 'landingPage', 'portalName']

    ## list of fields to use for the portal status report
    statusFieldsReport = ['portalName', 'total', 'new_items', 'deleted_items']
    #######################################
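The r prefix on the directory value makes it a raw string, so Windows backslashes are not read as escape sequences. Whatever form directory takes, paths to its subfolders can be built with os.path.join, which inserts the separator of the system the script runs on. A sketch, assuming the script only needs the jsons and reports subfolders named in the settings comment; `folder_paths` is a hypothetical helper, not a function from harvest.py:

```python
import os


def folder_paths(directory):
    """Return the jsons and reports paths under `directory` (hypothetical helper)."""
    # os.path.join uses "\" on Windows and "/" on macOS/Linux.
    jsons_dir = os.path.join(directory, "jsons")
    reports_dir = os.path.join(directory, "reports")
    return jsons_dir, reports_dir


# The r prefix makes this a raw string; without it, "\U" in "C:\Users"
# would be read as a Unicode escape and Python would raise a SyntaxError.
directory = r"C:\Users\Zhouy\Documents\GitHub\dcat-metadata"
print(folder_paths(directory))
```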
- In the Anaconda prompt, type
python harvest.py
to execute the script. The script prints portalName and URL after it loops through the older JSON files.
If you see the notification "There is no comparison json for ***", it means that portal is a new data portal added this month.
When the prompt shows the folder path again without an error message, the script has finished running successfully.
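The "no comparison json" message suggests the script compares each portal's new JSON with an older copy. The sketch below illustrates that idea only; it is not the code in harvest.py. It assumes each JSON follows the DCAT-US data.json layout, with a "dataset" list whose entries carry an "identifier", and counts items the way the status report columns (total, new_items, deleted_items) describe. The function name `compare_portal` is hypothetical.

```python
import json


def compare_portal(portal_name, new_json_text, old_json_text=None):
    """Compare two DCAT catalog JSON strings by dataset identifier (hypothetical sketch)."""
    new_ids = {d["identifier"] for d in json.loads(new_json_text).get("dataset", [])}
    if old_json_text is None:
        # No older snapshot: every item counts as new, as for a newly added portal.
        print("There is no comparison json for", portal_name)
        old_ids = set()
    else:
        old_ids = {d["identifier"] for d in json.loads(old_json_text).get("dataset", [])}
    return {
        "portalName": portal_name,
        "total": len(new_ids),
        "new_items": len(new_ids - old_ids),
        "deleted_items": len(old_ids - new_ids),
    }


old = '{"dataset": [{"identifier": "a"}, {"identifier": "b"}]}'
new = '{"dataset": [{"identifier": "b"}, {"identifier": "c"}, {"identifier": "d"}]}'
print(compare_portal("Example Portal", new, old))
# → {'portalName': 'Example Portal', 'total': 3, 'new_items': 2, 'deleted_items': 1}
```

Comparing sets of identifiers keeps the counts independent of the order in which a portal lists its datasets.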
- Go to the reports folder and sort the files by the Date modified column in descending order. The three CSV reports just generated should appear at the top.
Tip: Make sure none of these files is 0 KB. A 0 KB report is empty, which means something went wrong during the run.
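The same check can be scripted. A minimal sketch, assuming the reports are .csv files sitting directly in the reports folder; `newest_reports` is a hypothetical helper name:

```python
import os


def newest_reports(reports_dir, count=3):
    """Return (file name, size in bytes) for the `count` most recently modified CSVs."""
    with os.scandir(reports_dir) as entries:
        csv_files = [
            e for e in entries if e.is_file() and e.name.lower().endswith(".csv")
        ]
    # Newest first, like sorting the Date modified column in descending order.
    csv_files.sort(key=lambda e: e.stat().st_mtime, reverse=True)
    return [(e.name, e.stat().st_size) for e in csv_files[:count]]


if __name__ == "__main__":
    reports_dir = "reports"  # assumes the prompt is in the dcat-metadata folder
    if os.path.isdir(reports_dir):
        for name, size in newest_reports(reports_dir):
            print(name, "EMPTY (0 KB)" if size == 0 else f"{size} bytes")
```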
- Similarly, go to the jsons folder and check that the JSON files were downloaded.
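A quick way to check the downloads is to try parsing every JSON file, since an empty or truncated download will not parse. A sketch under the assumption that the harvested files end in .json and may sit in subfolders of jsons; `check_json_folder` is a hypothetical helper:

```python
import json
import os


def check_json_folder(jsons_dir):
    """Split the .json files under `jsons_dir` into ones that parse and ones that do not."""
    valid, problems = [], []
    for root, _dirs, files in os.walk(jsons_dir):
        for name in sorted(files):
            if not name.lower().endswith(".json"):
                continue
            path = os.path.join(root, name)
            try:
                with open(path, encoding="utf-8") as f:
                    json.load(f)
                valid.append(path)
            except (json.JSONDecodeError, UnicodeDecodeError):
                # Empty or truncated downloads fail to parse.
                problems.append(path)
    return valid, problems


if __name__ == "__main__":
    if os.path.isdir("jsons"):  # assumes the prompt is in the dcat-metadata folder
        ok, bad = check_json_folder("jsons")
        print(len(ok), "JSON files parsed;", len(bad), "empty or invalid:", bad)
```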