This repository has been archived by the owner on Jun 25, 2022. It is now read-only.

Anaconda prompt

Gene Cheng edited this page Oct 6, 2020 · 4 revisions
  1. Open the Start menu and launch the Anaconda prompt from the Anaconda3 folder.

  2. Type cd C:\Users\Zhouy\Documents\GitHub\dcat-metadata and press Enter.

    The cd (change directory) command moves the prompt from its starting directory to the working directory, that is, the dcat-metadata folder you cloned from the GitHub repository.
    On my machine that folder is C:\Users\Zhouy\Documents\GitHub\dcat-metadata; substitute your own path.

    If the dcat-metadata folder is on a different drive, switch to that drive as well. For example, type E: to move to the E drive.

  3. A few values in the Python script must be edited by hand. In the dcat-metadata folder, open harvest.py and scroll down to the section marked Manual items to change! Save the script after editing.

    • directory - the file path of the dcat-metadata folder

    Note: on macOS or Linux, write the path with forward slashes, as in the MAC or Linux line below, and comment out the Windows line.

    ######################################
    
    ### Manual items to change!
    
    ## names of the main directory containing folders named "jsons" and "reports"
    ## Windows:
    directory = r'C:\Users\Zhouy\Documents\GitHub\dcat-metadata'
    ## MAC or Linux:
    ## directory = r'D:/Library RA/GitHub/dcat-metadata-master'
    
    ## csv file containing portal list
    portalFile = 'arcPortals.csv'
    
    ## list of metadata fields from the DCAT json schema for open data portals desired in the final report
    fieldnames = ['Title', 'Alternative Title', 'Description', 'Language', 'Creator', 'Publisher', 'Genre',
                  'Subject', 'Keyword', 'Date Issued', 'Temporal Coverage', 'Date Range', 'Solr Year', 'Spatial Coverage',
                  'Bounding Box', 'Type', 'Geometry Type', 'Format', 'Information', 'Download', 'MapServer', 
                  'FeatureServer', 'ImageServer', 'Identifier', 'Provenance', 'Code', 'Is Part Of', 'Status',
                  'Accrual Method', 'Date Accessioned', 'Rights', 'Access Rights', 'Suppressed', 'Child']
    
    ## list of fields to use for the deletedItems report
    delFieldsReport = ['identifier', 'landingPage', 'portalName']
    
    ## list of fields to use for the portal status report
    statusFieldsReport = ['portalName', 'total', 'new_items', 'deleted_items']
    #######################################
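Before running the script, it can help to confirm that the path you entered for directory really contains the jsons and reports folders mentioned in the comments above. The following is a minimal sketch, not part of harvest.py; the function name missing_subfolders is hypothetical, and the path shown is only the example used on this page.

```python
from pathlib import Path


def missing_subfolders(directory, required=("jsons", "reports")):
    """Return the names of required subfolders that are absent from `directory`."""
    base = Path(directory)
    return [name for name in required if not (base / name).is_dir()]


if __name__ == "__main__":
    # Assumption: use the same value you set for `directory` in harvest.py.
    directory = r"C:\Users\Zhouy\Documents\GitHub\dcat-metadata"
    missing = missing_subfolders(directory)
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("Both folders were found.")
```

If either folder is reported as missing, recheck the path before moving on to the next step.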
  4. In the Anaconda prompt, type python harvest.py to run the script.

    The script prints each portalName and URL after it loops through the older JSON files.
    The message There is no comparison json for *** means that the portal is new this month, so there is no earlier harvest to compare against.
    When the prompt returns to the file path, the script has finished running.
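To illustrate what a comparison between an older and a newer harvest involves, here is a rough sketch based on set differences. It is not the code in harvest.py: the function compare_harvests and its arguments are hypothetical, and treating a missing older harvest as an empty list is an assumption made for the example. Only the result keys (total, new_items, deleted_items) are taken from the statusFieldsReport list above.

```python
def compare_harvests(portal_name, old_ids, new_ids):
    """Compare dataset identifiers from the previous and the current harvest."""
    if old_ids is None:
        # Assumption: with no earlier JSON, every current item counts as new.
        print(f"There is no comparison json for {portal_name}")
        old_ids = []
    old_set, new_set = set(old_ids), set(new_ids)
    return {
        "portalName": portal_name,
        "total": len(new_set),
        "new_items": sorted(new_set - old_set),      # present now, absent before
        "deleted_items": sorted(old_set - new_set),  # present before, absent now
    }


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    print(compare_harvests("demo-portal", ["a", "b"], ["b", "c"]))
```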

  5. Go to the reports folder and sort the files by the Date modified column in descending order. The three CSV reports that were just generated should appear at the top.

    Tip: Check that none of the files is 0 KB. An empty report usually means that something went wrong during the run.
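The same check can be scripted. The sketch below lists the most recently modified CSV files in a folder and flags any that are 0 bytes. The function names are hypothetical, and the default count of three simply reflects the three reports mentioned above.

```python
from pathlib import Path


def newest_reports(reports_dir, count=3):
    """Return (file name, size in bytes) for the most recently modified CSV files."""
    csv_files = sorted(
        Path(reports_dir).glob("*.csv"),
        key=lambda path: path.stat().st_mtime,
        reverse=True,  # newest first, like sorting by Date modified descending
    )
    return [(path.name, path.stat().st_size) for path in csv_files[:count]]


def empty_reports(reports_dir, count=3):
    """Return the names of the newest CSV reports that are 0 bytes."""
    return [name for name, size in newest_reports(reports_dir, count) if size == 0]


if __name__ == "__main__":
    # Assumption: run from inside the dcat-metadata folder.
    for name, size in newest_reports("reports"):
        print(name, size, "bytes")
```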

  6. In the same way, go to the jsons folder and check that the JSON files were downloaded.
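Beyond checking that the files exist, you may want to confirm that each one is readable JSON. This is a small sketch under the assumption that the downloaded files end in .json and are UTF-8 encoded; the function name unreadable_json_files is hypothetical.

```python
import json
from pathlib import Path


def unreadable_json_files(jsons_dir):
    """Return the names of .json files that are empty or fail to parse."""
    problems = []
    for path in sorted(Path(jsons_dir).glob("*.json")):
        try:
            with path.open(encoding="utf-8") as handle:
                json.load(handle)
        except (json.JSONDecodeError, UnicodeDecodeError):
            problems.append(path.name)
    return problems


if __name__ == "__main__":
    # Assumption: run from inside the dcat-metadata folder.
    print(unreadable_json_files("jsons"))
```

An empty list means every file in the folder parsed without errors.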