
Filtering and normalization of input data

Conal Tuohy edited this page Apr 29, 2019 · 1 revision

During the ETL process, the input XML data passes through a filtering step implemented in the stylesheet filter.xsl, which removes certain records, or parts of records, according to the rules listed below.

The filter.xsl step operates in two modes: a public mode, which filters out data so that the result can be published to the public API, and an internal mode, which generates the data for the internal API. The filtering rules listed below apply only to the public mode. The internal mode filters nothing out; it only inserts a default copyright status into any object record that lacks one.

API Status

The EMu object records include an API Status field which controls which records should be published by the API.

This field can hold a variety of values, but the EMu data export script exports only those records whose value is one of the following: Public, Public Restricted, Internal, or Removed.

The filter.xsl step excludes object records from the public API unless they have a status of Public, Public Restricted, or Removed. The filter also excludes references which narrative records make to objects, unless the referenced object has the status Public or Public Restricted.
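The two status rules above can be sketched in Python as follows. This is only an illustration of the logic, not the actual filter.xsl code: the element and attribute names (`object`, `narrative`, `objectRef`, `id`, `status`) are hypothetical stand-ins, and the real EMu export is structured differently.

```python
import xml.etree.ElementTree as ET

# Statuses that the public mode lets through for object records.
PUBLIC_OBJECT_STATUSES = {"Public", "Public Restricted", "Removed"}
# Statuses a referenced object needs for a narrative's reference to survive.
REFERENCEABLE_STATUSES = {"Public", "Public Restricted"}

# Hypothetical record layout, invented for this sketch.
SAMPLE = """
<records>
  <object id="1" status="Public"/>
  <object id="2" status="Internal"/>
  <object id="3" status="Removed"/>
  <narrative id="n1">
    <objectRef id="1"/>
    <objectRef id="2"/>
    <objectRef id="3"/>
  </narrative>
</records>
"""


def filter_public(root):
    """Apply the public-mode API Status rules to the tree in place."""
    # Record every object's status before anything is removed, so that
    # references can still be checked against excluded objects.
    status_by_id = {o.get("id"): o.get("status") for o in root.findall("object")}

    # Rule 1: drop object records whose status is not publishable.
    for obj in list(root.findall("object")):
        if obj.get("status") not in PUBLIC_OBJECT_STATUSES:
            root.remove(obj)

    # Rule 2: drop narrative references to objects that are not
    # Public or Public Restricted (this includes Removed objects).
    for narrative in root.findall("narrative"):
        for ref in list(narrative.findall("objectRef")):
            if status_by_id.get(ref.get("id")) not in REFERENCEABLE_STATUSES:
                narrative.remove(ref)
    return root


root = filter_public(ET.fromstring(SAMPLE))
kept_objects = [o.get("id") for o in root.findall("object")]
kept_refs = [r.get("id") for r in root.iter("objectRef")]
print(kept_objects, kept_refs)
```

Note the asymmetry the sketch makes visible: an object with the status Removed is kept as a record, but a narrative's reference to it is dropped.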

Images

The Piction image records include URLs for several sizes of each image. The size labelled original_2 is excluded from the public API.

The "banner" images of narratives are excluded.

Rights

Images specified in the EMu objects file are ignored if their licence (AcsCCStatus) is not one of the "open" licences 'Public Domain', 'Creative Commons Commercial Use', or 'Creative Commons Non-Commercial Use'.

If the licence is not one of the "open" licences, then the licence information is also removed. This is to simplify processing in the final stages of the ETL pipeline, where an object which is lacking image rights will have all its images excluded, including images which were specified in the Piction data file.
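As a rough sketch of the rights rule, the Python below removes both the images and the licence from an object whose licence is not open. Only the AcsCCStatus field name and the three licence values come from this page; the `object` and `image` element names, and the assumption that the licence sits at object level, are hypothetical.

```python
import xml.etree.ElementTree as ET

# The three "open" licences named in the rule above.
OPEN_LICENCES = {
    "Public Domain",
    "Creative Commons Commercial Use",
    "Creative Commons Non-Commercial Use",
}


def apply_rights_rule(obj):
    """Strip images and licence from an object unless its licence is open.

    Assumes a hypothetical layout in which AcsCCStatus and the image
    elements are direct children of the object element.
    """
    licence = obj.find("AcsCCStatus")
    if licence is not None and (licence.text or "").strip() in OPEN_LICENCES:
        return obj  # open licence: leave the object untouched

    # Not an open licence (or no licence at all): drop the images ...
    for image in list(obj.findall("image")):
        obj.remove(image)
    # ... and drop the licence itself, so that later pipeline stages
    # treat the object as lacking image rights.
    if licence is not None:
        obj.remove(licence)
    return obj


open_obj = apply_rights_rule(ET.fromstring(
    "<object><AcsCCStatus>Public Domain</AcsCCStatus><image/></object>"
))
closed_obj = apply_rights_rule(ET.fromstring(
    "<object><AcsCCStatus>All Rights Reserved</AcsCCStatus><image/><image/></object>"
))
print(len(open_obj.findall("image")), len(closed_obj.findall("image")))
```

The licence value "All Rights Reserved" is made up for the example; any value outside the three open licences would be handled the same way.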

Inward Loan

The InwardLoan flag is discarded. A later stage of the ETL pipeline is designed to process this flag, but because the flag is filtered out here, the InwardLoan status currently has no effect.

Locations

The precise locations (LocCurrentLocationRef) of objects are all removed from the public dataset.

Published Narratives

Narratives are removed unless they have an "Intended Audience" field (DesIntendedAudience_tab/DesIntendedAudience) with the value 'Collection Explorer publish'.
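A minimal Python sketch of this rule is shown below. The field path and the publish value are taken from the rule above; the `narratives` and `narrative` element names are hypothetical, as is the assumption that one matching value among several in the table is enough to keep the narrative.

```python
import xml.etree.ElementTree as ET

PUBLISH_VALUE = "Collection Explorer publish"

# Hypothetical wrapper and record element names, invented for this sketch.
SAMPLE = """
<narratives>
  <narrative id="a">
    <DesIntendedAudience_tab>
      <DesIntendedAudience>Collection Explorer publish</DesIntendedAudience>
    </DesIntendedAudience_tab>
  </narrative>
  <narrative id="b">
    <DesIntendedAudience_tab>
      <DesIntendedAudience>Staff only</DesIntendedAudience>
    </DesIntendedAudience_tab>
  </narrative>
  <narrative id="c"/>
</narratives>
"""


def is_published(narrative):
    """True if any Intended Audience value equals the publish value."""
    values = narrative.findall("DesIntendedAudience_tab/DesIntendedAudience")
    return any((v.text or "").strip() == PUBLISH_VALUE for v in values)


def filter_narratives(root):
    """Remove every narrative that is not marked for publication."""
    for narrative in list(root.findall("narrative")):
        if not is_published(narrative):
            root.remove(narrative)
    return root


root = filter_narratives(ET.fromstring(SAMPLE))
kept_narratives = [n.get("id") for n in root.findall("narrative")]
print(kept_narratives)
```

A narrative with no Intended Audience field at all (narrative "c" in the sample) is removed, just like one with a different value.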