Filtering and normalization of input data
During the ETL process, the input XML data passes through a filtering step implemented in the stylesheet filter.xsl, which removes certain records, or parts of records, according to the rules listed below.
The filter.xsl step operates in two modes: a public mode, which filters the data so that the result can be published to the public API, and an internal mode, which generates the data for the internal API. The filtering rules listed below apply only to the public mode. The internal mode filters nothing out; all it does is insert a default copyright status into any object record which lacks one.
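The split between the two modes can be sketched in Python as follows. This is only an illustration of the behaviour described above, not the XSLT in filter.xsl: records are modelled as plain dicts, and the names run_filter, copyright_status, default_copyright and public_rules are all hypothetical. The actual default copyright value is not stated here, so it is passed in as a parameter.

```python
from typing import Callable, Optional

Record = dict
# A rule returns the (possibly modified) record, or None to drop it.
Rule = Callable[[Record], Optional[Record]]


def run_filter(records: list[Record], mode: str,
               default_copyright: str, public_rules: list[Rule]) -> list[Record]:
    """Apply the filter step in the given mode (hypothetical sketch)."""
    if mode == "internal":
        # Internal mode drops nothing; it only fills in a missing copyright status.
        return [r if r.get("copyright_status")
                else {**r, "copyright_status": default_copyright}
                for r in records]
    if mode == "public":
        kept = []
        for record in records:
            # Run each public rule in turn; stop as soon as one drops the record.
            for rule in public_rules:
                record = rule(record)
                if record is None:
                    break
            if record is not None:
                kept.append(record)
        return kept
    raise ValueError(f"unknown mode: {mode!r}")
```

Each of the rules listed below would then be one entry in public_rules.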
The EMu object records include an API Status field which controls which records should be published by the API.
This field may have various different values, but the EMu data export script only exports those records which have one of the following values: Public, Public Restricted, Internal, or Removed.
The filter.xsl step excludes object records from the public API unless they have a status of Public, Public Restricted, or Removed. The filter also excludes references which narrative records make to objects, unless the referenced object has the status Public or Public Restricted.
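The two status rules above differ slightly: Removed objects are kept as records, but narratives may not refer to them. A minimal Python sketch of that distinction, assuming records are modelled as dicts with hypothetical keys id, api_status and object_refs (the real XML element names are not given here):

```python
# Statuses that keep an object record in the public output.
PUBLIC_OBJECT_STATUSES = {"Public", "Public Restricted", "Removed"}
# Statuses an object must have for a narrative's reference to it to survive.
REFERENCEABLE_STATUSES = {"Public", "Public Restricted"}


def filter_objects(objects: list[dict]) -> list[dict]:
    """Keep only the object records allowed in the public API."""
    return [o for o in objects if o.get("api_status") in PUBLIC_OBJECT_STATUSES]


def filter_narrative_refs(narrative: dict, objects_by_id: dict) -> dict:
    """Drop a narrative's references to objects that are not Public or Public Restricted."""
    kept = [ref for ref in narrative.get("object_refs", [])
            if objects_by_id.get(ref, {}).get("api_status") in REFERENCEABLE_STATUSES]
    return {**narrative, "object_refs": kept}
```

In this sketch a reference to an object that is missing from the export is dropped as well; that choice is an assumption.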
The Piction image records include the URLs of various different sizes of image. The size labelled original_2 is excluded from the public API.
The "banner" images of narratives are excluded.
Images specified in the EMu objects file are ignored if their licence (AcsCCStatus) is not one of the "open" licences 'Public Domain', 'Creative Commons Commercial Use', or 'Creative Commons Non-Commercial Use'.
If the licence is not one of the "open" licences, then the licence information is also removed. This is to simplify processing in the final stages of the ETL pipeline, where an object which is lacking image rights will have all its images excluded, including images which were specified in the Piction data file.
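The licence rule can be sketched like this. It assumes, for illustration only, that an object record is a dict carrying a single AcsCCStatus value and a hypothetical images list; the real data may attach the licence differently.

```python
OPEN_LICENCES = {
    "Public Domain",
    "Creative Commons Commercial Use",
    "Creative Commons Non-Commercial Use",
}


def filter_object_images(obj: dict) -> dict:
    """Remove EMu-specified images and the licence itself unless the licence is open."""
    if obj.get("AcsCCStatus") in OPEN_LICENCES:
        return obj
    # Dropping the licence too means later stages see an object with no image rights.
    return {k: v for k, v in obj.items() if k not in ("AcsCCStatus", "images")}
```

An object with no licence at all is treated the same as one with a non-open licence.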
The InwardLoan flag is discarded. A later stage of the ETL pipeline would process this flag, but because it is filtered out here, the InwardLoan status currently has no effect.
The precise locations (LocCurrentLocationRef) of objects are all removed from the public dataset.
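Both of the preceding rules amount to deleting a named field from every object record. A small Python sketch, again modelling a record as a dict keyed by field name (an assumption about the data shape, not the actual XSLT):

```python
# Fields removed from every object record in public mode.
PRIVATE_FIELDS = ("InwardLoan", "LocCurrentLocationRef")


def strip_private_fields(obj: dict) -> dict:
    """Return a copy of the object record without the fields withheld from the public dataset."""
    return {k: v for k, v in obj.items() if k not in PRIVATE_FIELDS}
```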
Narratives are removed unless they have an "Intended Audience" field (DesIntendedAudience_tab/DesIntendedAudience) with the value 'Collection Explorer publish'.
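As a sketch of the narrative rule, assume DesIntendedAudience_tab is modelled as a list of DesIntendedAudience strings on a dict record. The function name is hypothetical, and the comparison is taken to be an exact, case-sensitive match.

```python
PUBLISH_AUDIENCE = "Collection Explorer publish"


def is_publishable_narrative(narrative: dict) -> bool:
    """True if any intended-audience value marks the narrative for publication."""
    return PUBLISH_AUDIENCE in narrative.get("DesIntendedAudience_tab", [])
```

A narrative with several audience values is kept as long as one of them matches.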