This repository has been archived by the owner on Jun 27, 2020. It is now read-only.

Batch Ingest Manifest File (2013 Redesign)

Jim Coble edited this page Apr 14, 2015 · 1 revision

Information about the contents of a manifest file.

Manifest Level

  • name: Name of this manifest. Informational only.
  • description: Description of this manifest. Informational only.
  • batch: Batch to be used for the ingest objects generated by processing this manifest. To append the objects to an existing batch, provide (only) the id of the batch. To create a new batch with a given name and description and associated with a given user, provide name, description, and user_email as appropriate. Leave the batch element out altogether to create a new batch without a name or description and not associated with a user.
    • name: Name to be used for a newly created batch. Ignored if id is provided.
    • description: Description to be used for a newly created batch. Ignored if id is provided.
    • user_email: Email of user to whom newly created batch belongs. Ignored if id is provided.
    • id: Database ID of existing batch.
  • basepath: Filepath to base directory on which manifest is based; e.g., '/srv/fedora-working/ingest/COL/collection/'
  • label: Label to be used for ingest objects generated by processing this manifest.
  • model: ActiveFedora model to be used for ingest objects generated by processing this manifest; e.g., 'Collection'
  • datastreams: List of names of the metadata and/or content datastreams to be generated for ingest objects when processing this manifest. Do not include 'DC', 'RELS-EXT', or 'thumbnail' since these will be generated automatically as appropriate by the ingest process.
  • checksum: Information about externally provided checksums for the contents of the "content" datastream.
    • location: The location of the XML file containing the external checksums (path and name).
    • source: The source of the external checksums; e.g., 'dpc'.
    • type: The type (algorithm) of the external checksums if not provided in the checksum file; e.g., 'SHA-256'
    • node_xpath: The xpath to the node containing the checksum data in the XML file; e.g., '/checksums/checksum' (which is the default if this element is not provided).
    • identifier_element: The name of the element in the checksum data node (see node_xpath above) which contains the identifier of the object whose checksum is provided in that node; e.g., 'id' (which is the default if this element is not provided).
    • type_xpath: The relative xpath within the checksum data node (see node_xpath above) to the node containing the type (algorithm) of the checksum; e.g., 'type' (which is the default if this element is not provided).
    • value_xpath: The relative xpath within the checksum data node (see node_xpath above) to the node containing the value of the checksum; e.g., 'value' (which is the default if this element is not provided).
  • content, contentMetadata, contentdm, descMetadata, digitizationGuide, dpcMetadata, fmpExport, marcXML, rightsMetadata, tripodMets: Information identifying the file system location of the content to be loaded into the designated datastream.
    • extension: Currently used only with content, the file extension (e.g., '.tif') to be added to the object's key (first) identifier to obtain the file name.
    • location: The filepath to the directory containing the files. If not provided, the files are assumed to be in a "canonical" location.
  • admin_policy, collection, parent: Designation of the AdminPolicy, Collection (for Targets), and/or parent objects to be associated with the ingest objects generated by processing this manifest. In the simplest case, provide the pid of the AdminPolicy, Collection (for Targets), or parent object. In the next simplest case, use id to provide the identifier that was used for the AdminPolicy, Collection (for Targets), or parent object in a previously processed batch ingest (and, optionally, the batchid of the batch containing that object, for disambiguation). The manifest processor will look up the PID of the AdminPolicy, Collection (for Targets), or parent object from the corresponding batch ingest object. Alternatively, provide an integer autoidlength value that can be used to extract the identifier of the AdminPolicy, Collection (for Targets), or parent object from the key (first) identifier of the object being processed. The extracted identifier (and batchid, if provided) will then be used to look up the PID.
    • pid: the PID of the AdminPolicy, Collection (for Targets), or parent object.
    • id: the identifier used in a previously processed batch ingest for the AdminPolicy, Collection (for Targets), or parent object. Ignored if pid is provided.
    • autoidlength: an integer indicating the number of characters of the key (first) identifier of the object being processed that can be extracted to form the identifier of the AdminPolicy, Collection (for Targets), or parent object. This element is typically used only with the parent relationship. For example, an autoidlength of '10' and an object identifier of 'abc00100030010' would result in an extracted identifier of 'abc0010003'. Ignored if pid or id is provided.
    • batchid: the database ID of the previously processed batch in which the AdminPolicy, Collection (for Targets), or parent object was ingested. Ignored if pid is provided.
  • objects: List of objects for which ingest objects are to be generated by processing this manifest. See Object Level section below for elements that can be provided for each object in this list.
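
The checksum elements above describe how to locate checksum data in an external XML file. The following is a minimal Python sketch of that lookup, not the ingest application's actual code. The function name read_checksums, the settings dictionary, the fallback_type parameter (standing in for the manifest's checksum type element), and the sample XML are all hypothetical. Only the four default values come from the descriptions above.

```python
import xml.etree.ElementTree as ET

# Defaults as documented for the checksum element of the manifest.
DEFAULTS = {
    "node_xpath": "/checksums/checksum",
    "identifier_element": "id",
    "type_xpath": "type",
    "value_xpath": "value",
}


def _text(node, path):
    """Return the stripped text of the first match for path, or None."""
    text = node.findtext(path)
    return text.strip() if text is not None else None


def read_checksums(xml_text, settings=None, fallback_type=None):
    """Map each object identifier to its checksum type and value.

    settings is a hypothetical dict holding any of the four documented
    keys; fallback_type plays the role of the manifest's 'type' element,
    used when the file itself does not state the algorithm.
    """
    opts = {**DEFAULTS, **(settings or {})}
    root = ET.fromstring(xml_text)
    # ElementTree cannot evaluate an absolute path on an element, so the
    # root is placed under a throwaway wrapper and the path is queried
    # relative to that wrapper.
    wrapper = ET.Element("wrapper")
    wrapper.append(root)
    checksums = {}
    for node in wrapper.findall("." + opts["node_xpath"]):
        identifier = _text(node, opts["identifier_element"])
        if identifier is None:
            continue  # skip nodes that do not name an object
        checksums[identifier] = {
            "type": _text(node, opts["type_xpath"]) or fallback_type,
            "value": _text(node, opts["value_xpath"]),
        }
    return checksums


# Hypothetical checksum file content, for illustration only.
SAMPLE = """<checksums>
  <checksum><id>abc001</id><type>SHA-256</type><value>deadbeef</value></checksum>
  <checksum><id>abc002</id><value>cafef00d</value></checksum>
</checksums>"""

if __name__ == "__main__":
    for identifier, checksum in read_checksums(SAMPLE, fallback_type="SHA-1").items():
        print(identifier, checksum["type"], checksum["value"])
```

In this sketch the second sample node has no type element, so it falls back to the type passed in by the caller. This mirrors the description of type as applying when the algorithm is not provided in the checksum file.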

Object Level

  • identifier
  • label
  • model
  • datastreams
  • checksum
    • type
    • value
  • content, contentMetadata, contentdm, descMetadata, digitizationGuide, dpcMetadata, fmpExport, marcXML, rightsMetadata, tripodMets
  • admin_policy, collection, parent
    • pid
    • id
    • autoidlength
    • batchid
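
The pid, id, autoidlength, and batchid elements follow an order of precedence: pid wins outright, id is used next, and autoidlength applies only when neither is given. The Python sketch below illustrates that order under stated assumptions. The names extract_identifier and resolve_pid, the ref dictionary, and the lookup callable (standing in for the manifest processor's search of previously processed batch objects) are hypothetical and do not reflect the actual implementation.

```python
def extract_identifier(object_identifier, autoidlength):
    """Take the first autoidlength characters of the object's key identifier."""
    return object_identifier[:int(autoidlength)]


def resolve_pid(ref, object_identifier, lookup):
    """Resolve an admin_policy, collection, or parent reference to a PID.

    ref    -- hypothetical dict with optional 'pid', 'id', 'autoidlength',
              and 'batchid' keys, mirroring the manifest elements
    lookup -- hypothetical callable (identifier, batchid) -> PID or None
    """
    # 1. An explicit PID is used as-is; id, autoidlength, and batchid are ignored.
    if ref.get("pid"):
        return ref["pid"]
    # 2. Otherwise use the identifier from a previously processed batch.
    identifier = ref.get("id")
    # 3. Failing that, derive the identifier from the object's key identifier.
    if identifier is None and ref.get("autoidlength") is not None:
        identifier = extract_identifier(object_identifier, ref["autoidlength"])
    if identifier is None:
        return None  # nothing to resolve
    # batchid is optional and only narrows the search.
    return lookup(identifier, ref.get("batchid"))


if __name__ == "__main__":
    # Made-up PIDs keyed by (identifier, batchid), for illustration only.
    known = {("abc0010003", None): "test:1", ("abc0010003", 7): "test:2"}

    def lookup(identifier, batchid):
        return known.get((identifier, batchid))

    print(extract_identifier("abc00100030010", 10))
    print(resolve_pid({"autoidlength": 10, "batchid": 7}, "abc00100030010", lookup))
```

The first print reproduces the example given for autoidlength: a length of 10 applied to 'abc00100030010' yields 'abc0010003'.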