Descriptive Metadata (Old)
During the pilot project, we followed the descriptive metadata practices described below. As the pilot comes to an end, it seems a good time to review these practices and decide what changes, if any, to make for the initial production implementation.
One of the goals set forth for the repository project in the pilot was to bring together, in one place, the digital content objects and any pertinent descriptive and technical metadata about them. For the content objects used in the pilot (the Vica and Kwilecki collections), we identified three sources of existing descriptive metadata:
- Aleph (Vica collection and item records, Kwilecki collection record)
- CONTENTdm (Kwilecki collection and item records)
- Tripod2 (Vica collection and item records, Kwilecki collection and item records)
For the pilot, we downloaded the metadata from each source and stored it "as is" in a separate datastream in the appropriate collection or item object:
- Aleph data was extracted as MarcXML and stored in a "marcXML" datastream
- CONTENTdm data was extracted using a custom XML export (essentially the custom export format used for Tripod2) and stored in a "contentdm" datastream
- Tripod2 METS data was copied from .../static/xml/mets/... on TUCASI_CIFS5 and stored in a "tripodMets" datastream
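As a minimal sketch of this "as is" storage convention, the Python below models a repository object's datastreams as a plain dictionary (an assumption; this is not the Fedora API) and copies each source's export, unmodified, into the datastream ID the pilot used for it. The source keys ("aleph", "contentdm", "tripod2") and the function name `store_as_is` are hypothetical.

```python
from typing import Dict

# Datastream IDs the pilot used for unmodified copies of each source.
# The dictionary keys are hypothetical labels for the three sources.
AS_IS_DATASTREAMS: Dict[str, str] = {
    "aleph": "marcXML",        # MarcXML extracted from Aleph
    "contentdm": "contentdm",  # CONTENTdm custom XML export
    "tripod2": "tripodMets",   # Tripod2 METS file
}


def store_as_is(datastreams: Dict[str, str], source: str, raw_xml: str) -> Dict[str, str]:
    """Return a copy of `datastreams` with `raw_xml` stored, unchanged,
    under the datastream ID assigned to `source`.

    `datastreams` is a hypothetical stand-in for a repository object,
    mapping datastream IDs to XML content.
    """
    if source not in AS_IS_DATASTREAMS:
        raise ValueError(f"unknown metadata source: {source!r}")
    updated = dict(datastreams)
    updated[AS_IS_DATASTREAMS[source]] = raw_xml  # no transformation applied
    return updated


# Usage: a Kwilecki item with records from both CONTENTdm and Tripod2.
item: Dict[str, str] = {}
item = store_as_is(item, "contentdm", "<record>...</record>")
item = store_as_is(item, "tripod2", "<mets:mets>...</mets:mets>")
print(sorted(item))  # ['contentdm', 'tripodMets']
```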
(Note: For the pilot, we explicitly decided not to deal with any descriptive metadata that might be found in finding aids. We recognized, however, that finding aids might be a source of descriptive metadata for certain objects.)
In addition to storing the descriptive metadata as is, we also normalized it into a common schema so that we could leverage the opinionated metadata (OM) functionality provided by the Hydra Project. ("Opinionated metadata" is a mechanism for defining a terminology that translates between Ruby objects and XML data.) We stored the normalized metadata in a "descMetadata" datastream in each object.
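Hydra's opinionated metadata library is written in Ruby; the Python below is only a rough analogue of the idea, not its API. A terminology maps term names to paths in the XML, and two helper functions (hypothetical names `term_values` and `set_term`) read and write values through those terms. The term names, the `qualifieddc` root element, and the use of the Dublin Core element (`dc:`) and DCMI terms (`dcterms:`) namespaces are assumptions about what a QDC "descMetadata" record might contain.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Namespace prefixes used in the terminology paths below.
NS: Dict[str, str] = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Hypothetical terminology: term name -> path relative to the record root.
TERMINOLOGY: Dict[str, str] = {
    "title": "dc:title",
    "creator": "dc:creator",
    "created": "dcterms:created",
}


def term_values(root: ET.Element, term: str) -> List[str]:
    """Read every value of `term` from the record."""
    return [el.text or "" for el in root.findall(TERMINOLOGY[term], NS)]


def set_term(root: ET.Element, term: str, value: str) -> None:
    """Set the first value of `term`, creating the element if it is absent."""
    path = TERMINOLOGY[term]
    el = root.find(path, NS)
    if el is None:
        prefix, local = path.split(":")
        el = ET.SubElement(root, f"{{{NS[prefix]}}}{local}")
    el.text = value


# Usage: read an existing term and add a new one.
record = ET.fromstring(
    '<qualifieddc xmlns:dc="http://purl.org/dc/elements/1.1/" '
    'xmlns:dcterms="http://purl.org/dc/terms/">'
    "<dc:title>Sample title</dc:title></qualifieddc>"
)
set_term(record, "created", "1950")
print(term_values(record, "title"), term_values(record, "created"))
```

The point of the terminology is that application code asks for "title" or "created" and never has to know the element names or namespaces used in the stored XML.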
For the pilot, we used Qualified Dublin Core (QDC) as the common schema for the "normalized" descriptive metadata. (Many Hydra sites use MODS as the common schema for descriptive metadata and, in the pilot, we initially experimented with MODS, but ultimately opted for QDC as a closer fit to current metadata practices, at least for the digital collections involved in the pilot.)
To produce the QDC metadata for each object, we applied an XSL transformation to the data from one of the existing descriptive metadata sources listed above:
- If we had CONTENTdm data for the object (i.e., the Kwilecki collection and item objects), we used a custom XSL stylesheet that we wrote to transform the CONTENTdm export data into QDC. We consider this stylesheet only a preliminary draft; if we continue transforming CONTENTdm export data into QDC, it will need to be reviewed and will almost certainly need changes.
- If we did not have CONTENTdm data for the object but did have Aleph data (i.e., the Vica collection and item objects), we used a slightly altered version of the MARC21slim2OAIDC stylesheet provided by the Library of Congress. If we continue transforming Aleph MarcXML data into QDC, this stylesheet should be reviewed and possibly altered.
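The selection rule and the Aleph branch can be sketched in Python as follows. `choose_source` mirrors the order of preference described above by checking the pilot's datastream IDs. `marcxml_to_dc` is a deliberately tiny stand-in for an XSL transformation, not a port of it: it maps only MARC 245 $a, 100 $a, and 650 $a to title, creator, and subject, whereas the Library of Congress MARC21slim2OAIDC stylesheet covers many more fields and also cleans up punctuation. Both function names, the unqualified `dc` root element, and the field selection are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

MARC_NS = "http://www.loc.gov/MARC21/slim"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Illustrative subset only: MARC tag -> Dublin Core element (from subfield $a).
MARC_TO_DC: Dict[str, str] = {"245": "title", "100": "creator", "650": "subject"}


def choose_source(datastreams: Dict[str, str]) -> Optional[str]:
    """Return the datastream ID to transform into QDC, or None.

    CONTENTdm data takes precedence; Aleph MarcXML is the fallback.
    """
    for dsid in ("contentdm", "marcXML"):
        if dsid in datastreams:
            return dsid
    return None  # no transformation was described for other sources


def marcxml_to_dc(marc_xml: str) -> ET.Element:
    """Map a few MARC fields to DC elements (NOT the LoC stylesheet)."""
    record = ET.fromstring(marc_xml)
    dc = ET.Element("dc")  # hypothetical root element
    for field in record.iter(f"{{{MARC_NS}}}datafield"):
        element = MARC_TO_DC.get(field.get("tag", ""))
        if element is None:
            continue  # field not covered by this toy crosswalk
        sub_a = field.find(f"{{{MARC_NS}}}subfield[@code='a']")
        if sub_a is not None and sub_a.text:
            ET.SubElement(dc, f"{{{DC_NS}}}{element}").text = sub_a.text.strip()
    return dc


# Usage: an object that has only Aleph data falls through to the MARC branch.
marc = (
    '<record xmlns="http://www.loc.gov/MARC21/slim">'
    '<datafield tag="245" ind1="1" ind2="0"><subfield code="a">Sample title</subfield></datafield>'
    '<datafield tag="500" ind1=" " ind2=" "><subfield code="a">A note</subfield></datafield>'
    "</record>"
)
if choose_source({"marcXML": marc}) == "marcXML":
    print(ET.tostring(marcxml_to_dc(marc), encoding="unicode"))
```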