Ingestion Operator Overview

Raphael Waltenspül edited this page Sep 26, 2024 · 5 revisions

The ingestion pipeline is composed of ingestion operators. This page gives an overview of them.

For each operator, we list the key information, such as its factory class and properties.

Enumerators

An ENUMERATOR typed operator is the start of the pipeline, emitting the retrievables.

Operator Properties:

| Property | Description |
| --- | --- |
| mediaTypes | A list of media types for which to emit retrievables. Each entry is one of IMAGE, VIDEO, AUDIO, MESH. |

FileSystemEnumerator

Factory Class: FileSystemEnumerator

The FileSystemEnumerator emits retrievables based on the file system, specifically based on a location.

Local Ingestion Context Properties:

| Property | Description |
| --- | --- |
| path | The path (relative to the working directory) from which to start the file tree walk. |
| depth | The depth of the tree walk, e.g. 1 means one level deeper than the current working directory, 2 means two levels, etc. |
| skip | How many items of the walk to skip from the start. |
| limit | How many items the walk should emit, after skipping. |
| regex | Optional regex pattern. The enumerator emits only files for which fullpath.matches(Regex("pattern-string")) returns true. |
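
To illustrate how these properties might interact, here is a minimal Python sketch of a depth-limited file walk with skip, limit and a full-path regex filter. It is not the engine's implementation. The function name `enumerate_files`, the sorted walk order, the reading of depth as "directory levels below the start directory", and applying the regex before skip and limit are all assumptions made for this example.

```python
import os
import re
from itertools import islice
from pathlib import Path


def enumerate_files(path, depth, skip=0, limit=None, regex=None):
    """Yield file paths under `path`, descending at most `depth` directory levels."""
    root = Path(path)
    pattern = re.compile(regex) if regex else None

    def walk():
        for dirpath, dirnames, filenames in os.walk(root):
            level = len(Path(dirpath).relative_to(root).parts)
            if level >= depth:
                dirnames[:] = []  # do not descend any further
            dirnames.sort()  # deterministic order, so skip/limit are repeatable
            for name in sorted(filenames):
                full = str(Path(dirpath) / name)
                # fullmatch: the whole path must match, like Kotlin's String.matches
                if pattern is None or pattern.fullmatch(full):
                    yield full

    # Assumption: skip and limit apply to the files that passed the regex filter
    stop = None if limit is None else skip + limit
    return islice(walk(), skip, stop)
```

For example, `enumerate_files("media", depth=1, skip=10, limit=100, regex=r".*\.jpg")` would, under these assumptions, skip the first ten matching JPG files and emit at most the next hundred.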

MemoryControlledFileSystemEnumerator

Factory Class: MemoryControlledFileSystemEnumerator

The MemoryControlledFileSystemEnumerator emits retrievables based on the file system, specifically based on a location. In contrast to the FileSystemEnumerator, this enumerator is memory-aware: it paces emission based on the available memory and a heuristic, so as not to overload memory.

Local Ingestion Context Properties:

| Property | Description |
| --- | --- |
| path | The path (relative to the working directory) from which to start the file tree walk. |
| depth | The depth of the tree walk, e.g. 1 means one level deeper than the current working directory, 2 means two levels, etc. |
| skip | How many items of the walk to skip from the start. |
| limit | How many items the walk should emit, after skipping. |
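
This page does not describe the pacing heuristic itself. Purely as an illustration of the general idea, the following Python sketch delays emission while a memory probe reports too little free memory. The names `paced`, `available_bytes`, `min_free_bytes`, `poll_seconds` and `max_polls` are hypothetical and do not correspond to engine properties.

```python
import time


def paced(items, available_bytes, min_free_bytes, poll_seconds=0.1, max_polls=50):
    """Yield items, pausing while the probe reports less than `min_free_bytes` free."""
    for item in items:
        polls = 0
        # Wait for memory to free up, but stop waiting after max_polls attempts
        while available_bytes() < min_free_bytes and polls < max_polls:
            time.sleep(poll_seconds)
            polls += 1
        yield item
```

The probe is passed in as a function so that the pacing logic stays independent of how free memory is actually measured.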

Decoders

A DECODER typed operator decodes the media file into Content, ready for further processing.

VideoDecoder

Factory Class: VideoDecoder

A decoder for videos, which emits video and audio.

Local Ingestion Context Properties:

| Property | Description |
| --- | --- |
| timeWindowMs | The duration of each segment, in milliseconds. |
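
As a rough illustration of segmenting by a fixed time window, the Python sketch below groups frame timestamps into consecutive windows. It assumes that timestamps are given in milliseconds and that windows start at 0. The function name `segment_by_time_window` is hypothetical, and the decoder's actual segmentation logic may differ.

```python
def segment_by_time_window(timestamps_ms, time_window_ms):
    """Group timestamps (in ms) into consecutive windows of `time_window_ms`."""
    if time_window_ms <= 0:
        raise ValueError("time_window_ms must be positive")
    segments = {}
    for ts in timestamps_ms:
        index = ts // time_window_ms  # index of the window this timestamp falls into
        segments.setdefault(index, []).append(ts)
    # Return the non-empty windows in temporal order
    return [segments[i] for i in sorted(segments)]
```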

Extractors

An EXTRACTOR typed operator extracts and analyses the content and performs the actual ingestion.

See Analyser Overview for more information on the extractors.

Exporters

An EXPORTER typed operator exports derivative artifacts. For example, a thumbnail exporter produces thumbnails. Exporters are defined in the schema; however, their properties can be overridden from the ingestion context.

ThumbnailExporter

Factory Class: ThumbnailExporter (Defined in the schema and referenced by name.)

Produces thumbnails.

Local Ingestion Context Properties:

| Property | Description |
| --- | --- |
| maxSideResolution | The size of the longer side, in pixels. |
| mimeType | The mime type to use. One of JPG, PNG. |
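
The effect of maxSideResolution can be sketched as a small aspect-preserving size computation. This Python example assumes that the longer side is scaled to exactly the given value (which also upscales smaller images) and that results are rounded to whole pixels. The function name `thumbnail_size` is hypothetical, and the exporter may handle rounding and small images differently.

```python
def thumbnail_size(width, height, max_side_resolution):
    """Scale (width, height) so that the longer side equals `max_side_resolution`."""
    longer = max(width, height)
    # Keep the aspect ratio; round to whole pixels and never go below one pixel
    new_width = max(1, round(width * max_side_resolution / longer))
    new_height = max(1, round(height * max_side_resolution / longer))
    return new_width, new_height
```

Under these assumptions, a 1920 by 1080 image with a maxSideResolution of 400 would yield a 400 by 225 thumbnail.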

Transformers

A TRANSFORMER typed operator transforms incoming retrievables into outgoing ones and may aggregate or filter them.

TypeFilterTransformer

Factory Class: TypeFilterTransformer

Filters incoming retrievables based on their type.

Local Ingestion Context Properties:

| Property | Description |
| --- | --- |
| type | The type to allow through. One of SOURCE:IMAGE, SOURCE:VIDEO, SOURCE:AUDIO, SOURCE:MESH (custom filters could be defined). |
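
Conceptually, the filter lets through only retrievables of the configured type. The Python sketch below shows this with a hypothetical `Retrievable` stand-in that carries just an id and a type string. Both the class and the function name `type_filter` are assumptions for illustration and not the engine's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Retrievable:
    # Hypothetical stand-in for an engine retrievable; only the type matters here
    id: str
    type: str


def type_filter(retrievables, allowed_type):
    """Pass through only the retrievables whose type equals `allowed_type`."""
    return [r for r in retrievables if r.type == allowed_type]
```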

TemplateTextTransformer

Factory Class: TemplateTextTransformer

Takes a string template with defined placeholders and fills them with the retrievable's content from the corresponding fields.

Local Ingestion Context Properties:

| Property | Description |
| --- | --- |
| template | A string containing placeholders of the form $fieldName, where each placeholder must correspond to the name of the content field from which to include the string content. |
| defaultValue | An optional text to include if a retrievable does not have content associated with a given fieldName. |
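
Python's standard `string.Template` happens to use the same `$name` placeholder syntax, so the substitution can be sketched with it. This is only an illustration of the behaviour described above: the function name `fill_template` and the plain dictionary of field contents are assumptions, and the engine's own placeholder parsing may differ in edge cases.

```python
from collections import defaultdict
from string import Template


def fill_template(template, contents, default_value=""):
    """Replace each $fieldName with contents[fieldName], or with the default."""
    # Fields without content fall back to default_value instead of raising
    values = defaultdict(lambda: default_value, contents)
    return Template(template).substitute(values)
```

For example, `fill_template("Caption: $caption. Speech: $asr", {"caption": "a dog"}, default_value="n/a")` fills the missing `asr` field with the default text.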

LastContentAggregator

Factory Class: LastContentAggregator

Aggregates content based on the 'last' strategy.

Local Ingestion Context Properties:

none