Getting Started
This is a quick guide on how to set up vitrivr-engine for ingestion and retrieval.
- ingestion: The process of adding multimedia content to the system, such that it can be retrieved later.
- retrieval: The process of querying the system to find multimedia content.
Apart from these two terms, we do not delve into further terminology. For more in-depth information, please refer to the other parts of this wiki.
These are the essential prerequisites for this guide:
- JDK 21 or higher, e.g. OpenJDK
- CottontailDB at least v0.16.5 OR PostgreSQL with pgVector
- Multimedia content, such as videos and images.
- Start CottontailDB on the default port `1865` OR PostgreSQL with pgVector on the default port `5432`.
- Build vitrivr-engine (from the root of the repository). Unix:
```
./gradlew distZip
```
Windows:
```
.\gradlew.bat distZip
```
- Unzip the distribution, e.g.
```
unzip -d ../instance/ vitrivr-engine-server/build/distributions/vitrivr-engine-server-0.0.1-SNAPSHOT.zip
```
- Prepare the media data in a folder called `sandbox/media`.
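The folder preparation can be sketched as follows (the layout is this guide's assumption; the source paths in the comments are placeholders for your own files):

```shell
# Create the media folder next to the instance folder
# (run this from the directory containing vitrivr-engine/ and instance/).
mkdir -p sandbox/media

# Copy your own content into it, e.g. (source paths are placeholders):
# cp ~/Pictures/my-img-1.png ~/Pictures/my-img-2.jpg sandbox/media/
# cp ~/Videos/video.mp4 sandbox/media/

# List what ended up in the media folder.
ls sandbox/media
```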
By now, you should have the following folder structure:
```
+ vitrivr-engine/
+ instance/
|  + vitrivr-engine-server-0.0.1-SNAPSHOT/
|  |  + bin/
|  |  + lib/
+ sandbox/
|  + media/
|  |  - my-img-1.png
|  |  - my-img-2.jpg
|  |  - video.mp4
```
From now on, we work from within the `instance` folder.
The goal of this guide is to give a head start on using vitrivr-engine for ingestion and retrieval. Ultimately, our goal is to search a multimedia collection using content-based methods, such as querying our system with a colour. In order to do so, the system has to have a representation of the files in the collection and be aware of their colour. For the sake of this guide, we configure vitrivr-engine in a way that the average colour can be searched for, hence this is our feature of choice. Furthermore, we want vitrivr-engine to know the original file names.
Disclaimer: In a more practical setup, other features are desirable, which we take into consideration in the example.
Similar to (relational) databases, vitrivr-engine works on the notion of a schema, which defines the representation of the multimedia content:
Create a file `schema.json`:
```json
{
  "schemas": {
    "sandbox": {}
  }
}
```
For the sake of this guide we simply limit ourselves to a single schema.
vitrivr-engine requires a running database. Currently, we support CottontailDB and PostgreSQL with pgVector.
We define the database connection at the beginning of the schema:
For CottontailDB:
```json
{
  "schemas": {
    "sandbox": {
      "connection": {
        "database": "CottontailConnectionProvider",
        "parameters": {
          "host": "127.0.0.1",
          "port": "1865"
        }
      }
    }
  }
}
```
For PostgreSQL with pgVector:
```json
{
  "schemas": {
    "sandbox": {
      "connection": {
        "database": "PgVectorConnectionProvider",
        "parameters": {
          "host": "127.0.0.1",
          "port": "5432",
          "database": "postgres",
          "username": "postgres",
          "password": "<password>"
        }
      }
    }
  }
}
```
We have two goals: (i) we want to search by (average) colour and (ii) we want to search by filename. Hence, both of these have to be represented in the system somehow. In vitrivr-engine, such representations are called *descriptors*, or more generally, features. Features are defined as fields on the schema:
```json
{
  "schemas": {
    "sandbox": {
      "connection": {
        "database": "CottontailConnectionProvider",
        "parameters": {
          "host": "127.0.0.1",
          "port": "1865"
        }
      },
      "fields": {
        "averagecolor": {
          "factory": "AverageColor"
        },
        "file": {
          "factory": "FileSourceMetadata"
        }
      }
    }
  }
}
```
This defines two fields, `averagecolor` and `file`, on our `sandbox` schema.
In order to be able to export thumbnails during extraction (ingestion) and also have access to these thumbnails during query time (retrieval), we configure a disk resolver and a thumbnail exporter:
```json
{
  "schemas": {
    "sandbox": {
      "connection": {
        "database": "CottontailConnectionProvider",
        "parameters": {
          "host": "127.0.0.1",
          "port": "1865"
        }
      },
      "fields": {
        "averagecolor": {
          "factory": "AverageColor"
        },
        "file": {
          "factory": "FileSourceMetadata"
        }
      },
      "resolvers": {
        "disk": {
          "factory": "DiskResolver",
          "parameters": {
            "location": "../sandbox/thumbnails/"
          }
        }
      },
      "exporters": {
        "thumbnail": {
          "factory": "ThumbnailExporter",
          "resolverName": "disk",
          "parameters": {
            "maxSideResolution": "400",
            "mimeType": "JPG"
          }
        }
      }
    }
  }
}
```
We configure the resolver named `disk` such that its location is `../sandbox/thumbnails/` (remember: we operate from within the `instance` folder).
Additionally, we set up the `thumbnail` exporter such that the MIME type will be JPG and the longer side will measure 400px.
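As a small sanity check, the thumbnail location can be created up front (this mirrors the `DiskResolver` location assumed in this guide; whether the resolver creates the folder automatically is not covered here, and creating it manually does no harm):

```shell
# Run from within the instance folder; the path mirrors the
# "location" parameter of the DiskResolver in schema.json.
mkdir -p ../sandbox/thumbnails/
test -d ../sandbox/thumbnails/ && echo "thumbnail folder ready"
```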
While the schema defines the representation, we require an ingestion pipeline which defines how and in which order these representations are computed:
Create a file `ingestion-image.json`:
```json
{
  "schema": "sandbox",
  "context": {
    "contentFactory": "InMemoryContentFactory",
    "resolverName": "disk",
    "local": {
      "enumerator": {
        "path": "../sandbox/media/",
        "depth": "1"
      },
      "thumbnail": {
        "path": "../sandbox/thumbnails/"
      },
      "filter": {
        "type": "SOURCE:IMAGE"
      }
    }
  },
  "operators": {
    "enumerator": { "type": "ENUMERATOR", "factory": "FileSystemEnumerator", "mediaTypes": ["IMAGE"] },
    "decoder": { "type": "DECODER", "factory": "ImageDecoder" },
    "avgColor": { "type": "EXTRACTOR", "fieldName": "averagecolor" },
    "file_metadata": { "type": "EXTRACTOR", "fieldName": "file" },
    "thumbnail": { "type": "EXPORTER", "exporterName": "thumbnail" },
    "filter": { "type": "TRANSFORMER", "factory": "TypeFilterTransformer" }
  },
  "operations": {
    "enumerator": { "operator": "enumerator" },
    "decoder": { "operator": "decoder", "inputs": ["enumerator"] },
    "averagecolor": { "operator": "avgColor", "inputs": ["decoder"] },
    "thumbnail": { "operator": "thumbnail", "inputs": ["decoder"] },
    "filter": { "operator": "filter", "inputs": ["averagecolor", "thumbnail"], "merge": "COMBINE" },
    "file_metadata": { "operator": "file_metadata", "inputs": ["filter"] }
  },
  "output": ["file_metadata"]
}
```
First, we define the corresponding parameters in `context.local`, such as where the media files are (`context.local.enumerator.path`) and where to store the thumbnails (`context.local.thumbnail.path`).
Second, in `operators` we define which operators form the pipeline. The names given here are also the names to be used in `context.local`.
Third, we define the operations, i.e. the pipeline itself. See below.
Fourth, with the `output` property, we define after which operation the persistence of the representations happens.
Pipeline: The pipeline is defined as follows: The `enumerator` reads the files and sends each IMAGE file to the `decoder`, which in turn decodes the image and sends its internal representation to the `thumbnail` and `averagecolor` operators. The `filter` operator merges the results from both the `thumbnail` and the `averagecolor` operators, but only forwards those of type `SOURCE:IMAGE` to the last operator, `file_metadata`. Its output is then persisted, as specified with the `output` property.
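The operations above form the following directed graph (a sketch derived from the `operations` block; arrows denote data flow):

```
enumerator ──> decoder ──┬──> averagecolor ──┐
                         │                   ├──> filter ──> file_metadata ──> (persisted)
                         └──> thumbnail ─────┘
```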
We have built vitrivr-engine, have a running CottontailDB instance, and have created our `schema.json` and `ingestion-image.json` configuration files.
Let's start vitrivr-engine, which will result in the CLI running. We also pass our schema to vitrivr-engine:
```
./vitrivr-engine-server-0.0.1-SNAPSHOT/bin/vitrivr-engine-server schema.json
```
Within the CLI, we first initialise the database:
```
v> sandbox init
```
We use the command `sandbox` to address our defined schema and the subcommand `init` to initialise and prepare the database.
Then, we start the extraction, which prints something along the lines of:
```
Started extraction job with UUID <uuid>
```
The server has also been started on the (default) port `7070`, hence we can use the OpenAPI Swagger UI to check on the status of the extraction job (replace `<uuid>` with the UUID of your extraction):
```
curl -X 'GET' \
  'http://localhost:7070/api/sandbox/index/<uuid>' \
  -H 'accept: application/json'
```
Depending on your sandbox collection and hardware, this might already be finished or take a while.
Once the extraction is complete, you can move on to the retrieval part:
For retrieval, we operate strictly with vitrivr-engine's OpenAPI Swagger UI. In a real-world scenario, one would likely build a dedicated UI on top of vitrivr-engine. Currently, there is a branch of vitrivr-ng-min supporting vitrivr-engine video retrieval that can serve as a reference.
For querying, one must be aware of the available representations (descriptors) configured on the schema and effectively extracted.
In this guide, there are two fields, `averagecolor` and `file`.
The former is a representation of the average colour of the media data as a three-dimensional RGB vector; the latter is a table-like structure with, among others, the file name as a textual value addressed as `path`.
In the case of a vector representation, nearest neighbour search (NNS) is performed.
Head over to the OpenAPI Swagger UI and locate the `query` endpoint.
Using the Swagger UI, try out the query by specifying the schema as `sandbox` and pasting the following example NNS query for the field `averagecolor`:
```json
{
  "context": {},
  "inputs": {
    "color": {
      "type": "VECTOR",
      "data": [0.5, 0.5, 0.5]
    }
  },
  "operations": {
    "op_color": {
      "type": "RETRIEVER",
      "field": "averagecolor",
      "input": "color"
    }
  },
  "output": "op_color"
}
```
This queries vitrivr-engine for images whose average colour is closest to `[0.5, 0.5, 0.5]`, a medium grey.
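As a variation, you could aim for a different target colour; for instance, a vector leaning towards red (the values below are illustrative, not from the original guide):

```json
"color": {
  "type": "VECTOR",
  "data": [0.8, 0.1, 0.1]
}
```

Swap this into the `inputs` block of the query above to retrieve images whose average colour is predominantly red.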
To query a sub-field of a structured descriptor, such as the `file` field, which contains the `path` (text) and `size` (number) sub-fields, a Boolean query on the sub-field is used.
In the following example, we search for files with a size larger than 15000 bytes:
```json
{
  "context": {},
  "inputs": {
    "size": {
      "type": "NUMERIC",
      "value": "15000",
      "comparison": ">"
    }
  },
  "operations": {
    "op1": {
      "type": "RETRIEVER",
      "field": "file.size",
      "input": "size"
    }
  },
  "output": "op1"
}
```
Found an issue in the wiki? Post it!
Have a question? Ask it!
Disclaimer: Please keep in mind, vitrivr and vitrivr-engine are predominantly research prototypes.