This is a tool to allow for easy download of the videos forming the QuerYD dataset.
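Installing necessary libraries (argparse, pathlib, logging, multiprocessing and json are part of the Python 3 standard library; the others must be installed):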
- argparse
- pytube
python -m pip install git+https://github.com/nficano/pytube
- pathlib
- logging
- multiprocessing
- requests
- tqdm
- json
- zsvision
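Before running the script, it can help to confirm that the third-party packages from the list above are importable. The snippet below is a minimal sketch and not part of download_queryd.py; the helper name `missing_packages` is hypothetical.

```python
import importlib.util

# Third-party packages from the list above; the rest ship with Python 3.
THIRD_PARTY = ["pytube", "requests", "tqdm", "zsvision"]


def missing_packages(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `missing_packages(THIRD_PARTY)` returns an empty list when everything is installed, and otherwise the names that still need installing.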
Downloading videos
To test the download videos script for the QuerYD dataset simply run:
python download_queryd.py --txt_file relevant-video-links-test.txt --task download_videos
This creates a folder called videos in the current directory and saves the videos there. To download the full set of videos, run:
python download_queryd.py --txt_file relevant-video-links-{version either v1 or v2}.txt --task download_videos
To download only the videos with non-English descriptions, run:
python download_queryd.py --txt_file relevant-non-en-links.txt --task download_videos
To attempt each download more than once, set the --tries flag to the desired number of attempts. The default is 2. E.g.:
python download_queryd.py --txt_file relevant-video-links-{version either v1 or v2}.txt --tries 3 --task download_videos
To re-download all files, use the --refresh flag. E.g.:
python download_queryd.py --txt_file relevant-video-links-{version either v1 or v2}.txt --refresh --task download_videos
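The internals of download_queryd.py are not shown here, so the following is only a sketch of the behaviour that the --tries and --refresh flags describe. It assumes that files already on disk are skipped unless a refresh is requested, and that each download is attempted up to `tries` times. The names `download_with_retries` and `fetch` are hypothetical and are not taken from the script.

```python
from pathlib import Path
from typing import Callable


def download_with_retries(
    url: str,
    dest: Path,
    fetch: Callable[[str, Path], None],
    tries: int = 2,
    refresh: bool = False,
) -> bool:
    """Return True if the download succeeded or the file was already present."""
    if dest.exists() and not refresh:
        return True  # assumed behaviour: keep files from an earlier run
    for attempt in range(1, tries + 1):
        try:
            fetch(url, dest)  # hypothetical callable that writes url to dest
            return True
        except Exception as err:  # a real script would catch narrower errors
            print(f"attempt {attempt}/{tries} failed for {url}: {err}")
    return False
```

Passing the download routine in as `fetch` keeps the retry logic independent of the library that performs the actual transfer.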
Downloading json metadata
To download the .json file containing information about the described videos run:
wget http://www.robots.ox.ac.uk/~vgg/research/collaborative-experts/data/QuerYD/json_metadata-{version either v1 or v2}.zip
mv json_metadata-{version either v1 or v2}.zip json_metadata.zip
unzip json_metadata.zip
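Where wget and unzip are not available, the same three steps can be sketched in Python with the standard library. The function names below are hypothetical; the URL pattern and the json_metadata.zip file name are the ones used in the commands above.

```python
import urllib.request
import zipfile
from pathlib import Path

BASE_URL = "http://www.robots.ox.ac.uk/~vgg/research/collaborative-experts/data/QuerYD"


def metadata_url(version: str) -> str:
    """Build the archive URL for version 'v1' or 'v2'."""
    if version not in {"v1", "v2"}:
        raise ValueError("version must be 'v1' or 'v2'")
    return f"{BASE_URL}/json_metadata-{version}.zip"


def fetch_metadata(version: str, out_dir: Path = Path(".")) -> list:
    """Download the archive as json_metadata.zip, unzip it, return member names."""
    archive = out_dir / "json_metadata.zip"
    urllib.request.urlretrieve(metadata_url(version), archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(out_dir)
        return zf.namelist()
```

For example, `fetch_metadata("v2")` downloads and unpacks the v2 metadata into the current directory and returns the names of the extracted files.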
Downloading audio description files
Audio files can be downloaded only after downloading the .json metadata using the previous step.
To download the audio description files corresponding to each video, run:
python download_queryd.py --txt_file relevant-video-links-{version either v1 or v2}.txt --task download_wavs
To download only the non-english audio descriptions run:
python download_queryd.py --txt_file relevant-non-en-links.txt --task download_wavs
To use more processes, add the --processes flag with the number of CPUs available. E.g.:
python download_queryd.py --txt_file relevant-video-links-{version either v1 or v2}.txt --task download_wavs --processes 2
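If the number of CPUs is not known in advance, it can be looked up and passed to --processes. The helper below is a small sketch that only assembles the command line shown above; the name `wav_download_command` is hypothetical, and using every available CPU is an assumption rather than a recommendation from the script's authors.

```python
import os
import shlex


def wav_download_command(txt_file: str, processes=None) -> str:
    """Assemble the download_wavs command, defaulting to all available CPUs."""
    if processes is None:
        processes = os.cpu_count() or 1  # cpu_count() may return None
    args = [
        "python", "download_queryd.py",
        "--txt_file", txt_file,
        "--task", "download_wavs",
        "--processes", str(processes),
    ]
    return shlex.join(args)
```

The returned string can be pasted into a shell or split again with `shlex.split` and handed to `subprocess.run`.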
Downloading transcribed descriptions and corresponding time-stamps
The transcribed version of the audio descriptions can be downloaded as a pickle file by accessing the following link:
http://www.robots.ox.ac.uk/~vgg/research/collaborative-experts/data/QuerYD/raw_captions_combined_filtered-{version either v1 or v2}.pkl
mv raw_captions_combined_filtered-{version either v1 or v2}.pkl raw_captions_combined_filtered.pkl
The corresponding time-stamps in the same order are provided in this pickle file:
http://www.robots.ox.ac.uk/~vgg/research/collaborative-experts/data/QuerYD/times_captions_combined_filtered-{version either v1 or v2}.pkl
mv times_captions_combined_filtered-{version either v1 or v2}.pkl times_captions_combined_filtered.pkl
The confidence scores of the transcriptions, in the same order as the transcriptions, can be found here:
http://www.robots.ox.ac.uk/~vgg/research/collaborative-experts/data/QuerYD/confidence_captions_combined_filtered-{version either v1 or v2}.pkl
mv confidence_captions_combined_filtered-{version either v1 or v2}.pkl confidence_captions_combined_filtered.pkl
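Once the three files have been renamed as above, they can be opened with Python's pickle module. The internal layout of the pickled objects is not documented in this section, so the sketch below only reports the type and length of each object rather than assuming a structure. The names `FILES`, `load_pickles` and `describe` are hypothetical. Pickle files can execute code when loaded, so only open files obtained from the links above.

```python
import pickle
from pathlib import Path

# File names after the mv commands above.
FILES = {
    "captions": "raw_captions_combined_filtered.pkl",
    "times": "times_captions_combined_filtered.pkl",
    "confidence": "confidence_captions_combined_filtered.pkl",
}


def load_pickles(folder: Path = Path(".")) -> dict:
    """Load the three renamed pickle files from `folder`."""
    loaded = {}
    for key, name in FILES.items():
        with open(folder / name, "rb") as handle:
            loaded[key] = pickle.load(handle)
    return loaded


def describe(loaded: dict) -> dict:
    """Report the type and length of each object without assuming its layout."""
    return {
        key: (type(obj).__name__, len(obj) if hasattr(obj, "__len__") else None)
        for key, obj in loaded.items()
    }
```

Since the three files are stated to be in the same order, comparing the lengths returned by `describe(load_pickles())` is a quick sanity check that the downloads match.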
Downloading video features, descriptions and train/val/test splits
To download QuerYD data:
wget http://www.robots.ox.ac.uk/~vgg/research/collaborative-experts/data/features-v2/QuerYD-experts-{version either v1 or v2}.tar.gz
mv QuerYD-experts-{version either v1 or v2}.tar.gz QuerYD-experts.tar.gz
To download QuerYDSegments data (localised clips and their descriptions):
wget http://www.robots.ox.ac.uk/~vgg/research/collaborative-experts/data/features-v2/QuerYDSegments-experts-{version either v1 or v2}.tar.gz
mv QuerYDSegments-experts-{version either v1 or v2}.tar.gz QuerYDSegments-experts.tar.gz
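The downloaded archives still need to be unpacked. As a sketch, the standard tarfile module can list the first few entries of an archive before extracting it; the function names are hypothetical and the target directory is left to the caller, since the expected location is described in the collaborative-experts repository rather than here.

```python
import tarfile
from pathlib import Path


def list_archive(archive: Path, limit: int = 10) -> list:
    """Return up to `limit` member names of a .tar.gz archive."""
    names = []
    with tarfile.open(archive, "r:gz") as tar:
        for member in tar:
            names.append(member.name)
            if len(names) >= limit:
                break
    return names


def extract_archive(archive: Path, out_dir: Path) -> None:
    """Unpack the whole archive into `out_dir`, creating it if needed."""
    out_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(out_dir)
```

For example, `list_archive(Path("QuerYD-experts.tar.gz"))` shows the top of the archive, and `extract_archive` unpacks it into a directory of your choice.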
See https://github.com/albanie/collaborative-experts#queryd for more information and the scripts used. To train and test a model, follow the steps at https://github.com/albanie/collaborative-experts#evaluating-a-pretrained-model with MSVD replaced by QuerYD or QuerYDSegments. Model names should be taken from the retrieval results tables at https://github.com/albanie/collaborative-experts#queryd or https://github.com/albanie/collaborative-experts#querydsegments .
[1] If you find this code useful, please consider citing:
@misc{oncescu2021queryd,
title={QuerYD: A video dataset with high-quality text and audio narrations},
author={Andreea-Maria Oncescu and João F. Henriques and Yang Liu and Andrew Zisserman and Samuel Albanie},
year={2021},
eprint={2011.11071},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
[2] If you find this code useful or use the extracted features, please consider citing:
@inproceedings{Liu2019a,
author = {Liu, Y. and Albanie, S. and Nagrani, A. and Zisserman, A.},
booktitle = {arXiv preprint arxiv:1907.13487},
title = {Use What You Have: Video retrieval using representations from collaborative experts},
date = {2019},
}
This work is supported by the EPSRC (VisualAI EP/T028572/1 and DTA Studentship) and the Royal Academy of Engineering (DFR05420). We are grateful to Sophia Koepke for her helpful comments and suggestions.
Importance of the model:
Model | Task | R@1 | R@5 | R@10 | R@50 | MdR | MnR | Geom | params | Links |
---|---|---|---|---|---|---|---|---|---|---|
HowTo100m S3D | t2v | 10.2(0.0) | 24.5(0.0) | 32.7(0.0) | 54.3(0.0) | 38.0(0.0) | 82.1(0.0) | 20.2(0.0) | 1 | config, model, log |
CE - P,CG | t2v | 29.8(0.3) | 63.8(0.5) | 74.9(0.3) | 93.0(0.1) | 3.0(0.0) | 15.1(0.4) | 52.3(0.3) | 57.75M | config, model, log |
CE | t2v | 31.9(1.5) | 64.5(1.4) | 76.1(0.8) | 93.8(0.9) | 3.0(0.0) | 13.1(0.8) | 53.9(0.7) | 30.82M | config, model, log |
HowTo100m S3D | v2t | 10.0(0.0) | 25.7(0.0) | 32.3(0.0) | 53.2(0.0) | 42.0(0.0) | 81.7(0.0) | 20.2(0.0) | 1 | config, model, log |
CE - P,CG | v2t | 28.6(1.1) | 62.4(0.5) | 73.6(0.8) | 92.9(0.1) | 3.0(0.0) | 14.7(0.4) | 50.8(0.7) | 57.75M | config, model, log |
CE | v2t | 32.9(1.7) | 64.9(1.1) | 76.7(1.1) | 93.6(0.6) | 3.0(0.0) | 12.8(0.6) | 54.7(1.1) | 30.82M | config, model, log |
The table below studies how different pretrained experts influence the performance of the CE model trained on QuerYD. It presents the value and cumulative effect of experts for scene classification (SCENE), ambient sound classification (AUDIO), image classification (OBJECT), and action recognition (ACTION). PREV. denotes the experts used in the previous row.
Experts | Task | R@1 | R@5 | R@10 | R@50 | MdR | MnR | Geom | params | Links |
---|---|---|---|---|---|---|---|---|---|---|
Scene | t2v | 17.0(0.7) | 47.0(2.4) | 60.8(1.1) | 85.4(1.6) | 6.3(0.6) | 27.2(1.1) | 36.5(1.0) | 7.51M | config, model, log |
Prev. + Audio | t2v | 21.4(0.2) | 53.0(1.3) | 63.9(0.4) | 88.6(0.3) | 5.0(0.0) | 22.2(0.7) | 41.7(0.4) | 17.25M | config, model, log |
Prev. + Inst. | t2v | 32.3(1.6) | 65.5(1.0) | 76.7(0.9) | 93.6(0.2) | 3.0(0.0) | 13.0(0.3) | 54.5(0.3) | 24.63M | config, model, log |
Prev. + R2P1D | t2v | 31.9(1.5) | 64.2(1.4) | 76.1(0.7) | 93.8(0.9) | 3.0(0.0) | 13.1(0.8) | 53.8(0.7) | 30.82M | config, model, log |
Scene | v2t | 20.3(0.5) | 47.4(0.8) | 60.0(0.4) | 85.5(1.6) | 6.0(0.0) | 27.0(0.7) | 38.7(0.3) | 7.51M | config, model, log |
Prev. + Audio | v2t | 23.6(0.9) | 52.2(1.1) | 63.9(1.3) | 89.2(0.3) | 5.0(0.0) | 21.6(0.8) | 42.8(0.5) | 17.25M | config, model, log |
Prev. + Inst. | v2t | 32.6(1.3) | 65.6(0.3) | 77.2(0.3) | 93.7(0.9) | 3.0(0.0) | 12.5(0.1) | 54.8(0.6) | 24.63M | config, model, log |
Prev. + R2P1D | v2t | 32.9(1.7) | 65.0(1.0) | 76.7(1.0) | 93.6(0.6) | 3.0(0.0) | 12.8(0.6) | 54.7(1.1) | 30.82M | config, model, log |
Importance of the model:
Model | Task | R@1 | R@5 | R@10 | R@50 | MdR | MnR | Geom | params | Links |
---|---|---|---|---|---|---|---|---|---|---|
HowTo100m S3D | t2v | 6.4(0.0) | 13.8(0.0) | 19.9(0.0) | 36.3(0.0) | 131.0(0.0) | 340.0(0.0) | 12.1(0.0) | 1 | config, model, log |
CE - P,CG | t2v | 21.9(0.5) | 44.5(0.9) | 53.5(0.0) | 72.0(0.8) | 8.3(0.6) | 107.7(2.6) | 37.4(0.4) | 57.75M | config, model, log |
CE | t2v | 19.2(0.1) | 40.8(1.6) | 49.4(1.0) | 68.7(0.5) | 11.0(1.0) | 125.0(4.7) | 33.8(0.6) | 30.82M | config, model, log |
HowTo100m S3D | v2t | 7.2(0.0) | 15.1(0.0) | 19.5(0.0) | 34.3(0.0) | 160.0(0.0) | 361.4(0.0) | 12.9(0.0) | 1 | config, model, log |
CE - P,CG | v2t | 20.8(0.7) | 43.8(1.2) | 53.2(0.8) | 72.6(1.1) | 8.3(0.6) | 102.6(2.9) | 36.5(0.7) | 57.75M | config, model, log |
CE | v2t | 18.5(0.5) | 40.1(0.6) | 49.5(0.2) | 69.0(0.6) | 11.0(0.0) | 112.1(4.3) | 33.2(0.4) | 30.82M | config, model, log |
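The metric columns in the tables above can be illustrated with a short sketch. It assumes the usual retrieval definitions: R@k is the percentage of queries whose correct item is ranked in the top k, MdR and MnR are the median and mean rank, and Geom is the geometric mean of R@1, R@5 and R@10. That last reading is consistent with the reported rows (for example, 31.9, 64.5 and 76.1 give roughly 53.9). The function name `retrieval_metrics` is hypothetical, and the evaluation code in the collaborative-experts repository may differ in details such as how the median is computed.

```python
import statistics


def retrieval_metrics(ranks):
    """Summarise 1-indexed ranks of the correct item, one rank per query."""
    n = len(ranks)
    # Recall at k, expressed as a percentage of queries.
    metrics = {
        f"R@{k}": 100.0 * sum(r <= k for r in ranks) / n
        for k in (1, 5, 10, 50)
    }
    metrics["MdR"] = statistics.median(ranks)
    metrics["MnR"] = statistics.mean(ranks)
    # Assumed definition: geometric mean of R@1, R@5 and R@10.
    metrics["Geom"] = (
        metrics["R@1"] * metrics["R@5"] * metrics["R@10"]
    ) ** (1 / 3)
    return metrics
```

Higher is better for the R@k and Geom columns, while lower is better for MdR and MnR.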