
ddw2AIGROUP2CQUPT/Large-Scale-Multimodal-Face-Datasets


Large-Scale Multimodal Face Datasets

[24/07/05] 🤗FaceCaption-15M OpenFace-CQUPT/FaceCaption-15M

[25/01/11] 🤗FaceCaptionHQ-4M OpenFace-CQUPT/FaceCaptionHQ-4M

[24/09/12] 🤗HumanCaption-10M OpenFace-CQUPT/HumanCaption-10M

[24/10/23] 🤗HumanCaption-HQ OpenFace-CQUPT/HumanCaption-HQ-311K

FaceCaption-15M


FaceCaption-15M is a large-scale, diverse, and high-quality dataset of facial images paired with natural language descriptions (facial image-to-text). It aims to facilitate research on face-centered tasks. FaceCaption-15M comprises over 15 million pairs of facial images and corresponding natural language descriptions of facial features, making it the largest facial image caption dataset to date.
[24/09/01] The image embeddings for FaceCaption-15M have been released! OpenFace-CQUPT/Facecaption-15M-Embeddings

FaceCaptionHQ-4M


FaceCaptionHQ-4M contains about 4M facial image-text pairs that were cleaned from FaceCaption-15M.

HumanCaption-10M


HumanCaption-10M is a large, diverse, high-quality dataset of human-related images with natural language descriptions (image-to-text). The dataset is designed to facilitate research on human-centered tasks. HumanCaption-10M contains approximately 10 million human-related images, each with a natural language description that includes the corresponding facial features. It is the second-generation version of FaceCaption-15M.

HumanCaption-HQ


HumanCaption-HQ contains approximately 311,000 human-related images and their corresponding natural language descriptions. Compared to HumanCaption-10M, this dataset not only includes the associated facial descriptions but also keeps only higher-resolution images and uses the visual understanding capabilities of GPT-4V to generate more detailed and accurate text descriptions. This dataset is used for the second phase of training HumanVLM, enhancing the model's capabilities in caption generation and visual understanding.
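
Loading from the Hub (sketch)

The datasets above are published under the Hub identifiers listed at the top of this README. The snippet below is a minimal sketch of how a few records could be inspected with the Hugging Face `datasets` library, not an official loader. The `HUB_REPOS` mapping and the `peek` helper are hypothetical names introduced here. The `train` split name is an assumption, as is access: some repositories may require accepting terms or logging in on the Hub first, and the record fields are not documented in this README, so inspect them before relying on any column name.

```python
from itertools import islice
from typing import Any, Dict, List

# Hub repository IDs exactly as listed in this README.
HUB_REPOS = {
    "FaceCaption-15M": "OpenFace-CQUPT/FaceCaption-15M",
    "FaceCaptionHQ-4M": "OpenFace-CQUPT/FaceCaptionHQ-4M",
    "HumanCaption-10M": "OpenFace-CQUPT/HumanCaption-10M",
    "HumanCaption-HQ": "OpenFace-CQUPT/HumanCaption-HQ-311K",
    "FaceCaption-15M-Embeddings": "OpenFace-CQUPT/Facecaption-15M-Embeddings",
}


def peek(name: str, split: str = "train", n: int = 1) -> List[Dict[str, Any]]:
    """Stream the first n records of a dataset without downloading all of it.

    Assumptions: the repository exposes a split called `split`, and the
    caller has access to it on the Hub.
    """
    # Imported lazily so the mapping above is usable without `datasets`
    # installed (pip install datasets).
    from datasets import load_dataset

    stream = load_dataset(HUB_REPOS[name], split=split, streaming=True)
    return list(islice(stream, n))
```

For example, `peek("HumanCaption-HQ")[0].keys()` would show which fields a record carries, provided the assumptions above hold. Streaming is used because the larger datasets contain millions of records.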
