[24/07/05] 🤗FaceCaption-15M OpenFace-CQUPT/FaceCaption-15M
[25/01/11] 🤗FaceCaptionHQ-4M OpenFace-CQUPT/FaceCaptionHQ-4M
[24/09/12] 🤗HumanCaption-10M OpenFace-CQUPT/HumanCaption-10M
[24/10/23] 🤗HumanCaption-HQ OpenFace-CQUPT/HumanCaption-HQ-311K
FaceCaption-15M is a large-scale, diverse, and high-quality dataset of facial images paired with natural language descriptions (facial image-to-text). It is intended to facilitate research on face-centered tasks. FaceCaption-15M comprises over 15 million pairs of facial images and corresponding natural language descriptions of facial features, making it the largest facial image caption dataset to date.
[24/09/01] The image embeddings for FaceCaption-15M have been released! OpenFace-CQUPT/Facecaption-15M-Embeddings
FaceCaptionHQ-4M contains about 4M facial image-text pairs cleaned from FaceCaption-15M.
HumanCaption-10M is a large, diverse, high-quality dataset of human-related images with natural language descriptions (image-to-text). It is designed to facilitate research on human-centered tasks. HumanCaption-10M contains approximately 10 million human-related images paired with natural language descriptions of their facial features, and is the second-generation version of FaceCaption-15M.
HumanCaption-HQ contains approximately 311,000 human-related images and their corresponding natural language descriptions. Compared to HumanCaption-10M, it not only includes the associated facial descriptions but also selects higher-resolution images and uses the visual understanding capabilities of GPT-4V to generate more detailed and accurate text descriptions. This dataset is used in the second phase of training HumanVLM, improving the model's caption generation and visual understanding.