vinvl_bert

Open In Colab

Overview

vinvl_bert is a vision-language model tailored for Arabic image captioning, inspired by the methodology of the paper Arabic Image Captioning using Pre-training of Deep Bidirectional Transformers. The model combines pre-trained Bidirectional Transformers (BiT) with visual features extracted from images to generate accurate, contextually relevant captions in Arabic. Object tags serve as anchor points for semantic alignment between image regions and text, which makes the approach particularly effective on Arabic datasets. The repository supports a range of vision-language tasks and offers extensive customization options.

Features

  • Integration with pre-trained models: Easily use pre-trained models for captioning tasks.
  • Image and text feature fusion: Incorporates image region features with textual data.
  • Flexible configurations: Supports various decoding methods and generation customizations (see the sketch after this list).
  • Supports constrained beam search (CBS): Enables fine-grained control over output captions.
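
For example, the same pipeline can switch between deterministic beam search and stochastic sampling purely through configuration. A minimal sketch using only VinVLBertConfig fields that appear in the Quick Start below:

from vinvl_bert.configs import VinVLBertConfig

cfg = VinVLBertConfig()

# Deterministic beam search: sampling off, several beams
cfg.do_sample = False
cfg.num_beams = 5

# Stochastic nucleus sampling instead: uncomment to enable
# cfg.do_sample = True
# cfg.num_beams = 1
# cfg.top_p = 0.9        # keep only the top-90% probability mass
# cfg.temperature = 0.8  # soften/sharpen the token distribution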

Installation

Option 1: Install via pip

pip install git+https://github.com/Mahmood-Anaam/vinvl_bert.git --quiet

Option 2: Clone Repository and Install in Editable Mode

git clone https://github.com/Mahmood-Anaam/vinvl_bert.git
cd vinvl_bert
pip install -e .

Option 3: Use Conda Environment

git clone https://github.com/Mahmood-Anaam/vinvl_bert.git
cd vinvl_bert
conda env create -f environment.yml
conda activate vinvl_bert
pip install -e .
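
To confirm the package is importable after any of the options above (a quick sanity check; vinvl_bert is the top-level package name used in the Quick Start imports):

import vinvl_bert  # should import without error after installation
print("vinvl_bert installed at", vinvl_bert.__file__)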

Quick Start

Here’s how to get started with vinvl_bert:

import torch
from PIL import Image
import requests
from vinvl_bert.feature_extractors import VinVLFeatureExtractor
from vinvl_bert.pipelines import VinVLBertPipeline
from vinvl_bert.configs import VinVLBertConfig

# Configure settings
cfg = VinVLBertConfig()
cfg.model_id = "jontooy/AraBERT32-Flickr8k"  # model ID on the Hugging Face Hub
cfg.device = "cuda" if torch.cuda.is_available() else "cpu"  # computation device (GPU/CPU)

# Image and object detection settings
cfg.add_od_labels = True  # Whether to add object detection labels to input
cfg.max_img_seq_length = 50  # Maximum sequence length for image features

# Generation settings
cfg.is_decode = True  # Enable decoding (generation mode)
cfg.do_sample = False  # Whether to use sampling for generation
cfg.max_gen_length = 50  # Maximum length for generated text
cfg.num_beams = 5  # Number of beams for beam search
cfg.temperature = 1.0  # Temperature for sampling (lower values make output more deterministic)
cfg.top_k = 50  # Top-k sampling (0 disables it)
cfg.top_p = 1.0  # Top-p (nucleus) sampling (1.0 disables it)
cfg.repetition_penalty = 1.0  # Penalty for repeating words (1.0 disables it)
cfg.length_penalty = 1.0  # Penalty for sequence length (used in beam search)
cfg.num_keep_best = 3  # Number of best sequences to keep

# Load an example image
img_url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(img_url, stream=True).raw)

# Extract image features (standalone use of the extractor; the pipeline
# below also accepts raw images directly, so this step is illustrative)
feature_extractor = VinVLFeatureExtractor(device=cfg.device, add_od_labels=cfg.add_od_labels)
image_features = feature_extractor([image])

# Generate a caption (feature extraction and decoding in a single call)
pipeline = VinVLBertPipeline(cfg)
features, captions = pipeline([image])
print("Generated Caption:", captions[0])

Customization

You can fine-tune or modify configurations in VinVLBertConfig to suit specific tasks, such as:

  • Adjusting sequence lengths for text and images.
  • Modifying beam search parameters for generation.
  • Enabling or disabling constrained beam search (CBS) for specific constraints (see the sketch below).
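
A short sketch of a customized configuration. Every field below except the CBS flag appears in the Quick Start; cfg.use_cbs is a hypothetical name used here for illustration, so check VinVLBertConfig for the actual option:

cfg = VinVLBertConfig()

# Longer image feature sequences and generated captions
cfg.max_img_seq_length = 70
cfg.max_gen_length = 64

# Wider beam search with a mild length penalty
cfg.num_beams = 8
cfg.length_penalty = 1.2

# Constrained beam search: hypothetical flag, named for illustration only
# cfg.use_cbs = True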

Limitations

This repository is a utility for integrating pre-trained models for Arabic image captioning. It is not a full-fledged library for vision-language tasks and assumes familiarity with PyTorch and Transformers.
