
FineTuning BLIP2 - various issues #376

Closed
iliasmiraoui opened this issue Apr 27, 2023 · 9 comments

Comments


iliasmiraoui commented Apr 27, 2023

Hello,

Thank you again for the fantastic work on this library and all the examples you are including!
Big thanks to @younesbelkada for all the support as well...

I have been trying to play around with BLIP2 and PEFT using the example notebook (https://colab.research.google.com/drive/16XbIysCzgpAld7Kd9-xz-23VPWmqdWmW?usp=sharing#scrollTo=6cCVhsmJxxjH), and a few things came up that I was hoping to get your help with:

  1. When trying to fine-tune with "salesforce/blip2-flan-t5-xl", I ran into a number of issues with this config:

    config = LoraConfig(
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        bias="none",
        target_modules=["q_proj", "k_proj"],
    )

The q_proj and k_proj layers don't exist in the T5 language model, so I used "q","v" (or just the default values), and that made the loss converge to 0 extremely quickly. However, the model was really just outputting gibberish, so I'm likely not using the right target_modules... How are you supposed to choose this parameter? More generally, is there a heuristic for it, such as T5 -> q,v and OPT -> q_proj,k_proj, and is it different for the standalone language model vs. BLIP2?
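
One way to check which names are valid for a given checkpoint is to list the nn.Linear leaf names inside the language model; those leaf names are what LoraConfig.target_modules matches against. A minimal sketch along those lines, assuming the "salesforce/blip2-flan-t5-xl" checkpoint mentioned above (it loads the full model just to print the names):

import torch
from transformers import Blip2ForConditionalGeneration

model = Blip2ForConditionalGeneration.from_pretrained("salesforce/blip2-flan-t5-xl")

# Leaf names of every nn.Linear in the language model; LoraConfig.target_modules
# matches module names by their suffix, so these are the strings it can target.
leaf_names = sorted({name.split(".")[-1]
                     for name, module in model.language_model.named_modules()
                     if isinstance(module, torch.nn.Linear)})
print(leaf_names)
# T5-based checkpoints expose q/k/v/o (attention) plus wi_0/wi_1/wo (MLP),
# while OPT-based ones expose q_proj/k_proj/v_proj/out_proj plus fc1/fc2,
# which is why "q","v" work for flan-t5 and "q_proj","k_proj" only for OPT.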

  • I tried using a bigger OPT checkpoint (e.g. "ybelkada/blip2-opt-2.7b-fp16-sharded") and the loss was "nan" the whole time, regardless of what I tried.
  2. Something seemed really odd in the training loop, specifically: outputs = model(input_ids=input_ids, pixel_values=pixel_values, labels=input_ids)
  • From my understanding, this would imply that we are passing the very labels we want the model to predict into it as an input?

  • I also tried to modify the notebook to go beyond image captioning and train a VQA model by changing the following:

class ImageCaptioningDataset(Dataset):
    def __init__(self, dataset, processor):
        self.dataset = dataset
        self.processor = processor

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        item = self.dataset[idx]
        encoding = self.processor(images=item["image"],text=item['prompt'], padding="max_length", return_tensors="pt")
        # remove batch dimension
        encoding = {k: v.squeeze() for k, v in encoding.items()}
        encoding["text"] = item["text"]
        return encoding

def collate_fn(batch):
    # pad the input_ids and attention_mask
    processed_batch = {}
    for key in batch[0].keys():
        if key in ["pixel_values",'input_ids']:
            processed_batch[key] = torch.stack([example[key] for example in batch])
        elif key == 'text':
            text_inputs = processor.tokenizer(
                [example["text"] for example in batch], padding=True, return_tensors="pt"
            )
            processed_batch["input_ids_label"] = text_inputs["input_ids"]
            processed_batch["attention_mask_label"] = text_inputs["attention_mask"]
    return processed_batch

# ...and, inside the training loop, the batch is consumed as:
input_ids = batch.pop("input_ids").to(device)
input_ids_label = batch.pop("input_ids_label").to(device)
pixel_values = batch.pop("pixel_values").to(device, torch.float16)

outputs = model(input_ids=input_ids,
                pixel_values=pixel_values,
                labels=input_ids_label)

But then it didn't seem to converge as well as the regular image captioning did, despite the prompt being identical across my whole dataset... Anything I could be doing wrong?

Thanks in advance!
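
Regarding question 2: for the OPT-based (decoder-only) checkpoints, passing labels=input_ids is the standard teacher-forcing setup, because the loss is computed on internally shifted labels, so the prediction at position t is only ever scored against token t+1. A minimal sketch of that shift (the generic causal-LM loss, not BLIP-2-specific code):

import torch
import torch.nn.functional as F

def shifted_causal_lm_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # logits: (batch, seq_len, vocab_size), labels: (batch, seq_len) token ids
    shift_logits = logits[:, :-1, :].contiguous()  # prediction at position t ...
    shift_labels = labels[:, 1:].contiguous()      # ... is scored against token t+1
    return F.cross_entropy(shift_logits.view(-1, shift_logits.size(-1)),
                           shift_labels.view(-1))

For the VQA variant with an OPT-based checkpoint, one common pattern (an assumption here, not the notebook's code) is to tokenize prompt and answer together as input_ids, copy them to labels, and set the prompt positions to -100 so only the answer tokens contribute to the loss; the T5 checkpoints are encoder-decoder, which is what the next comment addresses.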

@betterftr

I have tried messing around with BLIP-2 T5 XXL with the same LoraConfig settings (BLIP-2 OPT 6.7B was working fine); it outputs gibberish and the loss converges to 0 way too quickly.

@betterftr

Figured it out: the T5 model expects input_ids as the instructions and labels (which become the decoder_input_ids) as your captions.
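
A minimal sketch of that pattern, assuming the processor, model, image, and device objects from the notebook (the prompt and caption strings are placeholders):

import torch

# Encoder side: the image plus the instruction/prompt.
inputs = processor(images=image,
                   text="Question: what is shown in the picture? Answer:",
                   return_tensors="pt").to(device)

# Decoder side: the caption/answer becomes the labels (T5 derives decoder_input_ids from them).
labels = processor.tokenizer("a cat sitting on a couch", return_tensors="pt").input_ids.to(device)

outputs = model(input_ids=inputs["input_ids"],
                pixel_values=inputs["pixel_values"].to(torch.float16),  # fp16 cast follows the notebook; drop it for full-precision loading
                labels=labels)
loss = outputs.loss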

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

github-actions bot closed this as completed Jul 7, 2023
@bryanchiaws

Figured it out: the T5 model expects input_ids as the instructions and labels (which become the decoder_input_ids) as your captions.

I am getting the following error (only when I use PEFT):

TypeError: forward() got an unexpected keyword argument 'inputs_embeds'

I was wondering if you knew what might be the issue?

Or do you have an example notebook I could look at?
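
One thing to compare against the config pasted at the top of this issue (an assumption about the cause, not a confirmed fix): that LoraConfig does not set a task_type, so PEFT wraps the model with the generic PeftModel, which forwards your kwargs to Blip2ForConditionalGeneration unchanged. If a task_type such as SEQ_2_SEQ_LM or CAUSAL_LM is set, PEFT's task-specific wrapper passes inputs_embeds through, which BLIP-2's forward may not accept, and that can raise exactly this TypeError. A minimal sketch of the task_type-free setup:

from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    target_modules=["q", "v"],  # T5-style names; use q_proj/v_proj for OPT checkpoints
)

model = get_peft_model(model, config)  # `model` is the loaded Blip2ForConditionalGeneration
model.print_trainable_parameters()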


z3ugma commented Dec 4, 2023

I'm also getting the issue where the loss ends up being all nan after an epoch or two of training; I documented it in huggingface/notebooks#454
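
A hedged sketch of a more defensive training step, assuming the notebook's model, optimizer, train_dataloader, and device objects; the finite-loss guard and the fp16-overflow suspicion are assumptions, not a confirmed diagnosis of huggingface/notebooks#454:

import torch

for batch in train_dataloader:
    input_ids = batch.pop("input_ids").to(device)
    pixel_values = batch.pop("pixel_values").to(device, torch.float16)  # as in the notebook

    outputs = model(input_ids=input_ids, pixel_values=pixel_values, labels=input_ids)
    loss = outputs.loss

    # If the loss is already non-finite, skip the update instead of corrupting the weights.
    # If this triggers regularly, fp16 overflow is a likely suspect; switching the compute
    # dtype to bf16 or fp32 (hardware and memory permitting) is a common mitigation to try.
    if not torch.isfinite(loss):
        optimizer.zero_grad(set_to_none=True)
        continue

    loss.backward()
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)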

@NielsRogge
Contributor

pinging @younesbelkada here


z3ugma commented Dec 24, 2023

Still an issue for me after trying various versions of PEFT and PyTorch. A currently non-working system setup:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
Torch 2.0.1+cu117
Datasets 2.16.0
Python 3.11.7 | packaged by conda-forge | (main, Dec 15 2023, 08:38:37) [GCC 12.3.0]
PEFT 0.5.0

@pribadihcr

Hi @z3ugma, have you found a solution yet?

@ChristopheYe

Hi @bryanchiaws,
I have the same error. Did you figure out a way to fix it?
Thanks!
