Error in FineTuning deepseek-vl-7b-chat-8bit #187

Open · sachinraja13 opened this issue Jan 27, 2025 · 2 comments
Labels: bug (Something isn't working)

sachinraja13 commented Jan 27, 2025

This is the command I'm using:

python -m mlx_vlm.lora --dataset ~/Datasets/BusinessVQA/fintabnet/val/vqa_dataset.hf --model-path ~/.cache/lm-studio/models/mlx-community/deepseek-vl-7b-chat-8bit --epochs 2 --batch-size 4 --learning-rate 5e-5

Here is the console output:

INFO:__main__:Loading model from /Users/sachinraja/.cache/lm-studio/models/mlx-community/deepseek-vl-7b-chat-8bit
INFO:__main__:Loading dataset from /Users/sachinraja/Datasets/BusinessVQA/fintabnet/val/vqa_dataset.hf
INFO:__main__:Applying chat template to the dataset
Map: 100%|██████████| 240574/240574 [00:07<00:00, 31075.60 examples/s]
INFO:__main__:Setting up LoRA
#trainable params: 23.424 M || all params: 6910.365696 M || trainable%: 0.339%
INFO:__main__:Setting up optimizer
INFO:__main__:Setting up trainer
INFO:__main__:Training model
  0%|          | 0/60143 [00:10<?, ?it/s]
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/sachinraja/Code/mlx-vlm/mlx_vlm/lora.py", line 177, in <module>
    main(args)
  File "/Users/sachinraja/Code/mlx-vlm/mlx_vlm/lora.py", line 97, in main
    loss = trainer.train_step(
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/sachinraja/Code/mlx-vlm/mlx_vlm/trainer/trainer.py", line 265, in train_step
    loss, grads = loss_and_grad_fn(self.model, batch)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlx/lib/python3.11/site-packages/mlx/nn/utils.py", line 35, in wrapped_value_grad_fn
    value, grad = value_grad_fn(model.trainable_parameters(), *args, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlx/lib/python3.11/site-packages/mlx/nn/utils.py", line 29, in inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/sachinraja/Code/mlx-vlm/mlx_vlm/trainer/trainer.py", line 251, in loss_fn
    nn.losses.cross_entropy(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlx/lib/python3.11/site-packages/mlx/nn/losses.py", line 81, in cross_entropy
    raise ValueError(
ValueError: Targets shape (4, 78) does not match logits shape (1, 78, 102400).
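
The shapes in the error tell the story: the targets keep the requested batch dimension of 4, but the logits come back with a batch dimension of 1. As a minimal sketch (using the shapes copied from the error above, not code from the trainer itself), the same check in mlx can be reproduced directly:

import mlx.core as mx
import mlx.nn as nn

# Shapes from the failing run: logits with a collapsed batch dimension (1),
# targets with the requested batch size (4).
logits = mx.zeros((1, 78, 102400))           # (batch, seq_len, vocab_size)
targets = mx.zeros((4, 78), dtype=mx.int32)  # (batch, seq_len)

# With class-index targets, mlx requires targets.shape == logits.shape[:-1],
# so this raises: ValueError: Targets shape (4, 78) does not match logits
# shape (1, 78, 102400).
nn.losses.cross_entropy(logits, targets)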

@Blaizzy: I would greatly appreciate your help here.

Blaizzy (Owner) commented Feb 24, 2025

Hey @sachinraja13

Please set batch size to 1.

There is a bug with batch sizes larger than one for some models.
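
Applied to the command above, the workaround only changes --batch-size:

python -m mlx_vlm.lora --dataset ~/Datasets/BusinessVQA/fintabnet/val/vqa_dataset.hf --model-path ~/.cache/lm-studio/models/mlx-community/deepseek-vl-7b-chat-8bit --epochs 2 --batch-size 1 --learning-rate 5e-5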

Blaizzy added the bug (Something isn't working) label Feb 24, 2025
Blaizzy self-assigned this Feb 24, 2025
sachinraja13 (Author) commented

Thank you!
