adding support for Qwen2.5-VL #197
base: main
Conversation
public class Qwen25VLProcessor: UserInputProcessor {
Does this end up being the same as the Qwen2VL processor? If not, I wonder if we would want to factor these so that they share code?
See also #173 -- it would be nice to have that work apply to both.
@davidkoski Yes it's the same UserInputProcessor. I'll refactor it and then take a look at #173
@DePasqualeOrg given this refactoring of the processor is in the same place as your work on using the new template in #173, what do you think about how to proceed? The two models share the same processor, and I think it will be good to have it in only one place. If you are going to land your changes soon, perhaps we should merge that and then do this refactor (I think that will just be copying code around). Or would it be easier to put your work on top of the processor broken out like this? @smdesai any thoughts from your side?
I think it would be best to merge #173 first.
@davidkoski I don't have a preference, and as @DePasqualeOrg mentions above, perhaps it's best to merge #173 first; then I can make the appropriate changes here.
@DePasqualeOrg Thank you for your work. I applied your changes together with changes on my side and noticed that the chat template in Qwen2.5 VL is missing the image_pad, vision_start, and vision_end tokens. The template looks like a normal template with function calling. To get it to work, I used the Qwen2 chat template as a string and passed that to `applyChatTemplate` in the call to `prepare`, as shown here: `var promptTokens = try tokenizer.applyChatTemplate(messages: messages, chatTemplate: chatTemplate)`.
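To illustrate what the missing tokens imply, here is a minimal Python sketch of how a Qwen2-VL-style template wraps each image in a vision-token span — exactly the span the Qwen2.5 VL config template omits. The function name and the pad-count parameter are hypothetical; the special-token strings follow the Qwen2-VL convention.

```python
# Sketch (an assumption, not the actual template logic): a Qwen2-VL-style
# chat template expands each image placeholder into a span of vision tokens.
VISION_START = "<|vision_start|>"
VISION_END = "<|vision_end|>"
IMAGE_PAD = "<|image_pad|>"

def render_user_turn(text: str, num_image_pads: int = 0) -> str:
    """Build the prompt text for one user turn.

    num_image_pads is the number of image placeholder tokens the
    processor computed from the image's patch grid (hypothetical
    parameter name).
    """
    image_span = ""
    if num_image_pads > 0:
        image_span = VISION_START + IMAGE_PAD * num_image_pads + VISION_END
    return f"<|im_start|>user\n{image_span}{text}<|im_end|>\n"
```

A template without this expansion produces a prompt with no slots for the image embeddings, which is why the hard-coded Qwen2 template was needed as a workaround.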
Wow. This is not the first time I've been surprised by the chat templates that ship with models. I've opened a discussion on Hugging Face about replacing the current chat template.
@smdesai, I made some changes in the image processing before my PR got merged. It's too bad that several people were working on this at the same time, but hopefully we can sort out these last few conflicts.
@DePasqualeOrg I've combined the merged changes into the refactored processor here, the only change being the chat template for Qwen2.5-VL, which uses a hard-coded template vs. one from the config. Once that model config is fixed, the change will be removed. Right now, I'm going through the code to ensure the port is accurate, as I'm seeing different results on laptop and device. Once I've gone through the code, I'll push the change which resolves the conflicts.
Merged with changes from ml-explore#173 and added chat template for Qwen2.5VL, as the one in the config does not support image/video
@davidkoski I believe this is ready for review.
@JustinLin610, as you can see in the discussion here and on Hugging Face, the chat template for the Qwen 2.5 VL models doesn't include the vision logic. The fix should be very simple: just replace the chat template in
@Blaizzy, do you want to fix the chat templates in the Qwen 2.5 VL models on Hugging Face by replacing them with the templates used in Qwen 2 VL? See the discussion above for more context. If the Qwen team doesn't want to do it, we can at least fix them for MLX users.
@davidkoski @DePasqualeOrg Thanks for pushing to get the correct template added. As for the final review, I may have been somewhat hasty on that. The windowing function isn't altogether correct as it stands: it's not representative of the Python MLX code, and I'm not altogether happy with some of the results, as they differ from the Python MLX output. I'm working to achieve the same results as Python MLX, or as close to them as possible.
@smdesai, I thought I solved that already in my solution for Qwen 2 VL. I haven't looked at the Python implementation, but if you change the approach, please make sure that it supports a mix of images and videos in any order, anywhere in a multi-turn chat, which my solution does.
@DePasqualeOrg Thanks for pointing it out. I'll take a closer look at the Qwen 2 VL implementation and compare.
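For readers unfamiliar with the windowing being debated above: the vision encoder groups the patch grid into fixed-size windows and pads ragged edges before running attention per window. The sketch below is a generic window partition over flattened patch indices — the window size, padding value, and index ordering are illustrative assumptions, not the actual port.

```python
import math

def window_partition(grid_h: int, grid_w: int, window: int = 8):
    """Group a grid_h x grid_w patch grid into window x window blocks.

    Patches are identified by their flattened index i * grid_w + j;
    ragged edges are padded with -1 so every block has window**2 slots.
    Illustrative sketch only -- the real model's window size and
    ordering may differ.
    """
    pad_h = math.ceil(grid_h / window) * window
    pad_w = math.ceil(grid_w / window) * window
    blocks = []
    for bh in range(0, pad_h, window):
        for bw in range(0, pad_w, window):
            block = []
            for i in range(bh, bh + window):
                for j in range(bw, bw + window):
                    if i < grid_h and j < grid_w:
                        block.append(i * grid_w + j)
                    else:
                        block.append(-1)  # padding slot
            blocks.append(block)
    return blocks
```

Getting the index order and padding to match the reference exactly is where ports like this tend to diverge, which fits the symptom described above of results differing from the Python MLX code.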
Fixed all models hosted in the MLX community :) |
@Blaizzy, you can see the differences here: https://huggingface.co/mlx-community/Qwen2-VL-7B-Instruct-4bit/blob/main/tokenizer_config.json https://huggingface.co/mlx-community/Qwen2.5-VL-7B-Instruct-4bit/blob/main/tokenizer_config.json
@Blaizzy It looks like you're comparing the `chat_template.json` that's present for both models. Yes, these are identical; what @DePasqualeOrg points to is the chat template in `config.json`, which is different and is what's being used when the template is applied.
Yes, transformers now uses the chat template from `chat_template.json`. It happened late last year.
@pcuenca, I guess we should use the template from
According to them I know it's complicated.
We should use the processor's chat template and not the tokenizer one.
Yep, we could use some improvements to simplify the API here. Perhaps
I submitted a PR in swift-transformers to use the correct chat template: huggingface/swift-transformers#184
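The preference argued for above — the processor's chat template over the tokenizer's — amounts to a lookup order over the files shipped with a model. A minimal sketch, assuming the Hugging Face file-name convention mentioned in this thread (the function itself is hypothetical, not library code):

```python
import json
import os

def resolve_chat_template(model_dir: str):
    """Return the chat template string for a local model directory,
    preferring the processor's chat_template.json over the template
    embedded in tokenizer_config.json. Sketch of the lookup order
    discussed above, not an actual library API."""
    proc_path = os.path.join(model_dir, "chat_template.json")
    if os.path.exists(proc_path):
        with open(proc_path) as f:
            return json.load(f)["chat_template"]
    tok_path = os.path.join(model_dir, "tokenizer_config.json")
    if os.path.exists(tok_path):
        with open(tok_path) as f:
            return json.load(f).get("chat_template")
    return None
```

Making the loader prefer the processor file is what avoids the broken tokenizer-side template for Qwen 2.5 VL.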
Pull from latest
@smdesai, I'm curious to know: What's the difficulty in making my solution work for Qwen 2.5 VL? If Qwen 2 uses the same processing, can't you just factor out my solution and use it for both models? I'm tempted to try it myself but wanted to check with you first.
@DePasqualeOrg First of all, thank you for the PR to swift-transformers for the chat template. I've tested by modifying Package.swift with the new version and have removed the temporary change I made to the Qwen25VLProcessor. The windowing function is different in Qwen25VL, at least from the Python source on which this port is based. I'll factor your solution into the VisionModel and see how that performs; all going well, I'll push an update to the PR. I've also run into the reshape issue, and it has to do with small images (588x1291) and how they're sized.
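On how images like the 588x1291 one get sized: the Python processor this port follows rounds image dimensions to multiples of the patch-merge factor while clamping the total pixel count. A sketch of that logic, modeled on the Qwen2-VL reference — the factor and the default pixel bounds here are assumptions, not values taken from this PR:

```python
import math

def smart_resize(height: int, width: int, factor: int = 28,
                 min_pixels: int = 56 * 56,
                 max_pixels: int = 14 * 14 * 4 * 1280):
    """Round (height, width) to multiples of `factor`, keeping the total
    pixel count within [min_pixels, max_pixels]. Sketch of the resizing
    in the Python Qwen2-VL processor; the defaults are assumptions."""
    # Round each side to the nearest multiple of factor (never zero).
    h = max(factor, round(height / factor) * factor)
    w = max(factor, round(width / factor) * factor)
    if h * w > max_pixels:
        # Too many pixels: scale both sides down by the same ratio.
        beta = math.sqrt((height * width) / max_pixels)
        h = math.floor(height / beta / factor) * factor
        w = math.floor(width / beta / factor) * factor
    elif h * w < min_pixels:
        # Too few pixels: scale both sides up by the same ratio.
        beta = math.sqrt(min_pixels / (height * width))
        h = math.ceil(height * beta / factor) * factor
        w = math.ceil(width * beta / factor) * factor
    return h, w
```

Under these assumed bounds, a 588x1291 image rounds to a grid-aligned size rather than being scaled, so any reshape failure would come from a mismatch between this rounding and the patch arithmetic downstream.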
provide public accessors for ModelConfigurations from registries (ml-explore#219)
Here's the refactored version; it's identical to #222 by @DePasqualeOrg, the only difference being refactoring of additional code. You can close this out and merge #222 if that looks OK.