MLX error: [reshape] Cannot reshape array of size 481656 into shape (1,2,3,16,2,14,7,2,14) #200

Open
cyrilzakka opened this issue Feb 13, 2025 · 9 comments

@cyrilzakka

Hi there.

Love this library so far. However, the default VLM example app seems to crash when a photo is selected from the photo gallery instead of the default image of the flower and the bee. The following error is returned:

MLX error: [reshape] Cannot reshape array of size 481656 into shape (1,2,3,16,2,14,7,2,14). at /Users/cyril/Library/Developer/Xcode/DerivedData/HuggingSnap-ednrujnmdusiyxdproxrvmnwmrcm/SourcePackages/checkouts/mlx-swift/Source/Cmlx/mlx-c/mlx/c/ops.cpp:2337

This is reproducible when using an image taken with the device camera.

Will look into it and submit a pull request as needed.
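
The error message comes down to an element-count mismatch: a reshape can only succeed when the product of the target dimensions equals the number of elements in the array. A minimal sketch of that check in plain Swift follows; the helper name `elementCount` is made up for illustration and is not part of MLX.

```swift
// Hypothetical helper, not an MLX API: number of elements a shape describes.
func elementCount(of shape: [Int]) -> Int {
    shape.reduce(1, *)
}

// Values copied from the error message above.
let arraySize = 481_656
let targetShape = [1, 2, 3, 16, 2, 14, 7, 2, 14]

let expected = elementCount(of: targetShape)
print("shape needs \(expected) elements, array has \(arraySize)")
print("reshape possible: \(expected == arraySize)")
```

The shape describes 526848 elements, so the array is 45192 elements short. That suggests the image that reached the reshape was smaller than the patch grid computed for it.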

@davidkoski
Collaborator

Any progress on this? I haven't been able to reproduce it myself, but maybe we have different model phones. What resolution image are you seeing this with?

@deet

deet commented Mar 7, 2025

I have reproduced this on commit 102aa97a103a10db7b897665993dbd974332d067 from PR #222, on macOS 15.2 (24C101):

./mlx-run llm-tool --model mlx-community/Qwen2-VL-2B-Instruct-4bit --prompt "describe this image" --image "/Users/keith/Downloads/mlx_reshape_error.png"
--- xcodebuild: WARNING: Using the first of multiple matching destinations:
{ platform:macOS, arch:arm64, id:00006041-0014514E3460801C, name:My Mac }
{ platform:macOS, arch:x86_64, id:00006041-0014514E3460801C, name:My Mac }
{ platform:macOS, name:Any Mac }
Model loaded -> id("mlx-community/Qwen2-VL-2B-Instruct-4bit")
Starting generation ...
["content": [["type": "text", "text": "describe this image"], ["type": "image"]], "role": "user"] MLX error: [reshape] Cannot reshape array of size 218124 into shape (1,2,3,6,2,14,8,2,14). at /Users/keith/Library/Developer/Xcode/DerivedData/mlx-swift-examples-ehyreqqtqlhzvkggtqdlwrbbmdbl/SourcePackages/checkouts/mlx-swift/Source/Cmlx/mlx-c/mlx/c/ops.cpp:2337

Using this image:

Image

(Data URL version of the image here, in case GitHub modifies the image file: https://gist.githubusercontent.com/deet/3e0ba4f8a2e21a0545067faf3ff14714/raw/3300f82904607c12425c513371430372a0924b59/gistfile1.txt )
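
The shape in this error can be read as a pixel size, assuming the nine dimensions follow the layout of the reference Qwen2-VL image preprocessing: frame groups, temporal patch size, channels, then merged grid rows, merge size, and patch size, followed by the same three for columns. That reading is an assumption and is not stated in the thread; the struct and field names below are illustrative only.

```swift
// Assumed meaning of each dimension; names are illustrative, not an MLX API.
struct PatchLayout {
    let frames: Int, temporal: Int, channels: Int
    let mergedRows: Int, mergeSize: Int, patchSize: Int
    let mergedCols: Int

    init?(shape: [Int]) {
        // Expect 9 dimensions with matching merge and patch sizes for rows and columns.
        guard shape.count == 9, shape[4] == shape[7], shape[5] == shape[8] else { return nil }
        frames = shape[0]; temporal = shape[1]; channels = shape[2]
        mergedRows = shape[3]; mergeSize = shape[4]; patchSize = shape[5]
        mergedCols = shape[6]
    }

    var height: Int { mergedRows * mergeSize * patchSize }
    var width: Int { mergedCols * mergeSize * patchSize }
    var pixelsPerPlane: Int { height * width }
    var planes: Int { frames * temporal * channels }
}

let layout = PatchLayout(shape: [1, 2, 3, 6, 2, 14, 8, 2, 14])!
let actualPixelsPerPlane = 218_124 / layout.planes  // array size from the error

print("expected \(layout.height) x \(layout.width) = \(layout.pixelsPerPlane) pixels per plane")
print("array provides \(actualPixelsPerPlane) pixels per plane")
```

Under that reading the reshape expects 168 x 224 pixels (37632 per plane), while the array holds only 36354 per plane. That would be consistent with a resize that ended a few pixels short of the target in at least one dimension.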

@davidkoski
Collaborator

I wonder if it is related to this item @DePasqualeOrg found: #222 (comment)

@davidkoski
Collaborator

Oh, you mentioned you were using that commit :-) OK, we have a repro, so I can track it down.

@davidkoski
Collaborator

OK, so with the commit from #222:

let scale = min(size.width / extent.width, size.height / extent.height)

we have:

# the desired size from the config
(lldb) po config.size
▿ Size
  - maxPixels : 12845056
  - minPixels : 3136

# the computed target size given the input
(lldb) p size
(CGSize) (width = 1540, height = 1120)

# the input image
(lldb) po image.extent
▿ (0.0, 0.0, 1536.0, 1106.0)
  ▿ origin : (0.0, 0.0)
    - x : 0.0
    - y : 0.0
  ▿ size : (1536.0, 1106.0)
    - width : 1536.0
    - height : 1106.0

# aspect ratios
(lldb) p inputAspectRatio
(CoreFoundation.CGFloat) 1.3887884267631103
(lldb) p desiredAspectRatio
(CoreFoundation.CGFloat) 1.375

# the computed scale
(lldb) p scale
(CoreFoundation.CGFloat) 1.0026041666666667

That produces this image after the crop:

<CIImage: 0x600001a07f20 extent [0 0 1529 1111]>

which isn't quite what we want: the extent is 1529 x 1111 rather than the 1540 x 1120 target. Ideally we would want the original image scaled into an image at least as large as the target and cropped to exactly the target.
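
The "at least as large as the target, then crop" idea can be sketched with plain `CGSize` arithmetic. This is an illustration of the general aspect-fill technique using the sizes from the debugger session above, not the code from #222; the function names are made up.

```swift
import Foundation

// Sketch of an aspect-fill computation; the function name is illustrative.
// Returns the scale that makes `input` cover `target` in both dimensions.
func aspectFillScale(input: CGSize, target: CGSize) -> CGFloat {
    max(target.width / input.width, target.height / input.height)
}

// Centered crop rectangle of exactly `target` size inside the scaled image.
func centeredCrop(scaled: CGSize, target: CGSize) -> CGRect {
    CGRect(
        x: (scaled.width - target.width) / 2,
        y: (scaled.height - target.height) / 2,
        width: target.width,
        height: target.height)
}

// Sizes from the debugger session above.
let input = CGSize(width: 1536, height: 1106)
let target = CGSize(width: 1540, height: 1120)

let fitScale = min(target.width / input.width, target.height / input.height)
let fillScale = aspectFillScale(input: input, target: target)

// With the smaller (fit) scale the height stays below the target height.
print("fit:  \(input.width * fitScale) x \(input.height * fitScale)")

// With the larger (fill) scale both sides reach the target, so a crop can hit it exactly.
let scaled = CGSize(width: input.width * fillScale, height: input.height * fillScale)
print("fill: \(scaled.width) x \(scaled.height)")
print("crop: \(centeredCrop(scaled: scaled, target: target))")
```

Taking `min` reproduces the scale of about 1.0026 shown above, which leaves the height under 1120, so no crop can reach the target. Taking `max` trades a small amount of cropped width for a result that covers the target.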

@davidkoski
Collaborator

We can use a unit test for this:

    func testResize() {
        // resampleBicubic should produce an image with the desired dimensions
        let inputFilter = CIFilter(name: "CIConstantColorGenerator")!
        inputFilter.setValue(CIColor.red, forKey: "inputColor")
        let input = inputFilter.outputImage!.cropped(to: CGRect(x: 0, y: 0, width: 1536, height: 1106))
        
        let target = CGSize(width: 1540, height: 1120)
        let output = MediaProcessing.resampleBicubic(input, to: target)
        
        XCTAssertEqual(output.extent.size, target)
    }

@davidkoski
Collaborator

If we revert the change from #222 (regarding size) we get exactly what I think we should get:

(lldb) po rescaled
<CIImage: 0x6000006351c0 extent [-3 -3 1546 1126]>

The extent is larger because the bicubic kernel reaches 3 pixels past each edge. Removing that 3-pixel border leaves 1540 x 1120, which is the exact size we were looking for.
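
The arithmetic behind that statement can be checked with `CGRect` alone. A small sketch, where the 3-pixel padding is taken from the comment above rather than from any Core Image documentation:

```swift
import Foundation

// Extent reported by the debugger for the rescaled image.
let rescaledExtent = CGRect(x: -3, y: -3, width: 1546, height: 1126)

// Assumption taken from the comment above: 3 pixels of padding on every edge.
let padding: CGFloat = 3
let inner = rescaledExtent.insetBy(dx: padding, dy: padding)

print(inner)  // origin (0, 0), size 1540 x 1120
```

On an actual `CIImage`, the same region could be selected with `cropped(to:)` using a rectangle at the origin with the target size.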

So I think we need to go back to the image that triggered the change. @DePasqualeOrg, you mentioned a portrait image -- what were the dimensions?

> However, I noticed that I'm getting maximum buffer length crashes when a photo or video is in portrait orientation, so we'll need to make sure they're getting scaled down to an appropriate size also when the width is less than the height.

@DePasqualeOrg
Contributor

> So I think we need to go back to the image that triggered the change @DePasqualeOrg you mentioned a portrait image -- what were the dimensions?

I think previously it would crash when the width was less than the height, because the image wasn't being resized. I fixed this in the latest commit that's being tested here, but now I've also replicated the crash with a PNG screenshot.

@DePasqualeOrg
Contributor

DePasqualeOrg commented Mar 8, 2025

I made some more changes in #222 that seem to fix the issue for Qwen 2 VL and Qwen 2.5 VL.

For some reason I could only get it to work with Lanczos resampling and not with bicubic interpolation. Maybe someone smarter than me can get it working with bicubic, or we can rename the method to resampleLanczos.
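
For readers unfamiliar with the Lanczos option, a rough sketch of resizing with Core Image's built-in `CILanczosScaleTransform` filter follows. It is not copied from #222; the function name only mirrors the rename suggested above, and the final crop is an assumption about how to guarantee that the result never exceeds the target.

```swift
import CoreImage
import CoreImage.CIFilterBuiltins

// Sketch only: scale `image` to `size` with Lanczos resampling, then crop to the target rect.
// The name mirrors the rename suggested above; the body is not taken from the PR.
func resampleLanczos(_ image: CIImage, to size: CGSize) -> CIImage {
    let extent = image.extent.size
    let filter = CIFilter.lanczosScaleTransform()
    filter.inputImage = image
    // `scale` applies to both axes; `aspectRatio` is an extra horizontal factor on top of it.
    let verticalScale = size.height / extent.height
    filter.scale = Float(verticalScale)
    filter.aspectRatio = Float((size.width / extent.width) / verticalScale)
    // Fall back to the unscaled input if the filter produces no output.
    let scaled = filter.outputImage ?? image
    // Crop so that rounding in the filter cannot leave the result larger than the target.
    return scaled.cropped(to: CGRect(origin: .zero, size: size))
}
```

Whether the output extent matches the target exactly is worth confirming with a test along the lines of `testResize` above, since the crop only bounds the result from above.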
