
Models fail to load with keyNotFound error #218

Open
atdrendel opened this issue Feb 28, 2025 · 4 comments
@atdrendel

Related to #214, I believe.

Some models fail to load with keyNotFound errors. The Qwen1.5 and Qwen2.5 errors were solved in #210, but some errors still remain:

  • Phi-3.5-MoE-instruct-4bit: keyNotFound(base: "SuScaledRotaryEmbedding", key: "_freqs")
  • OpenELM-270M-Instruct: keyNotFound(base: "Linear", key: "weight")
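
A minimal, self-contained sketch of the failure mode (this is not the actual MLX Swift API; the type and function names below are illustrative): strict parameter-update validation checks every key the module reports against the loaded weights and errors on the first one it cannot find, which is roughly how a non-parameter buffer like _freqs, or an output projection that is never loaded, surfaces as keyNotFound.

    import Foundation

    // Illustrative error mirroring the messages in the report above.
    enum UpdateError: Error, CustomStringConvertible {
        case keyNotFound(base: String, key: String)

        var description: String {
            switch self {
            case .keyNotFound(let base, let key):
                return "keyNotFound(base: \"\(base)\", key: \"\(key)\")"
            }
        }
    }

    // Require every key the module expects to be present in the loaded weights.
    func validate(moduleName: String, expectedKeys: [String], loadedWeights: [String: [Float]]) throws {
        for key in expectedKeys {
            guard loadedWeights[key] != nil else {
                throw UpdateError.keyNotFound(base: moduleName, key: key)
            }
        }
    }

    // "_freqs" is a constant buffer that never appears in the checkpoint, so strict
    // validation fails in the same shape as the first error listed above.
    do {
        try validate(
            moduleName: "SuScaledRotaryEmbedding",
            expectedKeys: ["_freqs"],
            loadedWeights: [:]
        )
    } catch {
        print(error)  // keyNotFound(base: "SuScaledRotaryEmbedding", key: "_freqs")
    }
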
@davidkoski davidkoski self-assigned this Feb 28, 2025
@davidkoski
Collaborator

I will take a look -- probably a variation on the same issue.

davidkoski added a commit to davidkoski/mlx-swift that referenced this issue Mar 7, 2025
- do not fail parameter update validation for "invalid" keys (e.g. _freqs)
@davidkoski
Collaborator

Phi-3.5-MoE-instruct-4bit is interesting -- it has a rope layer, SuScaledRotaryEmbedding, whose _freqs value is not meant to be a parameter. Indeed it is not marked with @ParameterInfo; it is just a constant value.

    po self.rope.parameters()
    ▿ [
    ]
      - contents : 0 elements

and that is because:

    public func parameters() -> ModuleParameters {
        filterMap(filter: Self.filterValidParameters, map: Self.mapParameters())
    }

filters out parameters whose names have a leading underscore. I think we should probably not consider those for validation either.

That will be fixed with ml-explore/mlx-swift#200
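
A rough sketch of the idea, assuming validation walks the module's keys (the helper and names below are illustrative, not the real mlx-swift internals): keys with a leading underscore mark constants that parameters() already hides, so update validation can skip them the same way.

    // Illustrative helper, not the actual mlx-swift change: treat keys with a
    // leading underscore (like "_freqs") as non-parameter constants.
    func isValidParameterKey(_ key: String) -> Bool {
        !key.hasPrefix("_")
    }

    // Only the remaining keys would be checked against the loaded weights.
    let moduleKeys = ["_freqs", "weight", "bias"]
    let keysToValidate = moduleKeys.filter(isValidParameterKey)
    print(keysToValidate)  // ["weight", "bias"]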

@davidkoski
Collaborator

OpenELM-270M-Instruct has a couple of cases where layers should be optional but are not:

        var out = transformer(inputs, cache: cache)
        if shareInputOutputLayers {
            out = matmul(out, transformer.embedTokens.weight.T)
        } else {
            out = lmHead(out)
        }

lmHead is always present but is ignored depending on this config value. It should be:

        var out = transformer(inputs, cache: cache)
        if let lmHead {
            out = lmHead(out)
        } else {
            out = matmul(out, transformer.embedTokens.weight.T)
        }

There are some other layers in the model set up the same way.
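
A minimal, self-contained sketch of the init-side counterpart (the type and property names are stand-ins, not the actual OpenELM implementation in mlx-swift-examples): only create lmHead when the input/output embeddings are not shared, so a checkpoint without a separate output projection no longer trips validation.

    // Stand-in for a real linear layer; only its presence matters for the example.
    struct LinearSketch {
        var inputDimensions: Int
        var outputDimensions: Int
    }

    final class OpenELMSketch {
        let shareInputOutputLayers: Bool
        // nil when the output projection reuses the embedding weights.
        let lmHead: LinearSketch?

        init(hiddenSize: Int, vocabSize: Int, shareInputOutputLayers: Bool) {
            self.shareInputOutputLayers = shareInputOutputLayers
            self.lmHead = shareInputOutputLayers
                ? nil
                : LinearSketch(inputDimensions: hiddenSize, outputDimensions: vocabSize)
        }
    }

    // With sharing enabled, lmHead is nil and the `if let lmHead` branch above
    // falls through to the shared-embedding matmul.
    let model = OpenELMSketch(hiddenSize: 1280, vocabSize: 32000, shareInputOutputLayers: true)
    print(model.lmHead == nil)  // true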

davidkoski added a commit that referenced this issue Mar 7, 2025
- OpenELM had optional layers that were always created
- see #214
@davidkoski
Collaborator

I inspected the rest of the models for similar patterns but didn't see any.
