meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8 not found #824

Open

AidanFRyan opened this issue Jan 19, 2025 · 0 comments

AidanFRyan commented Jan 19, 2025

System Info

Running 3.2-3B-Instruct-qlora-int4-eo8 in a meta-reference-gpu instance via conda (Python 3.10 on Ubuntu 22.04)

Information

  • The official example scripts
  • My own modified scripts

🐛 Describe the bug

Error:
ValueError: Model 'meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8' is not available and no llama_model was specified in metadata. Please specify a llama_model in metadata or use a supported model identifier

Cause:
providers/inline/inference/meta_reference/inference.py:80
The parameters are passed to build_model_alias in the wrong order, so 3.2-3B-instruct-qlora-int4-eo8 ends up being matched against the Hugging Face repo for the plain 3.2-3B-Instruct model. Reversing the order of these two parameters fixes the issue.
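
For illustration, a rough sketch of the call order described above, assuming build_model_alias takes (provider_model_id, model_descriptor) as noted under "Expected behavior" below; the exact surrounding code in inference.py may differ:

```python
# Illustrative sketch of the current call order described above (not the exact source).
# With the arguments swapped, model_descriptor ends up holding the plain core model id,
# so the Hugging Face repo is resolved for "Llama3.2-3B-Instruct" rather than the
# quantized "Llama3.2-3B-Instruct:int4-qlora-eo8" variant, and the identifier
# "meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8" is never registered as an alias.
build_model_alias(
    model.descriptor(),         # received as provider_model_id
    model.core_model_id.value,  # received as model_descriptor
)
```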

Error logs

Traceback (most recent call last):
File "/home/aidan/miniconda3/envs/llamastack-meta-reference-gpu/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/aidan/miniconda3/envs/llamastack-meta-reference-gpu/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/aidan/miniconda3/envs/llamastack-meta-reference-gpu/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 354, in
main()
File "/home/aidan/miniconda3/envs/llamastack-meta-reference-gpu/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 286, in main
impls = asyncio.run(construct_stack(config))
File "/home/aidan/miniconda3/envs/llamastack-meta-reference-gpu/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/home/aidan/miniconda3/envs/llamastack-meta-reference-gpu/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/home/aidan/miniconda3/envs/llamastack-meta-reference-gpu/lib/python3.10/site-packages/llama_stack/distribution/stack.py", line 191, in construct_stack
await register_resources(run_config, impls)
File "/home/aidan/miniconda3/envs/llamastack-meta-reference-gpu/lib/python3.10/site-packages/llama_stack/distribution/stack.py", line 96, in register_resources
await method(**obj.model_dump())
File "/home/aidan/miniconda3/envs/llamastack-meta-reference-gpu/lib/python3.10/site-packages/llama_stack/providers/utils/telemetry/trace_protocol.py", line 101, in async_wrapper
result = await method(self, *args, **kwargs)
File "/home/aidan/miniconda3/envs/llamastack-meta-reference-gpu/lib/python3.10/site-packages/llama_stack/distribution/routers/routing_tables.py", line 236, in register_model
registered_model = await self.register_object(model)
File "/home/aidan/miniconda3/envs/llamastack-meta-reference-gpu/lib/python3.10/site-packages/llama_stack/distribution/routers/routing_tables.py", line 181, in register_object
registered_obj = await register_object_with_provider(obj, p)
File "/home/aidan/miniconda3/envs/llamastack-meta-reference-gpu/lib/python3.10/site-packages/llama_stack/distribution/routers/routing_tables.py", line 37, in register_object_with_provider
return await p.register_model(obj)
File "/home/aidan/miniconda3/envs/llamastack-meta-reference-gpu/lib/python3.10/site-packages/llama_stack/providers/utils/telemetry/trace_protocol.py", line 101, in async_wrapper
result = await method(self, *args, **kwargs)
File "/home/aidan/miniconda3/envs/llamastack-meta-reference-gpu/lib/python3.10/site-packages/llama_stack/providers/inline/inference/meta_reference/inference.py", line 115, in register_model
model = await self.model_registry_helper.register_model(model)
File "/home/aidan/miniconda3/envs/llamastack-meta-reference-gpu/lib/python3.10/site-packages/llama_stack/providers/utils/inference/model_registry.py", line 100, in register_model
raise ValueError(
ValueError: Model 'meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8' is not available and no llama_model was specified in metadata. Please specify a llama_model in metadata or use a supported model identifier

Expected behavior

get_hugging_face_repo should receive "Llama3.2-3B-Instruct:int4-qlora-eo8" as model_descriptor, not just "Llama3.2-3B-Instruct" as is passed in the current configuration.

build_model_alias takes the parameters provider_model_id and model_descriptor. The call to build_model_alias in inference.py (line 79) should pass model.core_model_id.value first and model.descriptor() second, i.e. the reverse of the current argument order. A sketch of the reordered call follows.
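
As a rough sketch, the reordered call would then look like this (illustrative only, using the argument names quoted above; the exact code may differ):

```python
# Proposed argument order (illustrative sketch, not a verified patch):
build_model_alias(
    model.core_model_id.value,  # provider_model_id
    model.descriptor(),         # model_descriptor, e.g. "Llama3.2-3B-Instruct:int4-qlora-eo8"
)
```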
