meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8 not found #824

Open

AidanFRyan opened this issue Jan 19, 2025 · 0 comments

AidanFRyan commented Jan 19, 2025

System Info

Running 3.2-3B-Instruct-qlora-int4-eo8 in a meta-reference-gpu instance via conda (Python 3.10 on Ubuntu 22.04)

Information

  • The official example scripts
  • My own modified scripts

🐛 Describe the bug

Error:
ValueError: Model 'meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8' is not available and no llama_model was specified in metadata. Please specify a llama_model in metadata or use a supported model identifier

Cause:
providers/inline/inference/meta_reference/inference.py:80
The parameters are passed to build_model_alias in the wrong order, so 3.2-3B-instruct-qlora-int4-eo8 ends up being matched against the Hugging Face repo for the plain 3.2-3B-Instruct model. Reversing the order of these two parameters fixes the issue.
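
For illustration, a rough sketch of the call order described above, assuming build_model_alias takes (provider_model_id, model_descriptor) as noted under "Expected behavior" below; the exact surrounding code in inference.py may differ:

```python
# Illustrative sketch of the current call order described above (not the exact source).
# With the arguments swapped, model_descriptor ends up holding the plain core model id,
# so the Hugging Face repo is resolved for "Llama3.2-3B-Instruct" rather than the
# quantized "Llama3.2-3B-Instruct:int4-qlora-eo8" variant, and the identifier
# "meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8" is never registered as an alias.
build_model_alias(
    model.descriptor(),         # received as provider_model_id
    model.core_model_id.value,  # received as model_descriptor
)
```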

Error logs

Traceback (most recent call last):
File "/home/aidan/miniconda3/envs/llamastack-meta-reference-gpu/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/aidan/miniconda3/envs/llamastack-meta-reference-gpu/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/aidan/miniconda3/envs/llamastack-meta-reference-gpu/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 354, in
main()
File "/home/aidan/miniconda3/envs/llamastack-meta-reference-gpu/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 286, in main
impls = asyncio.run(construct_stack(config))
File "/home/aidan/miniconda3/envs/llamastack-meta-reference-gpu/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/home/aidan/miniconda3/envs/llamastack-meta-reference-gpu/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/home/aidan/miniconda3/envs/llamastack-meta-reference-gpu/lib/python3.10/site-packages/llama_stack/distribution/stack.py", line 191, in construct_stack
await register_resources(run_config, impls)
File "/home/aidan/miniconda3/envs/llamastack-meta-reference-gpu/lib/python3.10/site-packages/llama_stack/distribution/stack.py", line 96, in register_resources
await method(**obj.model_dump())
File "/home/aidan/miniconda3/envs/llamastack-meta-reference-gpu/lib/python3.10/site-packages/llama_stack/providers/utils/telemetry/trace_protocol.py", line 101, in async_wrapper
result = await method(self, *args, **kwargs)
File "/home/aidan/miniconda3/envs/llamastack-meta-reference-gpu/lib/python3.10/site-packages/llama_stack/distribution/routers/routing_tables.py", line 236, in register_model
registered_model = await self.register_object(model)
File "/home/aidan/miniconda3/envs/llamastack-meta-reference-gpu/lib/python3.10/site-packages/llama_stack/distribution/routers/routing_tables.py", line 181, in register_object
registered_obj = await register_object_with_provider(obj, p)
File "/home/aidan/miniconda3/envs/llamastack-meta-reference-gpu/lib/python3.10/site-packages/llama_stack/distribution/routers/routing_tables.py", line 37, in register_object_with_provider
return await p.register_model(obj)
File "/home/aidan/miniconda3/envs/llamastack-meta-reference-gpu/lib/python3.10/site-packages/llama_stack/providers/utils/telemetry/trace_protocol.py", line 101, in async_wrapper
result = await method(self, *args, **kwargs)
File "/home/aidan/miniconda3/envs/llamastack-meta-reference-gpu/lib/python3.10/site-packages/llama_stack/providers/inline/inference/meta_reference/inference.py", line 115, in register_model
model = await self.model_registry_helper.register_model(model)
File "/home/aidan/miniconda3/envs/llamastack-meta-reference-gpu/lib/python3.10/site-packages/llama_stack/providers/utils/inference/model_registry.py", line 100, in register_model
raise ValueError(
ValueError: Model 'meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8' is not available and no llama_model was specified in metadata. Please specify a llama_model in metadata or use a supported model identifier

Expected behavior

get_hugging_face_repo should receive "Llama3.2-3B-Instruct:int4-qlora-eo8" as model_descriptor, not just "Llama3.2-3B-Instruct" as is passed in the current configuration.

build_model_alias takes the parameters provider_model_id and model_descriptor. The call to build_model_alias in inference.py (line 79) should pass model.core_model_id.value first and model.descriptor() second, i.e. the reverse of the current argument order. A sketch of the reordered call follows.
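
As a rough sketch, the reordered call would then look like this (illustrative only, using the argument names quoted above; the exact code may differ):

```python
# Proposed argument order (illustrative sketch, not a verified patch):
build_model_alias(
    model.core_model_id.value,  # provider_model_id
    model.descriptor(),         # model_descriptor, e.g. "Llama3.2-3B-Instruct:int4-qlora-eo8"
)
```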
