When trying to run qoq.sh on the main branch, I encountered the error below. It seems there is a mismatch while the script is reading and loading the model weights.
25-02-17 08:30:04 | E |
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.12/site-packages/tiktoken/load.py", line 154, in load_tiktoken_bpe
token, rank = line.split()
^^^^^^^^^^^
ValueError: not enough values to unpack (expected 2, got 1)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.12/site-packages/transformers/convert_slow_tokenizer.py", line 1636, in convert_slow_tokenizer
).converted()
^^^^^^^^^^^
File "/root/miniconda3/lib/python3.12/site-packages/transformers/convert_slow_tokenizer.py", line 1533, in converted
tokenizer = self.tokenizer()
^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.12/site-packages/transformers/convert_slow_tokenizer.py", line 1526, in tokenizer
vocab_scores, merges = self.extract_vocab_merges_from_model(self.vocab_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.12/site-packages/transformers/convert_slow_tokenizer.py", line 1502, in extract_vocab_merges_from_model
bpe_ranks = load_tiktoken_bpe(tiktoken_url)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.12/site-packages/tiktoken/load.py", line 157, in load_tiktoken_bpe
raise ValueError(f"Error parsing line {line!r} in {tiktoken_bpe_file}") from e
ValueError: Error parsing line b'\x0e' in /workspace/deepcompressor/model/Llama-2-13b-hf/tokenizer.model
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/workspace/deepcompressor/deepcompressor/app/llm/ptq.py", line 395, in<module>
main(config, logging_level=tools.logging.DEBUG)
File "/workspace/deepcompressor/deepcompressor/app/llm/ptq.py", line 342, in main
model, tokenizer = config.model.build()
^^^^^^^^^^^^^^^^^^^^
File "/workspace/deepcompressor/deepcompressor/app/llm/model/config.py", line 122, in build
return self._default_build(self.path, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/deepcompressor/deepcompressor/app/llm/model/config.py", line 137, in _default_build
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=kwargs.pop("use_fast", True))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.12/site-packages/transformers/models/auto/tokenization_auto.py", line 920, in from_pretrained
return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 2213, in from_pretrained
return cls._from_pretrained(
^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 2447, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.12/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 157, in __init__
super().__init__(
File "/root/miniconda3/lib/python3.12/site-packages/transformers/tokenization_utils_fast.py", line 138, in __init__
fast_tokenizer = convert_slow_tokenizer(self, from_tiktoken=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.12/site-packages/transformers/convert_slow_tokenizer.py", line 1638, in convert_slow_tokenizer
raise ValueError(
ValueError: Converting from Tiktoken failed, if a converter for SentencePiece is available, provide a model path with a SentencePiece tokenizer.model file.Currently available slow->fast convertors: ['AlbertTokenizer', 'BartTokenizer', 'BarthezTokenizer', 'BertTokenizer', 'BigBirdTokenizer', 'BlenderbotTokenizer', 'CamembertTokenizer', 'CLIPTokenizer', 'CodeGenTokenizer', 'ConvBertTokenizer', 'DebertaTokenizer', 'DebertaV2Tokenizer', 'DistilBertTokenizer', 'DPRReaderTokenizer', 'DPRQuestionEncoderTokenizer', 'DPRContextEncoderTokenizer', 'ElectraTokenizer', 'FNetTokenizer', 'FunnelTokenizer', 'GPT2Tokenizer', 'HerbertTokenizer', 'LayoutLMTokenizer', 'LayoutLMv2Tokenizer', 'LayoutLMv3Tokenizer', 'LayoutXLMTokenizer', 'LongformerTokenizer', 'LEDTokenizer', 'LxmertTokenizer', 'MarkupLMTokenizer', 'MBartTokenizer', 'MBart50Tokenizer', 'MPNetTokenizer', 'MobileBertTokenizer', 'MvpTokenizer', 'NllbTokenizer', 'OpenAIGPTTokenizer', 'PegasusTokenizer', 'Qwen2Tokenizer', 'RealmTokenizer', 'ReformerTokenizer', 'RemBertTokenizer', 'RetriBertTokenizer', 'RobertaTokenizer', 'RoFormerTokenizer', 'SeamlessM4TTokenizer', 'SqueezeBertTokenizer', 'T5Tokenizer', 'UdopTokenizer', 'WhisperTokenizer', 'XLMRobertaTokenizer', 'XLNetTokenizer', 'SplinterTokenizer', 'XGLMTokenizer', 'LlamaTokenizer', 'CodeLlamaTokenizer', 'GemmaTokenizer', 'Phi3Tokenizer']
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.12/site-packages/tiktoken/load.py", line 154, in load_tiktoken_bpe
token, rank = line.split()
^^^^^^^^^^^
ValueError: not enough values to unpack (expected 2, got 1)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.12/site-packages/transformers/convert_slow_tokenizer.py", line 1636, in convert_slow_tokenizer
).converted()
^^^^^^^^^^^
File "/root/miniconda3/lib/python3.12/site-packages/transformers/convert_slow_tokenizer.py", line 1533, in converted
tokenizer = self.tokenizer()
^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.12/site-packages/transformers/convert_slow_tokenizer.py", line 1526, in tokenizer
vocab_scores, merges = self.extract_vocab_merges_from_model(self.vocab_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.12/site-packages/transformers/convert_slow_tokenizer.py", line 1502, in extract_vocab_merges_from_model
bpe_ranks = load_tiktoken_bpe(tiktoken_url)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.12/site-packages/tiktoken/load.py", line 157, in load_tiktoken_bpe
raise ValueError(f"Error parsing line {line!r} in {tiktoken_bpe_file}") from e
ValueError: Error parsing line b'\x0e' in /workspace/deepcompressor/model/Llama-2-13b-hf/tokenizer.model
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/workspace/deepcompressor/deepcompressor/app/llm/ptq.py", line 403, in<module>
raise e
File "/workspace/deepcompressor/deepcompressor/app/llm/ptq.py", line 395, in<module>
main(config, logging_level=tools.logging.DEBUG)
File "/workspace/deepcompressor/deepcompressor/app/llm/ptq.py", line 342, in main
model, tokenizer = config.model.build()
^^^^^^^^^^^^^^^^^^^^
File "/workspace/deepcompressor/deepcompressor/app/llm/model/config.py", line 122, in build
return self._default_build(self.path, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/deepcompressor/deepcompressor/app/llm/model/config.py", line 137, in _default_build
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=kwargs.pop("use_fast", True))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.12/site-packages/transformers/models/auto/tokenization_auto.py", line 920, in from_pretrained
return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 2213, in from_pretrained
return cls._from_pretrained(
^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 2447, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.12/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 157, in __init__
super().__init__(
File "/root/miniconda3/lib/python3.12/site-packages/transformers/tokenization_utils_fast.py", line 138, in __init__
fast_tokenizer = convert_slow_tokenizer(self, from_tiktoken=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.12/site-packages/transformers/convert_slow_tokenizer.py", line 1638, in convert_slow_tokenizer
raise ValueError(
ValueError: Converting from Tiktoken failed, if a converter for SentencePiece is available, provide a model path with a SentencePiece tokenizer.model file.Currently available slow->fast convertors: ['AlbertTokenizer', 'BartTokenizer', 'BarthezTokenizer', 'BertTokenizer', 'BigBirdTokenizer', 'BlenderbotTokenizer', 'CamembertTokenizer', 'CLIPTokenizer', 'CodeGenTokenizer', 'ConvBertTokenizer', 'DebertaTokenizer', 'DebertaV2Tokenizer', 'DistilBertTokenizer', 'DPRReaderTokenizer', 'DPRQuestionEncoderTokenizer', 'DPRContextEncoderTokenizer', 'ElectraTokenizer', 'FNetTokenizer', 'FunnelTokenizer', 'GPT2Tokenizer', 'HerbertTokenizer', 'LayoutLMTokenizer', 'LayoutLMv2Tokenizer', 'LayoutLMv3Tokenizer', 'LayoutXLMTokenizer', 'LongformerTokenizer', 'LEDTokenizer', 'LxmertTokenizer', 'MarkupLMTokenizer', 'MBartTokenizer', 'MBart50Tokenizer', 'MPNetTokenizer', 'MobileBertTokenizer', 'MvpTokenizer', 'NllbTokenizer', 'OpenAIGPTTokenizer', 'PegasusTokenizer', 'Qwen2Tokenizer', 'RealmTokenizer', 'ReformerTokenizer', 'RemBertTokenizer', 'RetriBertTokenizer', 'RobertaTokenizer', 'RoFormerTokenizer', 'SeamlessM4TTokenizer', 'SqueezeBertTokenizer', 'T5Tokenizer', 'UdopTokenizer', 'WhisperTokenizer', 'XLMRobertaTokenizer', 'XLNetTokenizer', 'SplinterTokenizer', 'XGLMTokenizer', 'LlamaTokenizer', 'CodeLlamaTokenizer', 'GemmaTokenizer', 'Phi3Tokenizer']
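For reference, the tokenizer load can be reproduced outside of qoq.sh with the call that appears in the traceback (a minimal sketch; the model path and keyword arguments are copied from the frames above):

```python
from transformers import AutoTokenizer

# Same call that deepcompressor/app/llm/model/config.py issues in _default_build.
# If this line alone raises the same ValueError, the failure is in the tokenizer
# files under the model directory rather than in the quantization script itself.
tokenizer = AutoTokenizer.from_pretrained(
    "/workspace/deepcompressor/model/Llama-2-13b-hf",
    trust_remote_code=True,
    use_fast=True,
)
print(tokenizer)
```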
The working folder is /deepcompressor/examples/llm and the full command is:
I'm wondering whether it's due to my model or something else.
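To help narrow down whether the local tokenizer.model file itself is the problem, it can be opened directly with sentencepiece (a rough check, assuming the sentencepiece package is installed; an intact Llama-2 SentencePiece model should load here, while a corrupted or incomplete download should not):

```python
import sentencepiece as spm

# Attempt to load the file that the tiktoken parser choked on. If this succeeds,
# tokenizer.model is a valid SentencePiece file and the error more likely comes
# from the slow->fast conversion path in transformers; if it raises, the file is
# probably corrupted or was not fully downloaded.
sp = spm.SentencePieceProcessor()
sp.Load("/workspace/deepcompressor/model/Llama-2-13b-hf/tokenizer.model")
print("vocab size:", sp.GetPieceSize())
```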