
Error involving tokenizer_config.json when running the recall task #4

@ypJiangz

Description


I manually downloaded https://huggingface.co/linear-moe-hub/MoM-Gated-Deltanet-340M/tree/main from Hugging Face, uploaded it to my machine, and then ran a single task to test. The model weight files under the model path:
```
ls -la
total 1068206
drwxrwxr-x  4 ranwh27 ranwh27         10 Dec 28 12:45 .
drwxrwxr-x 18 ranwh27 ranwh27         24 Dec 28 12:08 ..
drwxrwxr-x  2 ranwh27 ranwh27          1 Dec 28 13:40 .ipynb_checkpoints
-rw-rw-rw-  1 ranwh27 ranwh27        310 Dec 27 13:09 README.md
drwxrwxr-x  2 ranwh27 ranwh27          0 Dec 28 12:45 based_squad
-rw-rw-rw-  1 ranwh27 ranwh27        927 Dec 27 13:08 config.json
-rw-rw-r--  1 ranwh27 ranwh27        111 Dec 27 13:08 generation_config.json
-rw-rw-rw-  1 ranwh27 ranwh27       1519 Dec 27 13:09 gitattributes
-rw-rw-r--  1 ranwh27 ranwh27 1090330976 Dec 27 13:28 model.safetensors
-rw-rw-r--  1 ranwh27 ranwh27        437 Dec 27 13:08 special_tokens_map.json
-rw-rw-r--  1 ranwh27 ranwh27    3505751 Dec 27 13:08 tokenizer.json
-rw-rw-r--  1 ranwh27 ranwh27       1027 Dec 27 13:08 tokenizer_config.json
```
The command and the error are as follows:
```
CUDA_VISIBLE_DEVICES=0 python launch_local.py \
    --batch-size 64 \
    -t based_squad \
    -m /modepath \
    --context_length 512 \
    --answer_length 48 \
    --cutting_context \
    --limit 64
```
```
Running sweep with 1 configs
2025-12-28:13:50:14,312 INFO [__main__.py:241] Verbosity set to INFO
2025-12-28:13:50:14,312 INFO [__init__.py:358] lm_eval.tasks.initialize_tasks() is deprecated and no longer necessary. It will be removed in v0.4.2 release. TaskManager will instead be used.
2025-12-28:13:50:14,318 WARNING [__main__.py:253] --limit SHOULD ONLY BE USED FOR TESTING.REAL METRICS SHOULD NOT BE COMPUTED USING LIMIT.
2025-12-28:13:50:14,318 INFO [__main__.py:333] Selected Tasks: ['based_squad']
2025-12-28:13:50:14,318 INFO [__main__.py:334] Loading selected tasks...
2025-12-28:13:50:14,318 INFO [evaluator.py:105] Setting random seed to 0
2025-12-28:13:50:14,318 INFO [evaluator.py:109] Setting numpy seed to 1234
2025-12-28:13:50:14,319 INFO [evaluator.py:113] Setting torch manual seed to 1234
Traceback (most recent call last):
  File "/home/ranwh27/.local/bin/lm_eval", line 8, in <module>
    sys.exit(cli_evaluate())
    ^^^^^^^^^^^^^^
  File "/home/ranwh27/MoM/lm-eval-harness/lm_eval/__main__.py", line 336, in cli_evaluate
    results = evaluator.simple_evaluate(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ranwh27/MoM/lm-eval-harness/lm_eval/utils.py", line 288, in _wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/ranwh27/MoM/lm-eval-harness/lm_eval/evaluator.py", line 133, in simple_evaluate
    lm = lm_eval.api.registry.get_model(model).create_from_arg_string(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ranwh27/MoM/lm-eval-harness/lm_eval/api/model.py", line 134, in create_from_arg_string
    return cls(**args, **args2)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/ranwh27/MoM/lm-eval-harness/lm_eval/models/local_lm.py", line 17, in __init__
    tokenizer = load_tokenizer(checkpoint_name, is_hf=is_hf)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ranwh27/MoM/lm-eval-harness/lm_eval/models/local_utils/loading.py", line 105, in load_tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_name)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ranwh27/.local/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 1156, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ranwh27/.local/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2113, in from_pretrained
    return cls._from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/ranwh27/.local/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2359, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ranwh27/.local/lib/python3.11/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 154, in __init__
    super().__init__(
  File "/home/ranwh27/.local/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 133, in __init__
    slow_tokenizer = self.slow_tokenizer_class(*args, **kwargs)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ranwh27/.local/lib/python3.11/site-packages/transformers/models/llama/tokenization_llama.py", line 171, in __init__
    self.sp_model = self.get_spm_processor(kwargs.pop("from_slow", False))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ranwh27/.local/lib/python3.11/site-packages/transformers/models/llama/tokenization_llama.py", line 198, in get_spm_processor
    tokenizer.Load(self.vocab_file)
  File "/home/ranwh27/.local/lib/python3.11/site-packages/sentencepiece/__init__.py", line 961, in Load
    return self.LoadFromFile(model_file)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ranwh27/.local/lib/python3.11/site-packages/sentencepiece/__init__.py", line 316, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: not a string
```
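For context on the traceback: `TypeError: not a string` is what SentencePiece raises when `vocab_file` is `None`. The fast Llama tokenizer only needs `tokenizer.json`, but when transformers falls back to building the slow `LlamaTokenizer`, it additionally needs a SentencePiece `tokenizer.model` file, which does not appear in the directory listing above. A minimal stdlib-only sketch to check which standard tokenizer files a local checkpoint actually contains (file names taken from the usual transformers layout; this is a diagnostic aid, not part of the harness):

```python
from pathlib import Path


def check_tokenizer_files(model_dir: str) -> dict:
    """Report which standard tokenizer files exist in a local checkpoint.

    The fast tokenizer can be built from tokenizer.json alone; the slow
    (SentencePiece-based) LlamaTokenizer also needs tokenizer.model. If
    tokenizer.model is absent and the slow class is constructed anyway,
    vocab_file=None reaches SentencePiece and it raises
    "TypeError: not a string".
    """
    names = [
        "tokenizer.json",
        "tokenizer.model",
        "tokenizer_config.json",
        "special_tokens_map.json",
    ]
    d = Path(model_dir)
    return {name: (d / name).is_file() for name in names}
```

Run against the checkpoint directory above, this would flag `tokenizer.model` as missing. Two possible workarounds (assumptions, not verified against this repo): fetch `tokenizer.model` alongside the other files if the upstream repo provides it, or ensure the loader uses the fast tokenizer path (e.g. `AutoTokenizer.from_pretrained(path, use_fast=True)`) so the tokenizer is built purely from `tokenizer.json`.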
