Doc request

When calling the tokenization script listed under [preprocessing](https://huggingface.co/docs/transformers/main/en/tasks/token_classification#preprocess):
```python
def tokenize_and_align_labels(examples):
    tokenized_inputs = tokenizer(examples["tokens"], truncation=True, is_split_into_words=True)

    labels = []
    for i, label in enumerate(examples["ner_tags"]):
        word_ids = tokenized_inputs.word_ids(batch_index=i)  # Map tokens to their respective word.
        previous_word_idx = None
        label_ids = []
        for word_idx in word_ids:  # Set the special tokens to -100.
            if word_idx is None:
                label_ids.append(-100)
            elif word_idx != previous_word_idx:  # Only label the first token of a given word.
                label_ids.append(label[word_idx])
            else:
                label_ids.append(-100)
            previous_word_idx = word_idx
        labels.append(label_ids)

    tokenized_inputs["labels"] = labels
    return tokenized_inputs
```
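For context, the linked page applies this function via `Dataset.map`. A minimal sketch of that call, assuming the WNUT 17 dataset and DistilBERT checkpoint the page uses (a fast tokenizer is required, since `word_ids()` is only available on fast tokenizers):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
wnut = load_dataset("wnut_17")

# With batched=True, the function receives a dict of lists:
# examples["ner_tags"] is a list of per-example tag lists.
tokenized_wnut = wnut.map(tokenize_and_align_labels, batched=True)
```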
Contrary to what the doc suggests, using the argument `batched=True` is not just a speed optimization: with `batched=False`, this script crashes outright at `label_ids.append(label[word_idx])`, because `label` is then a single integer tag rather than a list of tags.
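A quick illustration of the shape difference (hypothetical toy data, not from the doc):

```python
# batched=True: the function receives a batch, so "ner_tags" is a list of lists.
batch = {"tokens": [["Empire", "State"], ["Paris"]],
         "ner_tags": [[3, 4], [5]]}
for i, label in enumerate(batch["ner_tags"]):
    assert isinstance(label, list)  # label[word_idx] is valid

# batched=False: the function receives one example, so "ner_tags" is a flat list.
example = {"tokens": ["Empire", "State"], "ner_tags": [3, 4]}
for i, label in enumerate(example["ner_tags"]):
    assert isinstance(label, int)   # label[word_idx] -> TypeError: 'int' object is not subscriptable
```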