- Run `preprocess_data.py` to generate dictionaries containing n-best ASR scores for each utterance.
- Run `lllm_scoring.py` to update the dictionaries with LLM scores for each utterance (for `gpt2` and `bert`).
- Run `combined_scores.py` with the argument `--lambda_param` to combine the ASR and LLM scores.
- Run `compute_error_rate.py` to compute the error rate for a given hypothesis dictionary.
- `gridsearch.sh` tests error rates over a range of lambda values.
- `hyp_comb_10_dict_test_other.json` contains the hypotheses and all the scores for the automasking experiment.
- `hyp_comb_masks_10_dict_test_other.json` contains the hypotheses and all the scores for the selective mask-based experiment.
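
To illustrate how the combination and error-rate steps fit together, here is a minimal sketch. It is not the code in `combined_scores.py` or `compute_error_rate.py`: the linear interpolation `(1 - lambda) * asr + lambda * llm`, the dictionary keys `text`, `asr_score`, and `llm_score`, and the toy scores are all assumptions made for illustration. The actual scripts and JSON files may use a different weighting or layout.

```python
from typing import Dict, List


def combine_scores(asr_score: float, llm_score: float, lambda_param: float) -> float:
    """Interpolate two log-scores. The linear form is an assumption, not the repo's formula."""
    return (1.0 - lambda_param) * asr_score + lambda_param * llm_score


def rerank(hypotheses: List[Dict], lambda_param: float) -> str:
    """Return the text of the hypothesis with the highest combined score."""
    best = max(
        hypotheses,
        key=lambda h: combine_scores(h["asr_score"], h["llm_score"], lambda_param),
    )
    return best["text"]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / max(len(ref), 1)


# Hypothetical n-best list for one utterance (keys and scores are made up).
nbest = [
    {"text": "the cat sat on the mat", "asr_score": -4.2, "llm_score": -10.0},
    {"text": "the cat sat on the matt", "asr_score": -4.0, "llm_score": -14.0},
]
reference = "the cat sat on the mat"

# A tiny sweep over lambda, in the spirit of gridsearch.sh.
for lam in (0.0, 0.25, 0.5):
    choice = rerank(nbest, lam)
    print(lam, choice, word_error_rate(reference, choice))
```

With `lambda_param = 0` only the ASR score matters, so the sketch picks the hypothesis the recognizer preferred; increasing lambda gives the language model more influence over the ranking.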
forked from justin-dannemiller/ASR_LLM_Rescoring
Rescoring Automatic Speech Recognition using Large Language Models
saagar-parikh/ASR_LLM_Rescoring