RasaNLUCuCo

This repository contains a collection of helpful custom components that can be used in the Rasa NLU pipeline.

1. Spellchecking

We provide a custom Rasa NLU component that implements spellchecking on each token and automatically replaces misspelled tokens with the corrected version. The spellchecking itself is based on pyspellchecker. The component has several parameters that can be set in the configuration file. For example, it supports multiple language ("en", "es", "de", "fr", "pt"). You can also set a distance value, which is set to 1 by default. This parameter is used in the spellchecking algorithm and for details refer to the corresponding GitHub project pyspellchecker. Since the algorithm uses a word frequency list to detect misspelled words and provide the correct version, you can also provide a custom word frequency file (each row with "word, frequency" format). Note that in order to apply this custom spellchecking component in a pipeline, a tokenizer component has to be applied before in the pipeline.

Below you can find a sample configuration using the spellchecking component and providing a custom word frequency file to it.

pipeline:
- name: "WhitespaceTokenizer"
- name: "spellchecking.SpellCheckerCorrection"
  language: "de"
  distance: 2
  word_freq_file: "resources/de_50k.txt"
- name: "CRFEntityExtractor"
- name: "CountVectorsFeaturizer"
- name: "EmbeddingIntentClassifier"

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
spellchecking.py		spellchecking.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RasaNLUCuCo

1. Spellchecking

About

Releases

Packages

Languages

License

kristiankolthoff/RasaNLUCuCo

Folders and files

Latest commit

History

Repository files navigation

RasaNLUCuCo

1. Spellchecking

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages