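The Python script in this repository, word_stats.py, is a command-line text analyzer. It reports basic statistics (character, word, and sentence counts, plus the mean and standard deviation of sentence length) and a vocabulary measure called the Type-Token Ratio (TTR).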
TTR measures the diversity, or richness, of the vocabulary in a text: TTR = (number of unique words, or types) / (total number of words, or tokens). The value is at most 1, and a higher value means the text repeats its words less often.
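As a rough illustration of the formula above, TTR could be computed along the following lines. This is a minimal sketch, not the code in word_stats.py: the function name `type_token_ratio`, the lowercasing, and the regular expression used to split words are all assumptions made for the example.

```python
import re


def type_token_ratio(text: str) -> float:
    """Return (unique words) / (total words) for the given text."""
    # Assumption: a word is a run of letters, digits, or apostrophes,
    # and words are compared case-insensitively.
    tokens = re.findall(r"[A-Za-z0-9']+", text.lower())
    if not tokens:
        return 0.0  # avoid dividing by zero on empty input
    return len(set(tokens)) / len(tokens)


sample = "The cat sat on the mat. The dog sat too."
print(round(type_token_ratio(sample), 2))  # 7 unique words / 10 words -> 0.7
```

A different tokenization rule (for example, keeping case or splitting on whitespace only) would give a slightly different ratio for the same text.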
To run the script, pass the path to your text file as an argument in a terminal:
python word_stats.py my_file.txt
Here is a sample output from the script:
The text has 7077 characters, 1183 words, and 52 sentences. The average length of a sentence is 22 words. Standard deviation of the sentence length is 11.5 words. Type-Token Ratio (TTR): 0.49
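The sentence figures in this kind of output could be derived roughly as shown here. This is a hedged sketch under stated assumptions, not the script's actual implementation: the function name `sentence_length_stats` is hypothetical, sentences are assumed to end at `.`, `!`, or `?`, and the sample standard deviation is used (the script may use the population form instead).

```python
import re
import statistics


def sentence_length_stats(text: str):
    """Return (sentence count, mean words per sentence, standard deviation)."""
    # Assumption: a sentence ends at one or more of . ! ?
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumption: words within a sentence are separated by whitespace.
    lengths = [len(s.split()) for s in sentences]
    mean = statistics.mean(lengths) if lengths else 0.0
    # Sample standard deviation needs at least two sentences.
    stdev = statistics.stdev(lengths) if len(lengths) > 1 else 0.0
    return len(sentences), mean, stdev


count, mean, stdev = sentence_length_stats("One two three. Four five. Six!")
print(count, mean, stdev)  # sentence lengths are 3, 2, and 1 words
```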