A trigram model in JSON, built by scraping human-written articles and blog posts. The source corpus contains over 330 million words (over 2 gigabytes of text).

The JSON file itself is only 60 MB, because statistically insignificant trigrams (those with frequency < 5) were filtered out.

The JSON file contains a single object of simple key-value pairs: each key is a trigram (its three words joined by single spaces) and each value is the number of times that trigram occurs in the scraped corpus.
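As a minimal sketch of how this format could be consumed in Python: the sample entries and counts below are invented for illustration, and the helper names `load_trigram_counts` and `trigram_frequency` are hypothetical. With the real file you would read its contents from disk and pass them in the same way.

```python
import json

# Tiny stand-in for the real 60 MB file; these entries and counts are invented.
SAMPLE_JSON = '{"one of the": 120, "of the most": 87, "as well as": 64}'

def load_trigram_counts(text):
    """Parse the JSON object into a dict mapping 'w1 w2 w3' -> frequency."""
    return json.loads(text)

def trigram_frequency(counts, w1, w2, w3):
    """Return the stored count for a trigram, or 0 if it is absent.

    Absent trigrams either never occurred or were filtered out (frequency < 5).
    """
    return counts.get(" ".join((w1, w2, w3)), 0)

counts = load_trigram_counts(SAMPLE_JSON)
print(trigram_frequency(counts, "one", "of", "the"))
```

The lookup does no case folding or punctuation handling, since the README does not say how keys were normalized; check a few keys in the actual file before relying on either.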

A use case: detecting spun content using n-gram analysis.
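The README does not describe how the detection is done, so the following is only one plausible heuristic, not the author's method: measure the share of a text's trigrams that never appear in the model. Spun text tends to contain unusual word sequences, so a high share may be a warning sign. The function names, the lowercase tokenization, and the toy counts are all assumptions.

```python
import re

def text_trigrams(text):
    """Split text into lowercase words and return its space-joined trigrams.

    Lowercasing is an assumption; match whatever normalization the model uses.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + 3]) for i in range(len(words) - 2)]

def unseen_ratio(text, counts):
    """Fraction of the text's trigrams that are missing from the model."""
    grams = text_trigrams(text)
    if not grams:
        return 0.0  # fewer than three words: nothing to score
    unseen = sum(1 for g in grams if g not in counts)
    return unseen / len(grams)

# Invented counts standing in for the real model.
toy_counts = {"one of the": 120, "of the most": 87}
print(unseen_ratio("one of the finest", toy_counts))
```

Any cutoff for flagging a text would need to be tuned on known human-written and known spun samples; no threshold is implied here.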