nepalbhasa-corpus

Nepal Bhasa text corpus for NLP.

Text is in JSON Lines format; each line is one post.

File key

  • _raw.jsonl = original scraped text in Devanagari script
  • _clean.jsonl = cleaned-up version in Devanagari script
  • _newa.jsonl = cleaned-up version converted to Prachalit (Newa) script
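
As a minimal sketch of how a file in this format could be read, the Python snippet below parses one JSON object per line. The function name `read_posts` is hypothetical, and the fields inside each post are not documented here, so the code makes no assumption about them and simply yields each parsed object.

```python
import json
from pathlib import Path
from typing import Iterator


def read_posts(path: str) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a JSON Lines file.

    Assumption: every non-empty line holds a single JSON object (one post).
    """
    # UTF-8 is requested explicitly because the text is in Devanagari or Newa script.
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

For example, `next(read_posts("example_clean.jsonl"))` would return the first post of a file with that name (a placeholder, not an actual file in this repository); inspecting its keys is a quick way to learn the schema.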

Source

Scraped from these Nepal Bhasa news portals:

  • nepalbhasatimes.com
  • nepalmandal.com

Scraped without explicit permission. To be used for the betterment of Nepal Bhasa linguistic models and tools.