wiki2mongo is a Java library for importing articles from a Wikipedia dump into MongoDB.

crtomirmajer/wiki2mongo

wiki2mongo

wiki2mongo is a small Java library for importing Wikipedia articles from an XML dump into MongoDB. Before storing each article, it strips wiki markup, infoboxes, HTML tags, reference tables, and similar elements.
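The README does not show how this cleaning is implemented. As a rough illustration only, the hypothetical `MarkupStripper.strip` below applies the kinds of removals listed above to a wikitext string. It drops `<ref>` footnotes, removes `{{...}}` templates (the syntax infoboxes use) and `{|...|}` tables while tracking nesting, unwraps `[[target|label]]` links to their visible text, and strips HTML tags and bold/italic quote markup. The class and method names are assumptions, not wiki2mongo's API, and a real cleaner has to handle many more cases (files, categories, `{{{parameters}}}`, and so on).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Pattern;

/**
 * Hypothetical, simplified wikitext cleaner. This is NOT wiki2mongo's implementation;
 * it only illustrates the kind of stripping the README describes.
 */
public final class MarkupStripper {

    // <ref>...</ref> footnotes and self-closing <ref name="x"/> tags
    private static final Pattern REFS =
            Pattern.compile("(?s)<ref[^>]*/>|<ref[^>]*>.*?</ref>");
    // Any remaining HTML tag, e.g. <br/>, <small>, </small>
    private static final Pattern HTML_TAGS = Pattern.compile("<[^>]+>");
    // [[Target|label]] -> label, [[Target]] -> Target
    private static final Pattern LINKS =
            Pattern.compile("\\[\\[(?:[^\\]|]*\\|)?([^\\]]*)\\]\\]");
    // '''bold''' and ''italic'' quote markers
    private static final Pattern QUOTES = Pattern.compile("'{2,}");

    private MarkupStripper() {}

    /** Removes {{...}} templates (including infoboxes) and {|...|} tables, honouring nesting. */
    static String removeBlocks(String s) {
        StringBuilder out = new StringBuilder(s.length());
        Deque<Character> open = new ArrayDeque<>(); // '{' for templates, '|' for tables
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            char next = i + 1 < s.length() ? s.charAt(i + 1) : '\0';
            char top = open.isEmpty() ? '\0' : open.peek();
            if (c == '{' && (next == '{' || next == '|')) {
                open.push(next); // remember which closer ("}}" or "|}") we expect
                i++;
            } else if (top == '{' && c == '}' && next == '}') {
                open.pop();      // end of a template
                i++;
            } else if (top == '|' && c == '|' && next == '}') {
                open.pop();      // end of a table
                i++;
            } else if (open.isEmpty()) {
                out.append(c);   // only keep text outside every block
            }
        }
        return out.toString();
    }

    /** Returns a plain-text approximation of the given wikitext. */
    public static String strip(String wikitext) {
        String s = REFS.matcher(wikitext).replaceAll(""); // footnotes first; they may contain templates
        s = removeBlocks(s);                              // infoboxes, other templates, tables
        s = LINKS.matcher(s).replaceAll("$1");            // keep only the visible link text
        s = HTML_TAGS.matcher(s).replaceAll("");          // leftover tags such as <br/>
        s = QUOTES.matcher(s).replaceAll("");             // bold/italic markers
        return s.replaceAll("[ \\t]+", " ").trim();       // collapse runs of spaces/tabs
    }
}
```

For example, `MarkupStripper.strip("'''Paris''' is the capital of [[France]].<ref>Source</ref>")` returns `"Paris is the capital of France."`. Templates and tables are removed with an explicit stack rather than a regular expression because infoboxes routinely contain nested templates, which a single non-recursive regex cannot match reliably.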

It uses Akka Streams for concurrency.
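The README does not describe the stream topology. The sketch below shows the general shape of such a pipeline (read articles, clean them on several threads, write them in batches) using only `java.util.concurrent` instead of Akka Streams, so it runs without extra dependencies. It is a stand-in for illustration, not the library's code: the `Article` record, the `run` method, and its `parallelism` and `batchSize` parameters are hypothetical, and the `sink` callback takes the place of a MongoDB bulk insert.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

/**
 * Hypothetical read -> clean -> store pipeline built on plain java.util.concurrent.
 * wiki2mongo itself uses Akka Streams; this is NOT its implementation.
 */
public final class CleaningPipeline {

    /** Hypothetical article type; the README does not show the library's own types. */
    public record Article(String title, String text) {}

    private CleaningPipeline() {}

    /**
     * Cleans articles on `parallelism` worker threads, keeping at most `batchSize`
     * articles in flight, and passes each cleaned batch to `sink` in input order.
     */
    public static void run(Iterator<Article> source,
                           UnaryOperator<String> cleaner,
                           Consumer<List<Article>> sink,
                           int parallelism,
                           int batchSize) {
        if (parallelism < 1 || batchSize < 1) {
            throw new IllegalArgumentException("parallelism and batchSize must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            while (source.hasNext()) {
                // Pull at most batchSize articles so memory stays bounded,
                // loosely mimicking the backpressure a stream library provides.
                List<CompletableFuture<Article>> inFlight = new ArrayList<>(batchSize);
                while (source.hasNext() && inFlight.size() < batchSize) {
                    Article a = source.next();
                    inFlight.add(CompletableFuture.supplyAsync(
                            () -> new Article(a.title(), cleaner.apply(a.text())), pool));
                }
                // Wait for every article in the window, preserving input order.
                List<Article> batch = new ArrayList<>(inFlight.size());
                for (CompletableFuture<Article> f : inFlight) {
                    batch.add(f.join());
                }
                sink.accept(batch); // a real importer would bulk-insert into MongoDB here
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Waiting for a whole window before writing keeps memory bounded and output ordered, but one slow article stalls its entire window. A stream library avoids this by keeping a fixed number of elements in flight continuously; in Akka Streams that is what `mapAsync(parallelism, f)` does, and `grouped(n)` can batch the cleaned articles for writing.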

Performance

wiki2mongo can process and store 12 million articles in under 2 hours on a 6-core machine with an SSD.
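As a back-of-the-envelope reading of this figure (arithmetic only, not an additional measurement): finishing 12 million articles in under 2 hours implies an average rate above about 1,667 articles per second. Dividing that evenly across the 6 cores, which assumes perfectly parallel work, gives roughly 278 articles per second per core.

```latex
\bar{r} > \frac{12 \times 10^{6}\ \text{articles}}{2 \times 3600\ \text{s}} \approx 1\,667\ \text{articles/s},
\qquad
\frac{\bar{r}}{6\ \text{cores}} > \frac{1\,666.7}{6} \approx 278\ \text{articles/s per core}
```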
