This repo parses the NYT corpus and creates training data from it. Use unzip.py to extract all files in the NYT corpus. Use XMLparser.py to parse the articles and extract abstract and full-text pairs. Use make_datafiles.py to tokenize the data and split it into train (90%), val (5%), and test (5%) sets. (Credit to https://github.com/abisee/pointer-generator)
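To make the 90/5/5 split concrete, here is a minimal sketch of one way to divide abstract and full-text pairs into the three sets. It is not the code in make_datafiles.py: the function name `split_dataset`, its parameters, the fixed shuffle seed, and the toy `pairs` list are all assumptions made for illustration.

```python
import random


def split_dataset(items, train_pct=90, val_pct=5, seed=0):
    """Shuffle items and split them into train/val/test lists.

    Hypothetical helper, not taken from make_datafiles.py.
    Whatever is left after train and val goes to test.
    """
    items = list(items)
    # A seeded shuffle keeps the split reproducible across runs.
    random.Random(seed).shuffle(items)
    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = len(items) * train_pct // 100
    n_val = len(items) * val_pct // 100
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


# Toy stand-in for the (abstract, full text) pairs extracted from the corpus.
pairs = [(f"abstract {i}", f"full text {i}") for i in range(100)]
train, val, test = split_dataset(pairs)
print(len(train), len(val), len(test))
```

Because the test set takes the remainder, every pair lands in exactly one of the three sets even when the corpus size is not a multiple of 100.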
boya-song/nyt_extract
About
This repo creates the NYT dataset for summarization.