PLOS_ntds

This dataset was collected from the Journal Archive section of PLOS Neglected Tropical Diseases.

Under the Creative Commons Attribution 4.0 International License (CC BY 4.0), PLOS articles may be reused without permission or fees. Anyone may copy, distribute, or reuse these articles, as long as the author and original source are properly cited.

See more on PLOS

The PLOS_ntds dataset consists exclusively of 10,100 research articles from PLOS Neglected Tropical Diseases. It captures several components of each article, including the abstract, article content, author summary, acknowledgments, and other features. Below is a sample of the folder structure, with one article expanded:

- 10.1371
    - 2024
        - January
            - journal.pntd.0011369
            - journal.pntd.0011678
                - src
                - abstract.json
                - acknowledgments.txt
                - article_features.json
                - author_summary.txt
                - content.json
                - figure_table.json
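As a rough illustration of how this layout can be traversed, the sketch below walks the tree and yields one folder per article. The function name `iter_article_dirs` is hypothetical, and the glob pattern assumes every article sits exactly at `10.1371/<year>/<month>/journal.pntd.*`, as in the sample above.

```python
from pathlib import Path
from typing import Iterator


def iter_article_dirs(root: Path) -> Iterator[Path]:
    """Yield every article folder found under root/10.1371/<year>/<month>/."""
    # The pattern mirrors the sample layout: DOI prefix / year / month / article folder.
    for path in sorted(root.glob("10.1371/*/*/journal.pntd.*")):
        if path.is_dir():
            yield path


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    for folder in iter_article_dirs(Path(".")):
        print(folder.name)
```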

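Each article is identified by its DOI. The DOI prefix (10.1371) is the top-level folder, and the DOI suffix (for example, journal.pntd.0011678) is the name of the article's folder.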
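The mapping from a DOI to a location in the dataset might be sketched as follows. The helper `article_dir` is hypothetical; it assumes the year and month of publication are already known and that month folders use full English names, as in the sample.

```python
from pathlib import Path


def article_dir(root: Path, doi: str, year: int, month: str) -> Path:
    """Build the expected folder for one article from its DOI and publication date."""
    # "10.1371/journal.pntd.0011678" splits into the DOI prefix and the DOI suffix.
    prefix, suffix = doi.split("/", 1)
    return root / prefix / str(year) / month / suffix


if __name__ == "__main__":
    print(article_dir(Path("."), "10.1371/journal.pntd.0011678", 2024, "January"))
```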

The src folder is intended to hold the original PDF file, along with the figures and tables downloaded as TIFF files. These downloads are currently disabled in the crawler.

figure_table.json describes each figure and table downloaded into the src folder.

article_features.json provides basic details for each article, including the URL, publication date, subject areas, authors, and their affiliations.

abstract.json contains the abstract, organized with each subsection name as the dictionary key and the corresponding text as the value. If the abstract has no subsections, the key is labeled "Abstract".
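A minimal sketch of reading an abstract, assuming the values in abstract.json are plain strings and the file is UTF-8 encoded. The function names and the subsection names in the usage example are illustrative only.

```python
import json
from pathlib import Path


def load_abstract(article_dir: Path) -> dict:
    """Read abstract.json from one article folder."""
    with open(article_dir / "abstract.json", encoding="utf-8") as fh:
        return json.load(fh)


def abstract_text(abstract: dict) -> str:
    """Join an abstract dictionary into one string, labelling each subsection."""
    # An abstract without subsections has the single key "Abstract";
    # in that case the text is returned without a label.
    if list(abstract) == ["Abstract"]:
        return abstract["Abstract"]
    return "\n\n".join(f"{name}: {text}" for name, text in abstract.items())
```

For example, `abstract_text({"Background": "A.", "Methods": "B."})` labels each part, while `abstract_text({"Abstract": "Plain."})` returns the text unchanged.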

content.json includes the main article content, with each section name as the dictionary key and the text as the value.
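Because content.json is a flat mapping from section name to text, simple per-section statistics are easy to compute. The sketch below counts words per section; it assumes each value is a single string and that the file is UTF-8 encoded, and both function names are hypothetical.

```python
import json
from pathlib import Path


def load_content(article_dir: Path) -> dict:
    """Read content.json from one article folder."""
    with open(article_dir / "content.json", encoding="utf-8") as fh:
        return json.load(fh)


def section_word_counts(content: dict) -> dict:
    """Map each section name to the number of whitespace-separated words in its text."""
    # str() guards against a non-string value; this is a defensive assumption.
    return {name: len(str(text).split()) for name, text in content.items()}
```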

acknowledgments.txt and author_summary.txt contain the plain text of the corresponding sections.

crawler.py is the Python script used to build this dataset. Crawling of figures and PDFs is disabled because of limited disk space.

myweiii/PLOS_ntds-dataset
