- Create an account here: https://www.ncbi.nlm.nih.gov/myncbi/
- Note the API key specified here: https://account.ncbi.nlm.nih.gov/settings/
- Install entrezpy:
pip install entrezpy
- This method can fetch at most 9,998 records per query, so we split the fetch start and end dates into smaller windows and accumulate the results.
- To speed up fetching, Data_Acquisition_Parallel_Threading uses process parallelization. However, using this method sometimes results in PubMed blocking the API key.
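
The date-splitting idea above can be sketched as follows. This is a minimal illustration, not the project's actual code: the helper names `split_date_range` and `to_search_params`, the search term, and the 10-day window size are all assumptions. The dictionary keys (`db`, `term`, `datetype`, `mindate`, `maxdate`) follow the E-utilities ESearch parameter names; how they are handed to entrezpy is left out here.

```python
from datetime import date, timedelta


def split_date_range(start, end, days_per_window):
    """Split [start, end] into consecutive, non-overlapping inclusive windows."""
    if days_per_window < 1:
        raise ValueError("days_per_window must be at least 1")
    windows = []
    window_start = start
    while window_start <= end:
        # Each window covers days_per_window days, clipped to the overall end date.
        window_end = min(window_start + timedelta(days=days_per_window - 1), end)
        windows.append((window_start, window_end))
        window_start = window_end + timedelta(days=1)
    return windows


def to_search_params(term, window):
    """Build ESearch-style parameters for one window (dates as YYYY/MM/DD)."""
    window_start, window_end = window
    return {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",  # filter on publication date
        "mindate": window_start.strftime("%Y/%m/%d"),
        "maxdate": window_end.strftime("%Y/%m/%d"),
    }


# Hypothetical example: one month split into 10-day windows.
windows = split_date_range(date(2020, 1, 1), date(2020, 1, 31), 10)
for window in windows:
    print(to_search_params("intelligence", window))
```

If a single window still returns more records than the limit allows, the window size would need to be reduced further for that period.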
Note: We chose the API calls method to gather the data for our project, because it takes less time.
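
Since parallel fetching can get the API key blocked, one possible mitigation is to cap the request rate shared by all workers. NCBI documents a limit of 10 requests per second when an API key is used. The sketch below is an assumption-laden illustration, not what Data_Acquisition_Parallel_Threading does: `RateLimiter` and `fetch_window` are hypothetical names, and the limiter only coordinates threads inside one process (separate processes would need a shared primitive instead).

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimiter:
    """Space out calls so that at most `max_per_second` start per second."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self):
        """Block until the caller may issue its next request; return the delay."""
        with self._lock:
            now = self._clock()
            # Reserve the next free slot, then release the lock before sleeping.
            scheduled = max(now, self._next_allowed)
            self._next_allowed = scheduled + self._min_interval
        delay = scheduled - now
        if delay > 0:
            self._sleep(delay)
        return delay


limiter = RateLimiter(max_per_second=10)


def fetch_window(window_id):
    """Hypothetical placeholder for one API request."""
    limiter.wait()
    return window_id  # a real worker would issue the request here


with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fetch_window, range(5)))
print(results)
```

The clock and sleep functions are injectable so the spacing logic can be checked without real waiting.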
- Installation commands (use either curl or wget):
sh -c "$(curl -fsSL https://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/install-edirect.sh)"
sh -c "$(wget -q https://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/install-edirect.sh -O -)"
- Adding EDirect to the PATH environment variable:
export PATH=${HOME}/edirect:${PATH}
- Setting the API key in the .bash_profile and .zshrc configuration files:
export NCBI_API_KEY=unique_api_key
- This method of data acquisition has no limit on the number of records fetched.
- This method is time-consuming compared to the API calls method.
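
Once EDirect is installed and on the PATH, a search-and-fetch pipeline can be driven from Python. The following is a rough sketch under stated assumptions: `build_edirect_command` and `run_edirect` are hypothetical helper names, the query is made up, and only the `esearch -db/-query` and `efetch -format` options are used.

```python
import shlex
import subprocess


def build_edirect_command(query, fmt="xml"):
    """Return a shell pipeline that searches PubMed and fetches the matches."""
    # shlex.quote protects the query from shell interpretation.
    return (
        f"esearch -db pubmed -query {shlex.quote(query)} "
        f"| efetch -format {shlex.quote(fmt)}"
    )


def run_edirect(query, fmt="xml"):
    """Run the pipeline and return its standard output as text.

    Assumes esearch and efetch are installed and reachable through PATH.
    """
    completed = subprocess.run(
        build_edirect_command(query, fmt),
        shell=True,
        check=True,
        capture_output=True,
        text=True,
    )
    return completed.stdout


# Only the command string is built here; nothing is executed.
print(build_edirect_command("machine learning"))
```

Calling `run_edirect("machine learning")` would execute the pipeline, which can take a long time for large result sets.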