Bible Scraper

Scrape bible from multiple resources


🌟 About the Project

🎯 Features

  • Scrape the Bible from:
    • biblegateway.com
    • bible.com
    • ktcgkpv.org
  • Currently supports:
    • Verses (with poetry).
    • Footnotes.
    • Headings.
    • References.
    • Psalm metadata (like author, title, etc.).
  • Progress logging.
  • Save to Postgres & SQLite databases.

🔑 Environment Variables

To run this project, you will need to add the following environment variables to your .env file:

  • App configs:

    DB_URL: Postgres database connection URL.

    LOG_LEVEL: Log level.

E.g.:

# .env
DB_URL="postgres://postgres:postgres@localhost:65439/bible"
LOG_LEVEL=info

You can also check out the file .env.example to see all required environment variables.
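
As an illustration of how these two variables might be validated at startup, here is a minimal sketch. The helper `loadConfig`, the `AppConfig` shape, the list of accepted log levels, and the `info` fallback are all assumptions made for this example; they are not taken from the repository.

```typescript
// Hypothetical helper, not part of the repository.
// Assumption: these are the accepted log levels; the README only shows "info".
type LogLevel = "debug" | "info" | "warn" | "error";

interface AppConfig {
  dbUrl: string;
  logLevel: LogLevel;
}

const LOG_LEVELS: readonly LogLevel[] = ["debug", "info", "warn", "error"];

// Takes the environment as a plain object so it can be tested without a
// real .env file. In Node.js you would typically pass `process.env`.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const dbUrl = env["DB_URL"];
  if (!dbUrl) {
    throw new Error("DB_URL is not set; see .env.example");
  }

  // Assumption: fall back to "info" when LOG_LEVEL is missing.
  const rawLevel = (env["LOG_LEVEL"] ?? "info").toLowerCase();
  const logLevel = LOG_LEVELS.find((level) => level === rawLevel);
  if (!logLevel) {
    throw new Error(`Unsupported LOG_LEVEL: ${rawLevel}`);
  }

  return { dbUrl, logLevel };
}
```

Failing fast on a missing `DB_URL` surfaces configuration mistakes before any scraping starts.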

🧰 Getting Started

‼️ Prerequisites

This project uses pnpm as package manager:

npm install --global pnpm

Playwright:

Run the following command to download new browser binaries:

npx playwright install

🏃 Run Locally

Clone the project:

git clone https://github.com/v-bible/bible-scraper.git

Go to the project directory:

cd bible-scraper

Install dependencies:

pnpm install

Set up the Postgres database using Docker Compose:

docker-compose up -d

Migrate the database:

pnpm prisma:migrate

Generate Prisma client:

pnpm prisma:generate

👀 Usage

Scripts

Note

To prevent the error net::ERR_NETWORK_CHANGED, you can temporarily disable IPv6 on your network adapter:

sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
sudo sysctl -w net.ipv6.conf.default.disable_ipv6=1

  • Scrape the Bible from biblegateway.com:
npx tsx ./src/biblegateway/main.ts
  • Scrape the Bible from bible.com:
npx tsx ./src/bibledotcom/main.ts

Note

The bible.com script does not use the local version code, which may vary by language. For example, in Vietnamese, the version "VCB" has the local code "KTHD".
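
If you need to translate between the two codes yourself, a small lookup table is one option. The sketch below is hypothetical: the name `toLocalVersionCode`, the table layout, and the `vi` language key are assumptions for illustration, and only the VCB/KTHD pair comes from the note above.

```typescript
// Hypothetical lookup table, not part of the repository.
// Only the VCB -> KTHD pair is taken from the README note.
const LOCAL_VERSION_CODES: Record<string, Record<string, string>> = {
  // Assumption: languages are keyed by ISO 639-1 code ("vi" = Vietnamese).
  vi: { VCB: "KTHD" },
};

// Returns the local code for a version in the given language, or the
// version code itself when no local code is known.
function toLocalVersionCode(versionCode: string, language: string): string {
  return LOCAL_VERSION_CODES[language]?.[versionCode] ?? versionCode;
}
```

Falling back to the original code keeps the helper safe to call for versions that have no language-specific alias.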

  • Scrape The Lectionary for Mass - Second USA Edition (Sunday Volume, 1998; Weekday Volumes, 2002):

npx tsx ./src/catholic-resources/main.ts

Note

The script get-ordinary-time.ts logs mismatched Gospel readings for Weekday Ordinary Time (OT) between Year I and Year II. You can find them in dumps/catholic-resources/note-ot.txt.
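
The kind of comparison described in this note can be sketched as follows. This is not the code in get-ordinary-time.ts: the `WeekdayReading` shape, the function name `findGospelMismatches`, and the note format are assumptions made for illustration.

```typescript
// Hypothetical data shape; the repository's real model may differ.
interface WeekdayReading {
  day: string; // label for the weekday, e.g. a week number plus day name
  gospel: string; // Scripture reference of the Gospel reading
}

// Compares the Gospel reading of each weekday in Year I with the same
// weekday in Year II and returns one note line per mismatch.
function findGospelMismatches(
  yearI: WeekdayReading[],
  yearII: WeekdayReading[],
): string[] {
  const yearIIByDay = new Map<string, string>();
  for (const reading of yearII) {
    yearIIByDay.set(reading.day, reading.gospel);
  }

  const notes: string[] = [];
  for (const reading of yearI) {
    const other = yearIIByDay.get(reading.day);
    // Days missing from Year II are skipped rather than reported.
    if (other !== undefined && other !== reading.gospel) {
      notes.push(`${reading.day}: Year I = ${reading.gospel}, Year II = ${other}`);
    }
  }
  return notes;
}
```

The returned lines could then be written to a notes file for manual review.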

  • Scrape the Bible from ktcgkpv.org:
npx tsx ./src/ktcgkpv/main.ts
  • Inject FTS content into the SQLite database:
./src/scripts/inject_fts.sh

Note

You can update SOURCE_DB and TARGET_DB in the script to change the source and destination databases.

Implemented Features

Comparison of the data scraped from each source:

Features biblegateway.com bible.com ktcgkpv.org
Verse ✔️ ✔️ ✔️
Poetry ✔️ ✔️ ✔️
Footnote ✔️ ✔️ ✔️
Cross Reference ✔️ ✔️ ✔️
Psalm Metadata ✔️ ✔️ ✔️
Words of Jesus (red letter) ✔️ ✔️
Proper Names (name translation) ✔️

👋 Contributing

Contributions are always welcome!

Please read the contribution guidelines.

📜 Code of Conduct

Please read the Code of Conduct.

⚠️ License

This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License.

See the LICENSE.md file for full details.

🤝 Contact

Duong Vinh - @duckymomo20012 - tienvinh.duong4@gmail.com

Project Link: https://github.com/v-bible/bible-scraper.

💎 Acknowledgements

Here are useful resources and libraries that we have used in our projects: