Scrape the Bible from multiple resources
- Scrape the Bible from biblegateway.com, bible.com, and ktcgkpv.org.
- Currently supports:
  - Verses (with poetry).
  - Footnotes.
  - Headings.
  - References.
  - Psalm metadata (e.g., author and title).
- Progress logging.
- Saving to Postgres and SQLite databases.
To run this project, you will need to add the following environment variables to your .env file:
- App configs:
  - DB_URL: Postgres database connection URL.
  - LOG_LEVEL: Log level (e.g., info).
For example:
# .env
DB_URL="postgres://postgres:postgres@localhost:65439/bible"
LOG_LEVEL=info
You can also check out the .env.example file to see all required environment variables.
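As a rough illustration of what these two variables are used for, the sketch below validates them before use: DB_URL must parse as a Postgres URL, and LOG_LEVEL must be one of a fixed set of names. This is a hypothetical helper (`loadAppConfig`), not the project's own config loader; the accepted log-level names and the `info` default are assumptions.

```typescript
// Hypothetical config loader: a sketch, not the project's actual code.
// Assumption: these log-level names; the project's logger may accept others.
const LOG_LEVELS = ["trace", "debug", "info", "warn", "error", "fatal"] as const;
type LogLevel = (typeof LOG_LEVELS)[number];

interface AppConfig {
  dbUrl: URL;
  logLevel: LogLevel;
}

// Takes an env-like record (e.g. process.env after the .env file is loaded).
function loadAppConfig(env: Record<string, string | undefined>): AppConfig {
  const rawUrl = env.DB_URL;
  if (!rawUrl) {
    throw new Error("DB_URL is not set");
  }
  // The URL constructor throws a TypeError on malformed input.
  const dbUrl = new URL(rawUrl);
  if (dbUrl.protocol !== "postgres:" && dbUrl.protocol !== "postgresql:") {
    throw new Error(`DB_URL must be a Postgres URL, got ${dbUrl.protocol}`);
  }

  // Assumption: fall back to "info" when LOG_LEVEL is missing.
  const rawLevel = env.LOG_LEVEL ?? "info";
  if (!(LOG_LEVELS as readonly string[]).includes(rawLevel)) {
    throw new Error(`Unsupported LOG_LEVEL: ${rawLevel}`);
  }
  return { dbUrl, logLevel: rawLevel as LogLevel };
}

// Usage with the values from the example .env above.
const config = loadAppConfig({
  DB_URL: "postgres://postgres:postgres@localhost:65439/bible",
  LOG_LEVEL: "info",
});
console.log(config.dbUrl.hostname, config.dbUrl.port, config.logLevel);
```

Failing fast on a malformed URL or an unknown level surfaces configuration mistakes before any scraping starts, rather than partway through a long run.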
This project uses pnpm as its package manager. Install it with:
npm install --global pnpm
Playwright: run the following command to download the browser binaries:
npx playwright install
Clone the project:
git clone https://github.com/v-bible/bible-scraper.git
Go to the project directory:
cd bible-scraper
Install dependencies:
pnpm install
Set up the Postgres database using Docker Compose:
docker-compose up -d
Migrate the database:
pnpm prisma:migrate
Generate Prisma client:
pnpm prisma:generate
Note
To prevent the error net::ERR_NETWORK_CHANGED, you can temporarily disable IPv6 on your network adapter:
sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
sudo sysctl -w net.ipv6.conf.default.disable_ipv6=1
To re-enable IPv6 afterwards, run the same commands with the value set to 0.
- Scrape the Bible from biblegateway.com:
npx tsx ./src/biblegateway/main.ts
- Scrape the Bible from bible.com:
npx tsx ./src/bibledotcom/main.ts
Note
The bible.com script does not use the local version code, which may vary between languages. For example, in Vietnamese, version "VCB" has the local code "KTHD".
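Because the script identifies a version by its version code rather than its local code, anyone who needs the localized code has to map it separately. A minimal sketch of such a lookup, assuming a hand-maintained table keyed by version code and then by language tag; only the Vietnamese VCB to KTHD entry comes from this README, and the `resolveLocalCode` name and the `vi` tag are illustrative assumptions.

```typescript
// Hypothetical lookup table: version code -> { language tag: local code }.
// Only the Vietnamese VCB -> KTHD entry is taken from this README.
const LOCAL_VERSION_CODES: Record<string, Record<string, string>> = {
  VCB: { vi: "KTHD" },
};

// Returns the local code of a version in a given language, falling back
// to the version code itself when no localized code is known.
function resolveLocalCode(versionCode: string, language: string): string {
  return LOCAL_VERSION_CODES[versionCode]?.[language] ?? versionCode;
}

console.log(resolveLocalCode("VCB", "vi"));
console.log(resolveLocalCode("VCB", "en"));
```

Falling back to the version code keeps the lookup total, so versions without a known local code still resolve to something usable.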
- Scrape liturgical resources for Ordinary Time (weekdays and Sundays) from catholic-resources.org:
The Lectionary for Mass - Second USA Edition (Sunday Volume, 1998; Weekday Volumes, 2002)
npx tsx ./src/catholic-resources/main.ts
Note
The script get-ordinary-time.ts logs mismatched Gospel readings for weekdays in Ordinary Time (OT) between Year I and Year II. You can find them in dumps/catholic-resources/note-ot.txt.
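The check described in this note can be pictured as a keyed comparison: pair each weekday from Year I with the same weekday in Year II and record any difference in the Gospel reference. The sketch below is hypothetical. The `WeekdayReading` shape and the `findGospelMismatches` name are illustrative, the sample references are made up, and the real script may match and format entries differently. It returns the note lines instead of writing them to the dump file.

```typescript
// Hypothetical shape of one weekday entry; the scraper's real types may differ.
interface WeekdayReading {
  week: number; // week of Ordinary Time
  day: string; // e.g. "Monday"
  gospel: string; // Gospel reference
}

// Pairs entries from both years by (week, day) and returns one note line
// for every weekday whose Gospel reference differs between Year I and II.
function findGospelMismatches(
  yearI: WeekdayReading[],
  yearII: WeekdayReading[],
): string[] {
  const key = (r: WeekdayReading) => `${r.week}:${r.day}`;
  const yearIIByKey = new Map(yearII.map((r) => [key(r), r] as const));
  const notes: string[] = [];
  for (const a of yearI) {
    const b = yearIIByKey.get(key(a));
    // Days missing from Year II are skipped rather than reported.
    if (b && a.gospel !== b.gospel) {
      notes.push(`Week ${a.week} ${a.day}: Year I "${a.gospel}" vs Year II "${b.gospel}"`);
    }
  }
  return notes;
}

// Made-up sample data: only the Tuesday entries differ.
const notes = findGospelMismatches(
  [
    { week: 1, day: "Monday", gospel: "Mk 1:14-20" },
    { week: 1, day: "Tuesday", gospel: "Mk 1:21-28" },
  ],
  [
    { week: 1, day: "Monday", gospel: "Mk 1:14-20" },
    { week: 1, day: "Tuesday", gospel: "Mk 1:21b-28" },
  ],
);
console.log(notes.join("\n"));
```

Indexing Year II in a `Map` keeps the comparison linear in the number of weekdays instead of scanning Year II once per Year I entry.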
- Scrape the Bible from ktcgkpv.org:
npx tsx ./src/ktcgkpv/main.ts
- Inject full-text search (FTS) content into the SQLite database:
./src/scripts/inject_fts.sh
Note
You can update SOURCE_DB and TARGET_DB in the script to change the source and destination databases.
Comparison of the data scraped from each source:
Features | biblegateway.com | bible.com | ktcgkpv.org |
---|---|---|---|
Verse | ✔️ | ✔️ | ✔️ |
Poetry | ✔️ | ✔️ | ✔️ |
Footnote | ✔️ | ✔️ | ✔️ |
Cross Reference | ✔️ | ✔️ | ✔️ |
Psalm Metadata | ✔️ | ✔️ | ✔️ |
Words of Jesus (red letter) | ✔️ | ✔️ | ❌ |
Proper Names (name translation) | ❌ | ❌ | ✔️ |
Contributions are always welcome!
Please read the contribution guidelines.
Please read the Code of Conduct.
This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License.
See the LICENSE.md file for full details.
Duong Vinh - @duckymomo20012 - tienvinh.duong4@gmail.com
Project Link: https://github.com/v-bible/bible-scraper
Here are useful resources and libraries that we have used in our projects:
- bible.com: bible.com website.
- biblegateway.com: biblegateway.com website.
- ktcgkpv.org: Nhóm Phiên Dịch Các Giờ Kinh Phụng Vụ website.
- The Lectionary for Mass (1998/2002 USA Edition): compiled by Felix Just, S.J., Ph.D.