Necesitas las instrucciones en español🇪🇸? Sin problema, haz click aquí
A Node.js ETL with the database of Paraguay's RUC (taxpayers id's)🏢
As Paraguay gov doesn't provide a webservice to get taxpayers data, I decided to create a solution for that. 🤓
The app runs a task on start and at a scheduled time every day. You can schedule it to what fits you best. 🔄
The ETL gets the information from the government website, extracts the data from zip files, parses it, and saves it into a SQLite database. 💾
- Node 🚀 (Tested with node 18)
- Clone the repo
- CD into the repo
- Copy the
.env.example
file to.env
and make the adjustments you need - Run
npm install
- Run
npm run migrate --name init
- Run
npm run build
- Run
npm start
...or, if you want to trigger the ETL proccess immediately, runnpm run start -- startmeup
If you followed the steps correctly, you should see the process starting at your scheduled time (or immediately with the startmeup flag), and the output should look some like this:
Downloading zip, and parsing data for ending digit: 0
173995 contribuyentes found
Storing data...
And, after a little while, you should see:
Done with ending digit: 0
The process will repeat for all the ending digits (0-9).
Once your database is populated, you can use the REST API to get the data.
The app exposes a simple REST API with the following endpoints:
GET /ruc/:ruc
- Returns the taxpayer data for the given RUC
e.g. /ruc/80000001
returns
{
"ruc": "80000001",
"razonSocial": "NAVIERA CONOSUR SOCIEDAD ANONIMA",
"digitoVerificador": "3",
"rucAnterior": "NCOA905190H",
"estado": "BLOQUEADO",
"fechaHoraImportacion": "2023-12-07T11:31:51.496Z"
}
GET /razon-social/:term
- Returns an array of taxpayers data that contains the giver term in their name (Razon Social)
e.g. /razon-social/MARTINETTI LOPEZ
returns
[
{
"ruc": "2509803",
"razonSocial": "MARTINETTI LOPEZ, VICTOR ALEJANDRO",
"digitoVerificador": "9",
"rucAnterior": "MALV813370R",
"estado": "ACTIVO",
"fechaHoraImportacion": "2023-12-07T11:38:31.248Z"
},
{
"ruc": "2509804",
"razonSocial": "MARTINETTI LOPEZ, JUAN RAFAEL",
"digitoVerificador": "7",
"rucAnterior": "CAVI712851Z",
"estado": "ACTIVO",
"fechaHoraImportacion": "2023-12-07T11:42:02.729Z"
},
{
"ruc": "3187875",
"razonSocial": "MARTINETTI LOPEZ, FABIOLA GRACIELA",
"digitoVerificador": "0",
"rucAnterior": "MALF901730L",
"estado": "ACTIVO",
"fechaHoraImportacion": "2023-12-07T11:45:13.706Z"
}
]
The main caveat is that the goverment can change anything at any moment (e.g. the url, the format of the zip files, etc) so the ETL would stop working. Obviously, that will affect me as well, so I'll try to keep this repo updated as much as I can.
TSConfig from Total Typescript
https://www.totaltypescript.com/tsconfig-cheat-sheet
Why use SQLite?
First motivation:
https://kentcdodds.com/blog/i-migrated-from-a-postgres-cluster-to-distributed-sqlite-with-litefs
That lead me to this:
https://fly.io/blog/all-in-on-sqlite-litestream/
- Add a simple api
- Add tests
- Add a logger configured to send email notifications on errors
- Add a crawler to try to detect changes in the government website
MIT