This walkthrough describe how to prepare and share your pipeline that you created with dlt init
. It assumes that your pipeline is already working be it locally or on production. It also assumes that you still use a simple file layout created by dlt init
- Fork the pipelines repository
- Clone the forked repository
git clone git@github.com:rudolfix/pipelines.git
- Make a feature branch in the fork
cd pipelines
git checkout -b rfix/weatherapi
The share your pipeline with the community you need to slightly modify your project. dlt init
may place your pipeline along many others in someone's else project and following changes are required.
We'll use pipeline that you created in our create a pipeline walkthrough
- Choose a name for your pipeline ie. weatherapi
- Create a folder named
weatherapi
- Add file
__init__.py
to that folder and move there your sources and resources. In our case we move theweatherapi_resource
andweatherapi_source
functions to the new file. Remember to importdlt
and other dependencies. - Rename your pipeline script
weatherapi.py
toweatherapi_pipeline.py
. Import your sources and resources from theweatherapi
folder. - Add a module level docstring to your newly created
__init__.py
with the pipeline description. This will be visible to the users when they dodlt init -l
Your __init__.py
file should look more or less like this. Mind the docstring at the top:
"""Loads weather data from weatherapi.com"""
import dlt
from dlt.sources.helpers import requests
@dlt.source
def weatherapi_source(api_secret_key=dlt.secrets.value):
return weatherapi_resource(api_secret_key)
@dlt.resource(write_disposition="append")
def weatherapi_resource(api_secret_key=dlt.secrets.value):
url = "https://api.weatherapi.com/v1/current.json"
params = {
"q": "NYC",
"key": api_secret_key
}
response = requests.get(url, params=params)
response.raise_for_status()
yield response.json()
Your demo script should look more or less like this.
import dlt
from weatherapi import weatherapi_source
def load_new_york_current_weather():
# configure the pipeline with your destination details
pipeline = dlt.pipeline(pipeline_name='weatherapi', destination='duckdb', dataset_name='weatherapi_data')
# run the pipeline with your parameters
load_info = pipeline.run(weatherapi_source())
# pretty print the information on data that was loaded
print(load_info)
if __name__=='__main__':
load_new_york_current_weather()
-
Make sure that your
requirements.txt
contain the dependencies that your pipeline needs. Make sure it contains thedlt
dependency with version on which you test the pipeline (dlt --version
) -
Run your pipeline. It should work exactly as before.
python weatherapi_pipeline.py
💡
weatherapi_pipeline.py
is something that we call demo script. If there's something important/interesting/useful about your pipeline that you want to show to other - add it!
Shared pipelines are added to pipelines
folder of the pipelines
repo.
- Copy the
weatherapi
folder with all files, theweatherapi_pipeline.py
andrequirements.txt
file topipelines
folder
cp -R weatherapi requirements.txt weatherapi_pipeline.py ../../forks/pipelines/pipelines/
(my clone is in forks/pipelines/
)
- Try to verify that your pipeline is correctly added. This is optional. Our CI will also do it but we'll save time if you do.
🥇 You'll need
make
andpoetry
to be installed. Head on to the root folder ofpipelines
repo and
- make install-poetry
- make dev
- make lint-dlt-init
the last makes sure that all pipelines
- Try to run the pipeline in the new location.
- Add all files and commit it
- Push to your fork
- Go back to README.md on info on creating issue with pipeline announcement and opening a PR to our master