Docker data stack

Run

  1. Install Docker Desktop
  2. Create a .env file in the repo root by copying .env.template
  3. Set the desired POSTGRES_PASSWORD value in the .env file
  4. Build and start the containers:
docker compose up -d --build

Jupyter

Check the jupyterlab container logs and open the link that looks like http://127.0.0.1:8089/lab?token=...
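If you'd rather grab the login link from the command line than scroll the logs, it can be parsed out of the log text; a minimal sketch (the log line format shown is an assumption based on standard JupyterLab startup output):

```python
import re
from typing import Optional

def extract_login_url(logs: str) -> Optional[str]:
    """Find the first JupyterLab login URL (with token) in container log text."""
    match = re.search(r"http://127\.0\.0\.1:\d+/lab\?token=\w+", logs)
    return match.group(0) if match else None

# Hypothetical log excerpt for illustration:
logs = (
    "    Or copy and paste one of these URLs:\n"
    "        http://127.0.0.1:8089/lab?token=abc123def456\n"
)
print(extract_login_url(logs))  # http://127.0.0.1:8089/lab?token=abc123def456
```

Pipe `docker logs jupyterlab` output into this function, or just copy the URL by hand.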

Trino

docker exec -it trino trino
SHOW SCHEMAS FROM db;
USE db.public;
SHOW TABLES FROM public;
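The same queries can also be issued from Python (e.g. inside the Jupyter container); a hedged sketch using the trino client library — the package must be installed separately, and the host, port, and user shown are assumptions to adjust to your compose setup:

```python
QUERY = "SHOW TABLES FROM public"

def list_tables(host: str = "localhost", port: int = 8080):
    """Connect to Trino and list tables in the db.public schema."""
    # trino package assumed installed: pip install trino
    from trino.dbapi import connect

    conn = connect(host=host, port=port, user="jupyter", catalog="db", schema="public")
    cur = conn.cursor()
    cur.execute(QUERY)
    return [row[0] for row in cur.fetchall()]

if __name__ == "__main__":
    print(list_tables())
```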

Spark

docker exec -it spark-master /bin/bash
cd /opt/spark/bin
./spark-submit --master spark://0.0.0.0:7077 \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.5.1.jar 100
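SparkPi estimates π by Monte Carlo sampling: it throws random points into the unit square and counts how many land inside the quarter circle. The same idea in plain Python (a sketch of what the example job computes, not of how Spark distributes it):

```python
import random

def estimate_pi(num_samples: int, seed: int = 42) -> float:
    """Estimate pi by sampling points in the unit square and counting
    the fraction that fall inside the quarter circle of radius 1."""
    rng = random.Random(seed)
    inside = sum(
        1
        for _ in range(num_samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * inside / num_samples

print(estimate_pi(100_000))  # roughly 3.14
```

The `100` argument to spark-submit plays the same role as `num_samples` here: more samples, better estimate.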

Thrift

docker exec -it spark-master /bin/bash
./bin/beeline
!connect jdbc:hive2://localhost:10000 scott tiger
show databases;
create table hive_example(a string, b int) partitioned by(c int);
alter table hive_example add partition(c=1);
insert into hive_example partition(c=1) values('a', 1), ('a', 2),('b',3);
select count(distinct a) from hive_example;
select sum(b) from hive_example;
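For the three rows inserted above, the two aggregate queries return 2 and 6 respectively; the same logic mirrored in plain Python as a sanity check:

```python
# Rows inserted into hive_example partition c=1, as (a, b) pairs
rows = [("a", 1), ("a", 2), ("b", 3)]

# select count(distinct a) from hive_example;
count_distinct_a = len({a for a, _ in rows})
print(count_distinct_a)  # 2

# select sum(b) from hive_example;
sum_b = sum(b for _, b in rows)
print(sum_b)  # 6
```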

ScyllaDB

Connect to cqlsh

docker exec -it scylla-1 cqlsh

Create keyspace

CREATE KEYSPACE data
WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3};

Use keyspace and create table

USE data;

CREATE TABLE data.users (
    user_id uuid PRIMARY KEY,
    first_name text,
    last_name text,
    age int
);

Insert data

INSERT INTO data.users (user_id, first_name, last_name, age)
  VALUES (123e4567-e89b-12d3-a456-426655440000, 'Polly', 'Partition', 77);
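The same insert can be issued from Python with the cassandra-driver package (which also works with ScyllaDB); a hedged sketch — the driver must be installed separately, and the contact point is an assumption based on the compose setup above:

```python
import uuid

USER_ID = uuid.UUID("123e4567-e89b-12d3-a456-426655440000")

def insert_user(user_id, first_name, last_name, age, host="127.0.0.1"):
    """Insert a row into data.users via CQL (sketch; assumes cassandra-driver)."""
    from cassandra.cluster import Cluster  # pip install cassandra-driver

    cluster = Cluster([host])
    session = cluster.connect("data")
    session.execute(
        "INSERT INTO users (user_id, first_name, last_name, age) "
        "VALUES (%s, %s, %s, %s)",
        (user_id, first_name, last_name, age),
    )
    cluster.shutdown()

if __name__ == "__main__":
    insert_user(USER_ID, "Polly", "Partition", 77)
```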

Kafka

Create topic

docker exec -it kafka kafka-topics.sh --create --topic test --bootstrap-server 127.0.0.1:9092

Kafka producer

See kafka_producer.ipynb
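The notebook itself isn't reproduced here; a minimal producer sketch using kafka-python — the library choice and JSON serialization are assumptions, and the actual notebook may differ:

```python
import json

def serialize(value) -> bytes:
    """JSON-encode a message value for Kafka."""
    return json.dumps(value).encode("utf-8")

def produce(messages, topic="test", bootstrap="127.0.0.1:9092"):
    """Send each message to the given topic (sketch; assumes kafka-python)."""
    from kafka import KafkaProducer  # pip install kafka-python

    producer = KafkaProducer(bootstrap_servers=bootstrap, value_serializer=serialize)
    for msg in messages:
        producer.send(topic, msg)
    producer.flush()
    producer.close()

if __name__ == "__main__":
    produce([{"event": "signup", "user": "polly"}])
```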

Kafka consumer

See kafka_consumer.ipynb
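As with the producer, a minimal consumer sketch using kafka-python — again an assumption about the library and message format, not a copy of the notebook:

```python
import json

def deserialize(raw: bytes):
    """Decode a JSON-encoded Kafka message value."""
    return json.loads(raw.decode("utf-8"))

def consume(topic="test", bootstrap="127.0.0.1:9092", limit=10):
    """Print up to `limit` messages from the topic (sketch; assumes kafka-python)."""
    from kafka import KafkaConsumer  # pip install kafka-python

    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=bootstrap,
        auto_offset_reset="earliest",
        value_deserializer=deserialize,
    )
    for i, record in enumerate(consumer):
        print(record.value)
        if i + 1 >= limit:
            break
    consumer.close()

if __name__ == "__main__":
    consume()
```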

Airflow

Check the .env.template file, copy the Airflow-related variables into your .env file, and update their values where necessary.

Slack integration

Create a Slack app and set the AIRFLOW_CONN_SLACK_API_DEFAULT environment variable to its Slack API key. If you don't want to use this integration, remove the AIRFLOW_CONN_SLACK_API_DEFAULT variable from your .env file.

Mongo