DBT Core Databricks Project 🚀

Overview

This project implements data transformations using dbt (data build tool) with Databricks as the underlying data warehouse. The project follows a medallion architecture (Bronze, Silver, Gold layers) for data processing and includes comprehensive testing, documentation, and CI/CD integration.

Project Structure: dbt_core_databricks

dbt_core_databricks/
├── analyses/
│   ├── .gitkeep
│   └── macro_demo.sql
├── macros/
│   ├── .gitkeep
│   ├── current_timestamp.sql
│   ├── generate_schema_name.sql
│   └── multiply_cols.sql
├── models/
│   ├── bronze/
│   │   ├── bronze_orders.sql
│   │   ├── bronze_reviews.sql
│   │   └── bronze_users.sql
│   ├── silver/
│   │   ├── _silver.yml
│   │   ├── silver_orders.sql
│   │   ├── silver_products.sql
│   │   └── silver_users.sql
│   ├── gold/
│   │   ├── gold.yml
│   │   ├── gold_avg_rating__daily.sql
│   │   └── gold_sales__daily.sql
│   └── sources/
│       ├── _sources.md
│       └── landing_sources.yml
├── seeds/
│   └── .gitkeep
├── snapshots/
│   ├── .gitkeep
│   ├── _snapshots.yml
│   └── products_snapshots.sql
├── tests/
│   ├── .gitkeep
│   └── generic/
│       └── assert_non_negative.sql
├── .gitignore
├── .user.yml
├── README.md
├── dbt_project.yml
├── package-lock.yml
└── packages.yml

Screenshots (images not included in this text): Dashboard, Lineage Graph, SQL Compilation Example, Execution Commands in Databricks, Job Execution in dbt, Additional Lineage Visualization

📊 Data Model Overview

Bronze Layer (Raw Data)

Direct ingestion from landing zone
Tables: orders, products, reviews, users
Minimal transformations
PII data tagged with 'contains_pii'

Silver Layer (Transformed)

Cleaned and standardized data
Business logic applications
Key models:
- silver_orders: Calculated order amounts
- silver_products: Current product information
- silver_users: Anonymized user data

Gold Layer (Business Ready)

Aggregated analytics views
Key models:
- gold_sales__daily: Daily sales analytics
- gold_avg_rating__daily: Product rating analytics
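
As a rough illustration of what a daily roll-up such as gold_sales__daily computes, the sketch below groups order rows by day and totals them. The field names order_date and amount are assumptions made for the example; the actual column names and SQL are not shown in this README.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal

# Hypothetical order rows; field names are assumptions for illustration only.
orders = [
    {"order_date": date(2024, 1, 1), "amount": Decimal("19.99")},
    {"order_date": date(2024, 1, 1), "amount": Decimal("5.01")},
    {"order_date": date(2024, 1, 2), "amount": Decimal("42.00")},
]


def daily_sales(rows):
    """Group rows by day and return {day: (order_count, total_amount)}."""
    totals = defaultdict(lambda: [0, Decimal("0")])
    for row in rows:
        bucket = totals[row["order_date"]]
        bucket[0] += 1                # count of orders on that day
        bucket[1] += row["amount"]    # running total for that day
    return {day: (count, total) for day, (count, total) in sorted(totals.items())}


result = daily_sales(orders)
```

In the project itself this logic would live in SQL as a GROUP BY over the silver layer; the Python version only shows the shape of the result.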

🚀 Getting Started

Prerequisites

dbt Core installed
Databricks account and cluster
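Python 3.7+ (recent dbt releases require a newer Python version; check the dbt-databricks requirements for the release you install)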

Setup

Clone the repository:

    git clone <repository-url>

Install dependencies:

    pip install dbt-databricks

Configure profiles.yml:

    dbt_core_databricks:
      outputs:
        dev:
          type: databricks
          catalog: your_catalog
          schema: your_schema
          host: your-databricks-host
          http_path: your-http-path
          token: your-token
          threads: 1
      target: dev

🛠️ Features

Data Testing

Generic tests for data quality
Custom test macro: assert_non_negative
Unit tests for transformations
Source freshness checks
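
A dbt test is a query that selects the rows violating a rule, and the test passes when that query returns no rows. The sketch below mirrors that idea for a non-negative check in plain Python. It is not the project's assert_non_negative macro, whose SQL is not shown here, and the row fields are hypothetical.

```python
def failing_rows(rows, column):
    """Return the rows whose `column` is negative.

    NULL (None) values are skipped, matching how `WHERE col < 0` behaves in SQL.
    """
    return [row for row in rows if row[column] is not None and row[column] < 0]


# Hypothetical sample data for illustration only.
sample = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": -3.5},
    {"order_id": 3, "amount": None},
]

failures = failing_rows(sample, "amount")
test_passed = len(failures) == 0  # a dbt test passes only when no rows come back
```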

Macros

multiply_columns_and_round: Calculate monetary values
generate_schema_name: Custom schema handling
current_timestamp: Timestamp utilities
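
The macro bodies are not reproduced in this README, so the following is only a sketch of the kind of logic the first two names suggest. The rounding mode and the two-decimal default are assumptions. default_schema_name mirrors dbt's built-in generate_schema_name behaviour, which this project's custom macro presumably changes.

```python
from decimal import Decimal, ROUND_HALF_UP


def multiply_and_round(a, b, places=2):
    """Multiply two values as decimals and round half-up to `places` digits."""
    quantum = Decimal(1).scaleb(-places)  # places=2 gives Decimal("0.01")
    product = Decimal(str(a)) * Decimal(str(b))
    return product.quantize(quantum, rounding=ROUND_HALF_UP)


def default_schema_name(target_schema, custom_schema_name=None):
    """dbt's built-in rule: target schema, suffixed with the custom name if one is set."""
    if custom_schema_name is None:
        return target_schema
    return f"{target_schema}_{custom_schema_name.strip()}"


line_total = multiply_and_round(3, "19.995")
```

Projects often override generate_schema_name so that a model configured with schema "silver" lands in a schema named exactly that, rather than in one prefixed with the target schema.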

Snapshots

Type 2 SCD for products table
Timestamp-based tracking
Configured in snapshots/products_snapshots.sql
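
To make timestamp-based Type 2 tracking concrete, here is a minimal sketch of one snapshot run: when a source row carries a newer timestamp than the current record, the current record is closed and a new version is appended. dbt_valid_from and dbt_valid_to are the column names dbt snapshots use; product_id, price, and updated_at are hypothetical, and dbt's real implementation does this in SQL with additional metadata columns.

```python
from datetime import datetime


def apply_snapshot(history, source_rows, key="product_id", updated_at="updated_at"):
    """Apply one snapshot run: close changed current records and append new versions."""
    # Current records are the ones that have not been closed yet.
    current = {rec[key]: rec for rec in history if rec["dbt_valid_to"] is None}
    for row in source_rows:
        existing = current.get(row[key])
        if existing is not None and row[updated_at] <= existing["dbt_valid_from"]:
            continue  # source row is not newer than the current version
        if existing is not None:
            existing["dbt_valid_to"] = row[updated_at]  # close the old version
        history.append({**row, "dbt_valid_from": row[updated_at], "dbt_valid_to": None})
    return history


history = []
apply_snapshot(history, [{"product_id": 1, "price": 10, "updated_at": datetime(2024, 1, 1)}])
apply_snapshot(history, [{"product_id": 1, "price": 12, "updated_at": datetime(2024, 2, 1)}])
```

Running the second call again with the same source row leaves the history unchanged, which is why a snapshot can be scheduled repeatedly.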

📚 Documentation

Comprehensive table and column descriptions
Source documentation in models/sources/_sources.md
Generated documentation available via dbt docs

🔄 Development Workflow

Basic Commands

    dbt run                      # Run all models
    dbt test                     # Run all tests
    dbt docs generate            # Generate documentation
    dbt docs serve               # Serve documentation locally
    dbt run --select tag:daily   # Run tagged models
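
If these commands are scripted, for example from a scheduler, a thin wrapper can assemble the argument list and skip execution when dbt is not installed. This is a sketch only; dbt_command and run_dbt are hypothetical helper names, not part of dbt or of this project.

```python
import shutil
import subprocess


def dbt_command(action, select=None):
    """Build an argument list such as ['dbt', 'run', '--select', 'tag:daily']."""
    args = ["dbt", *action.split()]
    if select:
        args += ["--select", select]
    return args


def run_dbt(action, select=None):
    """Run dbt if it is on PATH; return its exit code, or None when dbt is missing."""
    if shutil.which("dbt") is None:
        return None
    return subprocess.run(dbt_command(action, select), check=False).returncode


daily_run = dbt_command("run", select="tag:daily")
```

Calling run_dbt("run", select="tag:daily") followed by run_dbt("test", select="tag:daily") would refresh and then test only the models tagged daily.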

Model Tags

contains_pii: Models with sensitive data
daily: Daily refresh models
weekly: Weekly refresh models

🔐 Security

PII data tagged and tracked
Credentials managed via environment variables
No sensitive information in repository
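
One way to keep credentials out of the repository is to have profiles.yml read them with dbt's env_var function and to verify they are set before a run. The sketch below shows such a pre-flight check. The variable names are assumptions; use whichever names your profiles.yml references.

```python
import os

# Hypothetical variable names, chosen for illustration only.
REQUIRED_VARS = ("DATABRICKS_HOST", "DATABRICKS_HTTP_PATH", "DATABRICKS_TOKEN")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# In profiles.yml the matching entry would read the value at run time, e.g.
#   token: "{{ env_var('DATABRICKS_TOKEN') }}"
example_env = {"DATABRICKS_HOST": "example-host", "DATABRICKS_HTTP_PATH": "/sql/path"}
missing = missing_credentials(example_env)
```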

🤝 Contributing

Fork the repository
Create a feature branch
Commit changes
Push to the branch
Create a Pull Request

Built using dbt and Databricks

kiranbele11/dbt_core_databricks
