The project incorporates the Retrieval-Augmented Generation (RAG) framework into a chatbot running on Azure Databricks. This approach ensures the chatbot delivers relevant, contextually precise responses, while continuous integration and deployment streamline development and updates. The model is integrated within a serverless architecture and backed by Delta Tables for secure data storage, which improves the chatbot's efficiency and scalability while maintaining stringent data security and compliance. MLFlow handles lifecycle management, so each model iteration is meticulously tracked and documented, and MLFlow's LLM-as-a-judge is used to evaluate the RAG chatbot.
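As a rough illustration of that evaluation step, the sketch below runs MLFlow's LLM-as-a-judge metrics over a small set of chatbot answers. The column names, sample rows, and judge endpoint (`databricks-gpt-judge`) are placeholders, not the project's actual configuration.

```python
# Illustrative sketch only: evaluate RAG chatbot outputs with MLflow's
# LLM-as-a-judge metrics. Data and endpoint names are placeholders.
import mlflow
import pandas as pd
from mlflow.metrics.genai import answer_relevance

eval_df = pd.DataFrame({
    "inputs": ["What storage layer does the chatbot use?"],
    "predictions": ["The chatbot stores its documents in Delta Tables."],
    "ground_truth": ["Documents are stored in Delta Tables."],
})

with mlflow.start_run(run_name="rag_eval"):
    results = mlflow.evaluate(
        data=eval_df,
        predictions="predictions",
        targets="ground_truth",
        model_type="question-answering",
        # Judge endpoint name is hypothetical; point this at a real endpoint.
        extra_metrics=[answer_relevance(model="endpoints:/databricks-gpt-judge")],
    )
    print(results.metrics)
```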
This repository houses the RAG-using-Azure-Databricks-CI-CD project, which demonstrates a comprehensive MLOps pipeline encompassing development, production, and monitoring within an Azure Databricks environment.
To begin working with the RAG-using-Azure-Databricks-CI-CD project, please follow the initial setup instructions detailed in the guide below:
This guide covers creating an Azure account, setting up resource groups, storage accounts, and Databricks workspaces, as well as configuring GitHub secrets and local development tools like the Databricks CLI.
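Once the setup guide is done, a short sanity check like the sketch below can confirm that your Databricks credentials are picked up, whether from the Databricks CLI profile or from `DATABRICKS_HOST`/`DATABRICKS_TOKEN` environment variables populated by GitHub secrets. It only lists the workspace root and is not part of the project itself.

```python
# Connectivity check after initial setup (illustrative, not project code).
# Assumes the Databricks CLI profile or env vars are already configured.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # resolves credentials from the CLI profile or env vars
me = w.current_user.me()
print(f"Authenticated as {me.user_name}")

# List top-level workspace objects to verify access.
for item in w.workspace.list("/"):
    print(item.path)
```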
After completing the initial setup, you can proceed to the detailed aspects of the project using the Table of Contents.
- Azure & Databricks Setup Guide
- Databricks Folder Structure
- Databricks Workflow
- Terraform
- CI/CD Workflow
- Model Version Rollback
- MLFlow
- Cost Analysis
The project’s folder structure in Databricks is designed to separate files and artifacts across the test, staging, and prod environments, facilitating organized development and deployment.
Understand our Databricks folder structure
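As a purely hypothetical sketch (the real layout lives in the folder-structure guide), notebooks can resolve environment-specific workspace paths along these lines:

```python
# Hypothetical mapping of environments to workspace folders; the actual
# folder names are documented in the Databricks folder structure guide.
ENV_PATHS = {
    "test":    "/Workspace/rag-chatbot/test",
    "staging": "/Workspace/rag-chatbot/staging",
    "prod":    "/Workspace/rag-chatbot/prod",
}

def artifact_root(env: str) -> str:
    """Return the workspace folder for the given environment."""
    return ENV_PATHS[env]

print(artifact_root("staging"))
```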
We maintain a detailed workflow for model training, evaluation, and deployment within Databricks, ensuring systematic testing and deployment of our models.
Explore the Databricks workflow
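A minimal sketch of that train, evaluate, and deploy flow, assuming an MLflow pyfunc model, an illustrative registry name `rag_chatbot`, and a placeholder quality gate, might look like this:

```python
# Sketch of the train -> evaluate -> register flow; model, metric value,
# registry name, and threshold are illustrative assumptions.
import mlflow
from mlflow.tracking import MlflowClient

class EchoChatbot(mlflow.pyfunc.PythonModel):
    """Stand-in for the real RAG chain; echoes the question back."""
    def predict(self, context, model_input):
        return model_input

with mlflow.start_run() as run:
    mlflow.pyfunc.log_model("model", python_model=EchoChatbot())
    mlflow.log_metric("eval_answer_relevance", 0.87)  # placeholder judge score
    model_uri = f"runs:/{run.info.run_id}/model"

# Promote a new version only if the evaluation metric clears the gate.
if mlflow.get_run(run.info.run_id).data.metrics["eval_answer_relevance"] >= 0.8:
    version = mlflow.register_model(model_uri, "rag_chatbot")
    MlflowClient().transition_model_version_stage(
        name="rag_chatbot", version=version.version, stage="Staging"
    )
```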
Terraform is used for infrastructure provisioning and state management within our Databricks environment.
Review our Terraform practices
Our project utilizes a CI/CD pipeline that orchestrates the workflow from development to staging and production.
Read more about the CI/CD workflow
Our process for rolling back to previous model versions in production is documented to ensure reliability and ease of transitions.
Learn about model version rollback
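A minimal rollback sketch using the MLflow model registry, again assuming the illustrative `rag_chatbot` name, is shown below; the rollback guide remains the source of truth:

```python
# Roll back by promoting a known-good earlier version to Production.
# The model name and version number here are illustrative only.
from mlflow.tracking import MlflowClient

client = MlflowClient()
previous_version = 3  # the known-good version to restore

client.transition_model_version_stage(
    name="rag_chatbot",
    version=previous_version,
    stage="Production",
    archive_existing_versions=True,  # demote the currently serving version
)
```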
MLFlow is integral to our pipeline, providing tools for model versioning, management, and serving in both test and production environments.
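For serving, the same stage-based model URI can be loaded in test and production alike; the name and stage below follow the hypothetical examples above:

```python
# Load the currently promoted model by stage (illustrative registry name).
import mlflow

chatbot = mlflow.pyfunc.load_model("models:/rag_chatbot/Production")
print(chatbot.predict(["How is customer data stored?"]))
```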
We conduct a thorough cost analysis to optimize resource allocation and manage expenses effectively.