The project incorporates the Retrieval-Augmented Generation (RAG) framework into a chatbot running on Azure Databricks. This approach ensures the chatbot delivers relevant, contextually precise responses, while continuous integration and deployment streamline development and updates. The model is integrated within a serverless architecture and backed by Delta Tables for secure data storage, which improves the chatbot's efficiency and scalability while maintaining stringent data security and compliance. MLFlow handles lifecycle management, so each model iteration is meticulously tracked and documented, and MLFlow's LLM-as-a-judge is used to evaluate the RAG chatbot.
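As a rough illustration of that evaluation step, the sketch below runs MLFlow's LLM-as-a-judge metrics over a small set of chatbot answers. The column names, sample rows, and judge endpoint (`databricks-gpt-judge`) are placeholders, not the project's actual configuration.

```python
# Illustrative sketch only: evaluate RAG chatbot outputs with MLflow's
# LLM-as-a-judge metrics. Data and endpoint names are placeholders.
import mlflow
import pandas as pd
from mlflow.metrics.genai import answer_relevance

eval_df = pd.DataFrame({
    "inputs": ["What storage layer does the chatbot use?"],
    "predictions": ["The chatbot stores its documents in Delta Tables."],
    "ground_truth": ["Documents are stored in Delta Tables."],
})

with mlflow.start_run(run_name="rag_eval"):
    results = mlflow.evaluate(
        data=eval_df,
        predictions="predictions",
        targets="ground_truth",
        model_type="question-answering",
        # Judge endpoint name is hypothetical; point this at a real endpoint.
        extra_metrics=[answer_relevance(model="endpoints:/databricks-gpt-judge")],
    )
    print(results.metrics)
```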
This repository houses the RAG-using-Azure-Databricks-CI-CD project, which demonstrates a comprehensive MLOps pipeline encompassing development, production, and monitoring within an Azure Databricks environment.
To begin working with the RAG-using-Azure-Databricks-CI-CD project, please follow the initial setup instructions detailed in the guide below:
This guide covers creating an Azure account, setting up resource groups, storage accounts, and Databricks workspaces, as well as configuring GitHub secrets and local development tools like the Databricks CLI.
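Once the setup guide is done, a short sanity check like the sketch below can confirm that your Databricks credentials are picked up, whether from the Databricks CLI profile or from `DATABRICKS_HOST`/`DATABRICKS_TOKEN` environment variables populated by GitHub secrets. It only lists the workspace root and is not part of the project itself.

```python
# Connectivity check after initial setup (illustrative, not project code).
# Assumes the Databricks CLI profile or env vars are already configured.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # resolves credentials from the CLI profile or env vars
me = w.current_user.me()
print(f"Authenticated as {me.user_name}")

# List top-level workspace objects to verify access.
for item in w.workspace.list("/"):
    print(item.path)
```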
After completing the initial setup, you can proceed to the detailed aspects of the project using the Table of Contents.
- Azure & Databricks Setup Guide
- Databricks Folder Structure
- Databricks Workflow
- Terraform
- CI/CD Workflow
- Model Version Rollback
- MLFlow
- Cost Analysis
The project’s folder structure in Databricks is designed to separate files and artifacts across the test, staging, and prod environments, facilitating organized development and deployment.
Understand our Databricks folder structure
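As a purely hypothetical sketch (the real layout lives in the folder-structure guide), notebooks can resolve environment-specific workspace paths along these lines:

```python
# Hypothetical mapping of environments to workspace folders; the actual
# folder names are documented in the Databricks folder structure guide.
ENV_PATHS = {
    "test":    "/Workspace/rag-chatbot/test",
    "staging": "/Workspace/rag-chatbot/staging",
    "prod":    "/Workspace/rag-chatbot/prod",
}

def artifact_root(env: str) -> str:
    """Return the workspace folder for the given environment."""
    return ENV_PATHS[env]

print(artifact_root("staging"))
```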
We maintain a detailed workflow for model training, evaluation, and deployment within Databricks, ensuring systematic testing and deployment of our models.
Explore the Databricks workflow
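A minimal sketch of that train, evaluate, and deploy flow, assuming an MLflow pyfunc model, an illustrative registry name `rag_chatbot`, and a placeholder quality gate, might look like this:

```python
# Sketch of the train -> evaluate -> register flow; model, metric value,
# registry name, and threshold are illustrative assumptions.
import mlflow
from mlflow.tracking import MlflowClient

class EchoChatbot(mlflow.pyfunc.PythonModel):
    """Stand-in for the real RAG chain; echoes the question back."""
    def predict(self, context, model_input):
        return model_input

with mlflow.start_run() as run:
    mlflow.pyfunc.log_model("model", python_model=EchoChatbot())
    mlflow.log_metric("eval_answer_relevance", 0.87)  # placeholder judge score
    model_uri = f"runs:/{run.info.run_id}/model"

# Promote a new version only if the evaluation metric clears the gate.
if mlflow.get_run(run.info.run_id).data.metrics["eval_answer_relevance"] >= 0.8:
    version = mlflow.register_model(model_uri, "rag_chatbot")
    MlflowClient().transition_model_version_stage(
        name="rag_chatbot", version=version.version, stage="Staging"
    )
```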
Terraform is used for infrastructure provisioning and state management within our Databricks environment.
Review our Terraform practices
Our project utilizes a CI/CD pipeline that orchestrates the workflow from development to staging and production.
Read more about the CI/CD workflow
Our process for rolling back to previous model versions in production is documented to ensure reliability and ease of transitions.
Learn about model version rollback
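A minimal rollback sketch using the MLflow model registry, again assuming the illustrative `rag_chatbot` name, is shown below; the rollback guide remains the source of truth:

```python
# Roll back by promoting a known-good earlier version to Production.
# The model name and version number here are illustrative only.
from mlflow.tracking import MlflowClient

client = MlflowClient()
previous_version = 3  # the known-good version to restore

client.transition_model_version_stage(
    name="rag_chatbot",
    version=previous_version,
    stage="Production",
    archive_existing_versions=True,  # demote the currently serving version
)
```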
MLFlow is integral to our pipeline, providing tools for model versioning, management, and serving in both test and production environments.
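For serving, the same stage-based model URI can be loaded in test and production alike; the name and stage below follow the hypothetical examples above:

```python
# Load the currently promoted model by stage (illustrative registry name).
import mlflow

chatbot = mlflow.pyfunc.load_model("models:/rag_chatbot/Production")
print(chatbot.predict(["How is customer data stored?"]))
```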
We conduct a thorough cost analysis to optimize resource allocation and manage expenses effectively.