I'm in the process of doing a full rewrite of the app. There are a few reasons for this:
- Svelte 5, which brings a lot of improvements, especially for reactivity
- The previous implementation was a bit more hacky than I wanted it to be
- Diarization wasn't great
I had to take some time off this project due to other commitments. I'll be back to working on this project regularly; you can expect weekly updates from here on, and I'll clean things up for a new release. In the meantime, as you might have noticed, the existing Docker images aren't valid. This is because the Docker build currently pulls whisper.cpp from its official repo and sets it up. Unfortunately, this turned out to be a bad move, as whisper.cpp changed their build process, so the current setup no longer works. I have already moved the main branch ahead for the new release, so if you want to try it out, please download the repo and run docker build to create your own image. My sincere apologies for the inconvenience; I'll fix this up soon.
In the meantime, for folks who have the time and resources to build and try the new release, any feedback would be greatly appreciated. Also, a warning: this release is a breaking change, and you will lose your old data.
The new release brings these changes:
- Performance improvements: the rewrite takes advantage of Svelte 5's reactivity features
- Changed the transcription engine from whisper.cpp to WhisperX
- Significant improvements to the diarization pipeline; diarization will be vastly better
- Streamlined and simplified setup process; removes the setup wizard altogether
- New UI: I tried playing around with glassmorphism. I'd appreciate feedback on the UI; I'm no frontend designer :P
- Support for multilingual transcription: both transcription and diarization now support all languages that the Whisper model supports
Looking forward to any and all feedback. Thank you for your patience, support, and interest in the project. Folks have submitted some great PRs, and I'm excited to see how the app evolves.
Scriberr is a self-hostable AI audio transcription app. It leverages the open-source Whisper models from OpenAI, utilizing the high-performance WhisperX transcription engine to transcribe audio files locally on your hardware. Scriberr also allows you to summarize transcripts using Ollama or OpenAI's ChatGPT API, with your own custom prompts. From v0.2.0, Scriberr supports offline speaker diarization with significant improvements.
Note: This app is under active development, and this release includes breaking changes. You will lose your old data. Please read the installation instructions carefully.
- Features
- Demo and Screenshots
- Installation
- Contributing
- License
- Acknowledgments
- Fast Local Transcription: Transcribe audio files locally using WhisperX for high performance.
- Hardware Acceleration: Supports both CPU and GPU (NVIDIA) acceleration.
- Customizable Compute Settings: Configure the number of threads, cores, and model size.
- Offline Speaker Diarization: Improved speaker identification without internet dependency.
- Multilingual Support: Supports all languages that the Whisper model supports.
- Customize Summarization: Optionally summarize transcripts with ChatGPT or Ollama using custom prompts.
- API Access: Exposes API endpoints for automation and integration.
- User-Friendly Interface: New UI with glassmorphism design.
- Mobile Ready: Responsive design suitable for mobile devices.
And more to come. Check out the planned features section.
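Since Scriberr exposes API endpoints, uploads can be scripted. The sketch below builds a curl command for a hypothetical `/api/transcribe` endpoint; the endpoint path and the `file` form field are assumptions for illustration, not the documented API, so check your running instance for the actual routes.

```shell
# Build (but don't run) a curl upload command so it can be inspected first.
# NOTE: /api/transcribe and the "file" field name are assumed for this sketch.
scriberr_upload_cmd() {
  base_url="$1"
  audio_file="$2"
  echo "curl -X POST ${base_url}/api/transcribe -F file=@${audio_file}"
}

scriberr_upload_cmd "http://localhost:3000" "meeting.wav"
```

Piping the emitted command through `sh` (or calling curl directly) performs the actual upload once you have confirmed the route.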
Note:
Demo was run locally on a MacBook Air M2 using Docker. Performance depends on the size of the model used and the number of cores and threads assigned. The demo was running in development mode, so performance may be slower than production.
(Demo video: CleanShot.2024-10-04.at.14.55.46.mp4)
- Docker and Docker Compose installed on your system. Install Docker.
- NVIDIA GPU (optional): If you plan to use GPU acceleration, ensure you have an NVIDIA GPU and the NVIDIA Container Toolkit installed.
git clone https://github.com/rishikanthc/Scriberr.git
cd Scriberr
Copy the example `.env` file and adjust the settings as needed:
cp env.example .env
Edit the `.env` file to set your desired configuration, including:
- `ADMIN_USERNAME` and `ADMIN_PASSWORD` for accessing the web interface.
- `OPENAI_API_KEY` if you plan to use OpenAI's GPT models for summarization.
- `HF_API_KEY` if you plan to use HuggingFace models for diarization.
- `HARDWARE_ACCEL` set to `gpu` if you have an NVIDIA GPU.
- Other configurations as needed.
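As a starting point, a minimal `.env` for a CPU-only setup might look like the fragment below. All values are illustrative; replace them with your own credentials and keys.

```shell
# Illustrative values only -- replace with your own credentials and keys.
ADMIN_USERNAME=admin
ADMIN_PASSWORD=change-me
HARDWARE_ACCEL=cpu
# Optional: only needed if you use OpenAI summarization or HuggingFace diarization.
OPENAI_API_KEY=your-openai-key
HF_API_KEY=your-huggingface-key
```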
To run Scriberr without GPU acceleration:
docker-compose up -d
This command uses the `docker-compose.yml` file and builds the Docker image using the `Dockerfile`.
To run Scriberr with GPU acceleration:
docker-compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
This command uses both the `docker-compose.yml` and `docker-compose.gpu.yml` files and builds the Docker image using the `Dockerfile-gpu`.
Note: Ensure that you have the NVIDIA Container Toolkit installed and properly configured.
Once the containers are up and running, access the Scriberr web interface at http://localhost:3000 (or the port you specified in the `.env` file).
If you wish to build the Docker images yourself, you can use the provided `Dockerfile` and `Dockerfile-gpu`.
docker build -t scriberr:latest -f Dockerfile .
docker build -t scriberr:latest-gpu -f Dockerfile-gpu .
The application can be customized using the following environment variables in your `.env` file:
- `ADMIN_USERNAME`: Username for the admin user in the web interface.
- `ADMIN_PASSWORD`: Password for the admin user.
- `AI_MODEL`: Default model to use for summarization (e.g., `"gpt-3.5-turbo"`).
- `OLLAMA_BASE_URL`: Base URL of your OpenAI API-compatible server if not using OpenAI (e.g., your Ollama server).
- `OPENAI_API_KEY`: Your OpenAI API key if using OpenAI for summarization (or your Ollama key if `OLLAMA_BASE_URL` is set).
- `HF_API_KEY`: Your HuggingFace API key if using HuggingFace models for diarization.
- `DIARIZATION_MODEL`: Default model for speaker diarization (e.g., `"pyannote/speaker-diarization"`).
- `MODELS_DIR`, `WORK_DIR`, `AUDIO_DIR`: Directories for models, temporary files, and uploads.
- `BODY_SIZE_LIMIT`: Maximum request body size (e.g., `"1G"`).
- `HARDWARE_ACCEL`: Set to `gpu` for GPU acceleration (NVIDIA GPU required); defaults to `cpu`.
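Before bringing the stack up, you can run a quick sanity check on your `.env`. The helper below is a sketch (not part of Scriberr itself) that reports any variables you rely on that are missing from the file.

```shell
# Sketch: verify that required variables are present in an env file.
# Usage: check_env <env-file> VAR1 VAR2 ...
check_env() {
  env_file="$1"; shift
  missing=0
  for var in "$@"; do
    if ! grep -q "^${var}=" "$env_file"; then
      echo "missing: ${var}"
      missing=1
    fi
  done
  return "$missing"
}

# Example: check_env .env ADMIN_USERNAME ADMIN_PASSWORD HARDWARE_ACCEL
```

A non-zero exit status means at least one variable is absent, so the call can gate a `docker-compose up` in a deploy script.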
If needed, you can modify the `docker-compose.yml` or `docker-compose.gpu.yml` files to suit your environment.
- Volumes: By default, data is stored in Docker volumes. If you prefer to store data in local directories, uncomment the lines in the `volumes` section and specify your paths.
Important: This release includes breaking changes and is not backward compatible with previous versions. You will lose your existing data. Please back up your data before proceeding.
Changes include:
- Performance Improvements: The rewrite takes advantage of Svelte 5 reactivity features.
- Transcription Engine Change: Switched from Whisper.cpp to WhisperX.
- Improved Diarization: Significant improvements to the diarization pipeline.
- Simplified Setup: Streamlined setup process; the wizard has been removed.
- New UI: Implemented a new UI design with glassmorphism.
- Multilingual Support: Transcription and diarization now support all languages that Whisper models support.
- Database Connection Issues: Ensure that the PostgreSQL container is running and accessible.
- GPU Not Detected: Ensure that the NVIDIA Container Toolkit is installed and that Docker is configured correctly.
- Permission Issues: Running Docker commands may require root permissions or membership in the `docker` group.
- Docker Images Not Valid: If you encounter issues with pre-built Docker images, consider building the images locally using the provided Dockerfiles.
Check the logs for more details:
docker-compose logs -f
If you encounter issues or have questions, feel free to open an issue.
Contributions are welcome! Feel free to submit pull requests or open issues.
- Fork the Repository: Create a personal fork of the repository on GitHub.
- Clone Your Fork: Clone your forked repository to your local machine.
- Create a Feature Branch: Make a branch for your feature or fix.
- Commit Changes: Make your changes and commit them.
- Push to Your Fork: Push your changes to your fork on GitHub.
- Submit a Pull Request: Create a pull request to merge your changes into the main repository.
For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the MIT License. See the LICENSE file for details.
- OpenAI Whisper
- WhisperX
- HuggingFace
- Ollama
- Community contributors who have submitted great PRs and helped the app evolve.
Thank you for your patience, support, and interest in the project. Looking forward to any and all feedback.