This app generates new audio samples with a small Stable Diffusion model (LoRA) trained on spectrograms of existing samples.
Steps:
- Convert existing audio samples to spectrograms.
- Train a small Stable Diffusion model (LoRA) on spectrograms.
- Generate new audio samples with a specified prompt.
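
To make the first step concrete, here is a minimal sketch of turning audio samples into a grayscale spectrogram. It is not the app's actual code: the function names, the frame size of 64, the hop of 32, and the linear 0..255 scaling are all assumptions chosen for illustration, and it uses a naive DFT from the standard library where a real pipeline would likely use an FFT-based library.

```python
import cmath
import math


def stft_magnitude(samples, frame_size=64, hop=32):
    """Return a magnitude spectrogram as a list of frames (time-major).

    Hypothetical helper: frame_size and hop are illustrative defaults.
    """
    # A Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    bins = frame_size // 2 + 1  # keep only the non-negative frequencies
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        spectrum = []
        for k in range(bins):
            # Naive DFT; acceptable for a sketch, slow for real audio.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


def to_pixels(frames):
    """Scale magnitudes linearly to 0..255 grayscale values (assumed mapping)."""
    peak = max(max(f) for f in frames) or 1.0
    return [[round(255 * v / peak) for v in f] for f in frames]


# A 1 kHz test tone at an assumed 8 kHz sample rate.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]
spec = stft_magnitude(tone)
pixels = to_pixels(spec)
loudest_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

With these assumed settings each bin spans 125 Hz, so the test tone should peak in the bin for 1000 Hz. Each row of `pixels` is one time frame, which could then be saved as an image column for training.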
git clone --recurse-submodules git@github.com:Danand/audio-sample-generator.git
cd audio-sample-generator
chmod +x run.sh
./run.sh
Follow the pages in the sidebar in order.
Advanced settings are omitted here for brevity.
- Open audio files.
- Click the Extract button.
- Review the spectrograms extracted from the audio files.
- Proceed to the next page.
- Specify for each spectrogram:
  - Subject
  - Caption (comma-separated keywords)
  - Weight (optional)
- Click the Save button.
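
The per-spectrogram fields above could be modeled roughly as follows. This is a hypothetical sketch, not the app's data model: the class name, the field names, and the way subject and keywords are joined into one training text are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpectrogramLabel:
    # Hypothetical record; fields mirror the form, not the app's code.
    subject: str
    caption: str  # comma-separated keywords as typed in the form
    weight: Optional[float] = None  # optional; None means "not specified"

    def keywords(self) -> List[str]:
        # Split on commas, trim whitespace, drop empty entries.
        return [k.strip() for k in self.caption.split(",") if k.strip()]

    def training_text(self) -> str:
        # Assumed layout: subject first, then the keywords.
        return ", ".join([self.subject.strip()] + self.keywords())


label = SpectrogramLabel(subject="kick", caption="punchy,  short, , analog")
text = label.training_text()
```

Normalizing the keywords this way keeps stray spaces and empty entries out of the text a model would be trained on.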
- Click the Train button.
- Type in the Prompt.
- Specify the Amount of audio samples to generate.
- Click Generate.
- Listen and save the generated samples if desired.
This page is convenient for batch-converting spectrograms to audio samples. You can also experiment with any images of the matching size, not only spectrograms.
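
As a rough illustration of why the image size matters, the sketch below checks that every image row has the expected number of frequency bins and then resynthesizes audio with a crude zero-phase inverse DFT and overlap-add. This is not the app's conversion method: the function names, the 0..255 to 0..1 pixel mapping, and the zero-phase shortcut are assumptions, and phase reconstruction such as Griffin-Lim is commonly used when quality matters.

```python
import math


def pixels_to_frames(pixels, expected_bins):
    """Validate the image size and map 0..255 pixels to 0..1 magnitudes."""
    for row in pixels:
        if len(row) != expected_bins:
            raise ValueError("image does not match the expected size")
    return [[v / 255 for v in row] for row in pixels]


def frames_to_audio(frames, hop):
    """Overlap-add a zero-phase inverse DFT of magnitude frames (time-major)."""
    bins = len(frames[0])
    frame_size = 2 * (bins - 1)
    out = [0.0] * (hop * (len(frames) - 1) + frame_size)
    for i, mags in enumerate(frames):
        for n in range(frame_size):
            # Real inverse DFT from the non-negative half of the spectrum,
            # assuming zero phase in every bin (a crude simplification).
            acc = mags[0] + mags[-1] * math.cos(math.pi * n)
            acc += 2 * sum(mags[k] * math.cos(2 * math.pi * k * n / frame_size)
                           for k in range(1, bins - 1))
            out[i * hop + n] += acc / frame_size
    return out


# Three identical frames with all energy in bin 8 of 33.
row = [0] * 33
row[8] = 255
frames = pixels_to_frames([row, row, row], expected_bins=33)
audio = frames_to_audio(frames, hop=32)
```

Any image whose rows have the expected length passes the check, which is why non-spectrogram images of the right size can still be converted, although the result may not sound like anything meaningful.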