Listen, Chat, And Edit on Edge: Text-Guided Soundscape Modification for Real-Time Auditory Experience
Listen, Chat, and Edit (LCE) is a cutting-edge multimodal sound mixture editor designed to modify each sound source in a mixture based on user-provided text instructions. The system features a user-friendly chat interface and the unique ability to edit multiple sound sources simultaneously within a mixture without the need for separation. Using open-vocabulary text prompts interpreted by a large language model, LCE creates a semantic filter to edit sound mixtures, which are then decomposed, filtered, and reassembled into the desired output.
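At a high level, the edit step can be pictured as a separation network conditioned on a text embedding. The sketch below is a minimal toy illustration, not the actual LCE architecture: the module names, dimensions, and the one-gain-per-source semantic filter are all assumptions made for clarity.

```python
import torch
import torch.nn as nn

# Hypothetical shapes: a text embedding from the LLM conditions a
# Conv-TasNet-style separator that decomposes, filters, and re-mixes audio.
EMB_DIM, N_SRC, N_SAMPLES = 512, 4, 16000

class ConditionedEditor(nn.Module):
    """Toy stand-in for the LCE pipeline: separate -> weight -> re-mix."""
    def __init__(self):
        super().__init__()
        # Placeholder separator: maps a mono mixture to N_SRC source estimates.
        self.separator = nn.Conv1d(1, N_SRC, kernel_size=16, padding="same")
        # Maps the LLM text embedding to one gain per source (the "semantic filter").
        self.filter_head = nn.Linear(EMB_DIM, N_SRC)

    def forward(self, mixture, text_emb):
        sources = self.separator(mixture.unsqueeze(1))       # (B, N_SRC, T)
        gains = torch.sigmoid(self.filter_head(text_emb))    # (B, N_SRC)
        edited = (sources * gains.unsqueeze(-1)).sum(dim=1)  # re-mix -> (B, T)
        return edited

editor = ConditionedEditor()
mixture = torch.randn(1, N_SAMPLES)  # dummy 1-second mixture at 16 kHz
text_emb = torch.randn(1, EMB_DIM)   # dummy LLM embedding of the instruction
edited = editor(mixture, text_emb)
print(edited.shape)                  # torch.Size([1, 16000])
```

In the real system, the semantic filter comes from the LLM's interpretation of an open-vocabulary instruction, and the separator is a Conv-TasNet model deployed on the edge device (see the workflow steps below).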
- data/datasets: Contains the scripts used to process the datasets and prompts.
- demonstration: A demonstration of an input mixture and the edited version.
- embeddings: The .pkl files received from the LLM are stored in this folder (see the loading sketch after this list).
- hparams: Hyperparameter settings for the models.
- llm_cloud: Configuration and scripts for cloud-based language model interactions.
- modules: Core modules and utilities for the project.
- prompts: Handling and processing of text prompts.
- pubsub: Setup for publish-subscribe messaging patterns.
- utils: Utility scripts for general purposes.
- E6692.2022Spring.LCEE.ss6928.pkk2125.presentationFinal.pptx: Final presentation file detailing project overview and results.
- profiling.ipynb: Jupyter notebook for profiling the modules' inference speed and GPU memory usage.
- run_lce.ipynb: Main executable notebook for the LCE system.
- run_prompt_reader.ipynb: Notebook for reading and processing prompts.
- run_prompt_reader_profiling.ipynb: Profiling for the prompt reader.
- run_sound_editor_nosb.ipynb: Notebook for the sound editor module without SpeechBrain.
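Since the embeddings arrive as pickled files, they can be inspected with the standard library. A minimal loading sketch, assuming a hypothetical filename:

```python
import pickle

# Hypothetical filename; the actual files in embeddings/ follow the project's naming.
with open("embeddings/prompt_embedding.pkl", "rb") as f:
    embedding = pickle.load(f)

print(type(embedding), getattr(embedding, "shape", None))
```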
- Clone the repository:
git clone https://github.com/SiavashShams/Listen-Chat-Edit-on-Edge.git
- Install required dependencies:
pip install -r requirements.txt
To run the main LCE application, open and execute the run_lce.ipynb notebook.
For a demonstration of the system's capabilities, refer to the demonstration folder.
- Deploy Conv-TasNet on the Jetson Nano.
- Deploy LLAMA 2 on a GCP server.
- Send a prompt to the server. Communication is handled in two ways: through SSH or through the Pub/Sub service (see the sketch after this list).
- The LLM computes the embedding and publishes it back; this embedding is the input to the Conv-TasNet model.
- The resulting audio mixture is ready to be played!
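For the Pub/Sub path, a round trip might look like the sketch below, using the google-cloud-pubsub client. The project, topic, and subscription names are placeholders, and pickling the embedding into the message body is an assumption about the wire format, not the project's confirmed protocol.

```python
import pickle
from concurrent.futures import TimeoutError
from google.cloud import pubsub_v1

PROJECT = "my-gcp-project"       # placeholder GCP project ID
TOPIC = "prompt-embeddings"      # placeholder topic the LLM publishes to
SUBSCRIPTION = "embeddings-sub"  # placeholder subscription on the Jetson side

embedding = [0.0] * 512          # dummy embedding standing in for the LLM output

# Server (GCP) side: publish the pickled embedding computed by the LLM.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(PROJECT, TOPIC)
publisher.publish(topic_path, data=pickle.dumps(embedding)).result()

# Edge (Jetson Nano) side: pull the embedding and hand it to Conv-TasNet.
subscriber = pubsub_v1.SubscriberClient()
sub_path = subscriber.subscription_path(PROJECT, SUBSCRIPTION)

def callback(message):
    text_embedding = pickle.loads(message.data)
    message.ack()
    # ... feed text_embedding into the Conv-TasNet editor here ...

streaming_future = subscriber.subscribe(sub_path, callback=callback)
try:
    streaming_future.result(timeout=30)  # listen briefly for incoming messages
except TimeoutError:
    streaming_future.cancel()
```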
Thanks to the authors of Listen, Chat, and Edit for their amazing work.