Whisper Model Support – Create a new React view to connect with the deployed Whisper model. #214

Open
anirudTT opened this issue Feb 27, 2025 · 0 comments

Description

Add a new view for speech-to-text transcription using the Whisper model, following the existing application design patterns from ChatUI. This feature will allow users to transcribe speech from both uploaded files and microphone recordings.

Technical Requirements

1. Route Addition

Add a new route in frontend/src/routes/index.tsx:

<Route path="/speech-to-text" element={<SpeechToText />} />

2. Component Structure

frontend/src/components/speech-to-text/
├── SpeechToText.tsx         # Main page component
├── AudioInput.tsx           # Handles both file and mic input
└── TranscriptionView.tsx    # Displays results

3. Features

  • Input Methods Panel

    • File upload button with drag-and-drop support (leverage the existing drag-and-drop components)
    • Microphone recording button (leverage the existing VoiceInput.tsx functionality)
    • Progress indicators for both methods

  • Transcription Panel

    • Real-time transcription display
    • Copy to clipboard functionality
    • Export options (if needed)

4. Integration Points

  • Refactor frontend/src/components/chatui/VoiceInput.tsx to share common audio handling logic
  • Integrate with the existing cloud Whisper model endpoint currently used in the old AI playground.
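
One possible shape for the shared logic, as a rough sketch only: both the file-upload path and the microphone path could hand their audio to a single function that talks to the Whisper endpoint through an injected transport. Every name here (`AudioPayload`, `Transport`, `transcribeAudio`) is hypothetical and not taken from VoiceInput.tsx, and the endpoint's URL, auth, and request format are not specified in this issue, so they are deliberately hidden behind the transport.

```typescript
// Hypothetical shared types; none of these names come from the existing codebase.
type AudioSource = "file" | "microphone";

interface AudioPayload {
  data: Uint8Array; // raw audio bytes from either input path
  mimeType: string; // for example "audio/wav"
  source: AudioSource;
}

type TranscriptionResult =
  | { ok: true; text: string }
  | { ok: false; error: string };

// The transport hides how the Whisper endpoint is actually called.
// Its real URL and request format are assumptions left to the implementer.
type Transport = (payload: AudioPayload) => Promise<{ text: string }>;

async function transcribeAudio(
  payload: AudioPayload,
  transport: Transport,
): Promise<TranscriptionResult> {
  // Reject empty input before making a network call.
  if (payload.data.length === 0) {
    return { ok: false, error: "No audio data to transcribe." };
  }
  try {
    const response = await transport(payload);
    return { ok: true, text: response.text.trim() };
  } catch (err) {
    // Convert any thrown value into a displayable error message.
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: `Transcription failed: ${message}` };
  }
}
```

Injecting the transport keeps the function testable without a live endpoint and lets VoiceInput.tsx and the new view share one code path if that turns out to fit the refactor.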

UI Requirements

  • Match existing application theme and styling

  • Responsive layout similar to ChatUI view

  • Clear visual feedback for:

    • Recording state
    • File upload progress
    • Transcription processing
    • Error states
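
The four feedback states above could be modelled as a single status value so the UI always renders exactly one of them. This is a minimal sketch under assumptions: the type and event names (`SttStatus`, `SttEvent`, `nextStatus`) are invented for illustration, and the transitions are a guess at the intended flow, not a description of existing code.

```typescript
// Hypothetical status model: one variant per visual state listed above,
// plus idle and done.
type SttStatus =
  | { kind: "idle" }
  | { kind: "recording" }
  | { kind: "uploading"; percent: number }
  | { kind: "transcribing" }
  | { kind: "done"; text: string }
  | { kind: "error"; message: string };

type SttEvent =
  | { type: "recordStarted" }
  | { type: "uploadProgress"; percent: number }
  | { type: "audioSubmitted" }
  | { type: "transcriptReceived"; text: string }
  | { type: "failed"; message: string }
  | { type: "reset" };

// Pure transition function; it has the (state, action) shape that a
// reducer-based component could consume.
function nextStatus(current: SttStatus, event: SttEvent): SttStatus {
  switch (event.type) {
    case "recordStarted":
      return { kind: "recording" };
    case "uploadProgress":
      // Clamp so a progress bar never renders outside 0..100.
      return { kind: "uploading", percent: Math.min(100, Math.max(0, event.percent)) };
    case "audioSubmitted":
      return { kind: "transcribing" };
    case "transcriptReceived":
      // Ignore transcripts that arrive when no request is in flight.
      return current.kind === "transcribing" ? { kind: "done", text: event.text } : current;
    case "failed":
      return { kind: "error", message: event.message };
    case "reset":
      return { kind: "idle" };
  }
}
```

Keeping the transitions in a pure function means the recording, upload, processing, and error indicators can be unit-tested without rendering any component.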

Acceptance Criteria

  • New route /speech-to-text is accessible
  • Users can upload audio files (.mp3, .wav, etc.)
  • Users can record audio directly
  • Transcription results display in real time when possible
  • UI matches existing application style
  • Error handling for invalid files/failed recordings
  • Loading states are properly indicated
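
The "invalid files" criterion could be covered by a small check that runs before any upload starts. The sketch below is illustrative only: the issue names .mp3 and .wav followed by "etc." and gives no size limit, so the extension list and the 25 MiB cap are assumptions, as is the function name `validateAudioFile`.

```typescript
// Assumed limits: the issue only says ".mp3, .wav, etc." and states no size cap.
const ALLOWED_EXTENSIONS: readonly string[] = ["mp3", "wav"];
const MAX_FILE_BYTES = 25 * 1024 * 1024;

type ValidationResult = { valid: true } | { valid: false; reason: string };

function validateAudioFile(fileName: string, sizeBytes: number): ValidationResult {
  // Take the text after the last dot; names with no dot, or only a
  // leading dot, are treated as having no extension.
  const dot = fileName.lastIndexOf(".");
  const extension = dot > 0 ? fileName.slice(dot + 1).toLowerCase() : "";

  if (ALLOWED_EXTENSIONS.indexOf(extension) === -1) {
    return { valid: false, reason: `Unsupported file type: ${extension || "none"}` };
  }
  if (sizeBytes <= 0) {
    return { valid: false, reason: "File is empty." };
  }
  if (sizeBytes > MAX_FILE_BYTES) {
    return { valid: false, reason: "File is too large." };
  }
  return { valid: true };
}
```

In a browser the two arguments would typically come from a `File` object's `name` and `size` properties. An extension check is only a first filter; the backend would still need to reject content it cannot decode.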

Dependencies

  • Existing VoiceInput component: frontend/src/components/chatui/VoiceInput.tsx
  • Routes configuration: frontend/src/routes/index.tsx
  • Backend Whisper model API endpoint

Notes

  • Consider reusing audio processing logic from VoiceInput.tsx
  • Follow existing error handling patterns
  • Maintain consistency with other views' styling
  • Ensure accessibility standards are met

Related Components

  • ChatUI view (for styling reference)
  • VoiceInput component (for audio handling)