Purpose: This Python program serves as a basic speech-to-text AI assistant. It uses Google's Web Speech API for speech recognition and the Gemini-1.5-Flash generative AI model for text generation. I also built it to show my GF I can somewhat code.
Dependencies:
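- speech_recognition (pip package: SpeechRecognition)
- pyttsx3
- time (standard library)
- google.generativeai (pip package: google-generativeai)
- os (standard library)
- dotenv (pip package: python-dotenv)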
Setup:
- Create a Google Cloud Platform project and enable the Web Speech API and Generative AI API.
- Obtain an API key for your project.
- Create a .env file in your project directory and add the API key as GOOGLE_API_KEY.
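The last setup step can be sketched as follows. This is a minimal sketch, not the script's actual code: the helper names get_api_key and configure_gemini are hypothetical, and the model name string "gemini-1.5-flash" is assumed from the description above. The third-party imports are done lazily so the key check can be tried without them installed.

```python
import os


def get_api_key(env=None):
    """Return the API key from the environment, or raise a clear error.

    get_api_key is a hypothetical helper name used for illustration.
    """
    env = os.environ if env is None else env
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to your .env file")
    return key


def configure_gemini():
    """Load .env and build the model (hypothetical helper).

    Needs the python-dotenv and google-generativeai packages.
    """
    # Imported here so get_api_key works without the third-party packages.
    from dotenv import load_dotenv
    import google.generativeai as genai

    load_dotenv()  # copies variables from a .env file, if found, into os.environ
    genai.configure(api_key=get_api_key())
    # Model name assumed from the README text.
    return genai.GenerativeModel("gemini-1.5-flash")
```

The .env file itself would then hold a single line of the form GOOGLE_API_KEY=your-key-here, and should be kept out of version control.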
Usage:
- Run the Python script.
- Speak into your microphone when prompted.
- The assistant will process your speech and respond using the Gemini-1.5-Flash model.
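One listen, respond, speak cycle from the steps above might look roughly like this. It is a sketch under assumptions: listen_once, speak, and run_turn are hypothetical names, the fallback message is made up, and the real script may be organized differently. The three stages are passed in as functions so the cycle can be tried without a microphone.

```python
def listen_once():
    """Record one utterance from the default microphone and return its text.

    Returns None if the speech was unintelligible or the service was unreachable.
    """
    import speech_recognition as sr  # pip package: SpeechRecognition

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        print("Listening...")
        audio = recognizer.listen(source)
    try:
        # Sends the audio to Google's Web Speech API.
        return recognizer.recognize_google(audio)
    except (sr.UnknownValueError, sr.RequestError):
        return None


def speak(text):
    """Read text aloud with pyttsx3."""
    import pyttsx3

    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()


def run_turn(listen, generate, say):
    """Run one listen -> generate -> speak cycle; return the reply or None."""
    heard = listen()
    if not heard:
        say("Sorry, I did not catch that.")
        return None
    reply = generate(heard)
    say(reply)
    return reply
```

With a configured Gemini model object named model, a turn could be run as run_turn(listen_once, lambda text: model.generate_content(text).text, speak).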
Key Features:
- Speech recognition: Uses Google's Web Speech API to convert spoken words into text.
- Text-to-speech: Converts text responses into spoken words using pyttsx3.
- AI-powered responses: Leverages the Gemini-1.5-Flash model to generate contextually relevant responses.
- Conversation history: Maintains a simple conversation history to provide context for responses.
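A simple conversation history like the one described above could be kept as a short list of past turns that is prepended to each new prompt. This is only an illustrative sketch: the ConversationHistory class, its max_entries limit, and the "User"/"Assistant" labels are assumptions, not necessarily how the script stores its history.

```python
class ConversationHistory:
    """Keeps the most recent exchanges and turns them into a prompt."""

    def __init__(self, max_entries=10):
        self.max_entries = max_entries  # assumed limit; must be at least 1
        self.entries = []  # list of (speaker, text) tuples, oldest first

    def add(self, speaker, text):
        """Record one utterance and drop the oldest ones past the limit."""
        self.entries.append((speaker, text))
        self.entries = self.entries[-self.max_entries:]

    def build_prompt(self, user_text):
        """Return the stored history followed by the new user message."""
        lines = [f"{speaker}: {text}" for speaker, text in self.entries]
        lines.append(f"User: {user_text}")
        lines.append("Assistant:")
        return "\n".join(lines)
```

After each turn, the user's text and the model's reply would both be passed to add, so the next call to build_prompt carries that context.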
Limitations:
- Speech recognition accuracy: May vary depending on factors like background noise and microphone quality.
- AI model limitations: The Gemini-1.5-Flash model has limited capabilities and may return inaccurate or incomplete responses.
Future enhancements:
- Improved speech recognition: Explore alternative speech recognition engines or techniques.
- Enhanced AI model: Consider using more advanced AI models with greater capabilities.
- Natural language understanding: Implement techniques to better understand the user's intent.
- Integration with other services: Connect the assistant to other services like calendars, email, or smart home devices.