- First Place - UofTHacks 2025
Persona transforms language learning through an immersive AI tutoring experience that adapts to you in real time. By combining computer vision, neural networks, and 3D animation, Persona creates a natural learning environment that understands and responds to your facial expressions, pronunciation, and learning style.
- Real-time Emotional Understanding: Analyzes facial expressions to gauge engagement and understanding
- Precise Pronunciation Feedback: Tracks lip movements for accurate pronunciation guidance
- Fluid 3D Animation: Generates natural, lip-synced character animations that respond to your interactions
- Adaptive Learning: Personalizes conversations and lessons based on your progress and learning style
- Multi-modal Processing: Simultaneously handles video, audio, and text inputs for seamless interaction
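As a rough illustration of how adaptive learning could act on these signals, the sketch below steps a lesson level up or down from per-turn scores. The `LearnerSignals` fields, the 0 to 1 score scale, the 1 to 5 level range, and the thresholds are all hypothetical choices made for illustration, not Persona's actual adaptation logic.

```python
from dataclasses import dataclass


@dataclass
class LearnerSignals:
    """Hypothetical per-turn signals, each assumed normalized to [0, 1]."""
    engagement: float     # e.g. derived from facial-expression analysis
    comprehension: float  # e.g. derived from answer correctness
    pronunciation: float  # e.g. derived from lip and audio analysis


def next_difficulty(current: int, s: LearnerSignals, lo: int = 1, hi: int = 5) -> int:
    """Move the lesson level by at most one step, clamped to [lo, hi]."""
    mastery = (s.comprehension + s.pronunciation) / 2
    if mastery > 0.8 and s.engagement > 0.5:
        level = current + 1  # doing well and engaged: raise the difficulty
    elif mastery < 0.4 or s.engagement < 0.3:
        level = current - 1  # struggling or disengaged: ease off
    else:
        level = current      # otherwise hold steady
    return max(lo, min(hi, level))


if __name__ == "__main__":
    print(next_difficulty(3, LearnerSignals(engagement=0.9, comprehension=0.9, pronunciation=0.85)))
```

Changing the level by one step per turn keeps the tutor from overreacting to a single noisy reading from the camera or microphone.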
- Continuous facial analysis using deep learning models
- Advanced facial landmark detection
- Emotion recognition neural networks
- Multi-threaded feature extraction
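A minimal sketch of multi-threaded feature extraction, assuming each frame has already been reduced to a few named 2D lip landmarks. The `upper_lip`, `lower_lip`, `left_corner`, and `right_corner` keys are hypothetical; real landmark models use their own indexing schemes. For each frame, the code computes a mouth aspect ratio, a common lip-opening feature, on a thread pool.

```python
import math
from concurrent.futures import ThreadPoolExecutor

Point = tuple[float, float]


def mouth_aspect_ratio(lm: dict[str, Point]) -> float:
    """Lip opening height divided by mouth width (0.0 when the lips touch)."""
    height = math.dist(lm["upper_lip"], lm["lower_lip"])
    width = math.dist(lm["left_corner"], lm["right_corner"])
    return height / width if width > 0 else 0.0


def extract_features(frames: list[dict[str, Point]], workers: int = 4) -> list[float]:
    """Compute one feature per frame in parallel; pool.map preserves frame order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(mouth_aspect_ratio, frames))


if __name__ == "__main__":
    open_mouth = {"upper_lip": (0, 1), "lower_lip": (0, -1),
                  "left_corner": (-2, 0), "right_corner": (2, 0)}
    closed_mouth = {"upper_lip": (0, 0), "lower_lip": (0, 0),
                    "left_corner": (-2, 0), "right_corner": (2, 0)}
    print(extract_features([open_mouth, closed_mouth]))
```

Threads only speed things up when the heavy per-frame work releases the GIL, as OpenCV, NumPy, and typical inference runtimes generally do for their core operations. For pure-Python work like this toy ratio, a process pool would be the better fit.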
- Real-time rigging and animation (Mixamo + Blender)
- Live lip-sync through Rhubarb phoneme detection
- Custom animation blending
- Synchronized facial expression mapping
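Rhubarb Lip Sync can export its result as JSON with a `mouthCues` array of `start`/`end`/`value` entries. In these entries, `value` is a mouth-shape letter, and `X` is the idle or rest pose. The sketch below finds the active shape at a given playback time and derives a linear crossfade weight, which is one simple way to approach the animation blending mentioned above. The `shape_at` helper and its 40 ms blend window are assumptions, not part of Rhubarb or Persona.

```python
import bisect
import json


def load_cues(rhubarb_json: str) -> list[dict]:
    """Parse a Rhubarb JSON export; cues are assumed sorted by start time."""
    return json.loads(rhubarb_json)["mouthCues"]


def shape_at(cues: list[dict], t: float, blend: float = 0.04) -> tuple[str, str, float]:
    """Return (previous_shape, current_shape, weight_of_current) at time t.

    The weight ramps linearly from 0 to 1 during the first `blend` seconds
    of each cue, so a rig can crossfade between the two mouth poses.
    """
    starts = [c["start"] for c in cues]
    i = bisect.bisect_right(starts, t) - 1
    if i < 0:
        return ("X", "X", 1.0)  # before the first cue: hold the rest pose
    current = cues[i]["value"]
    previous = cues[i - 1]["value"] if i > 0 else "X"
    elapsed = t - cues[i]["start"]
    weight = min(1.0, elapsed / blend) if blend > 0 else 1.0
    return (previous, current, weight)


if __name__ == "__main__":
    sample = ('{"mouthCues": [{"start": 0.0, "end": 0.05, "value": "X"},'
              ' {"start": 0.05, "end": 0.2, "value": "B"},'
              ' {"start": 0.2, "end": 0.35, "value": "C"}]}')
    print(shape_at(load_cues(sample), 0.30))
```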
- WhisperAPI for speech-to-text
- ElevenLabs for dynamic voice generation
- Claude-powered conversation engine
- Parallel AI model processing
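These services can be chained for each conversational turn. In the sketch below, speech-to-text and facial analysis run concurrently with `asyncio.gather`. The reply and speech synthesis then run in sequence, because each one depends on the previous result. All four service functions are hypothetical stubs that sleep to simulate latency. They are not the WhisperAPI, Claude, or ElevenLabs client APIs.

```python
import asyncio


# Hypothetical stand-ins for the real services; each sleep simulates a network call.
async def transcribe(audio: bytes) -> str:
    await asyncio.sleep(0.05)
    return "bonjour"


async def analyze_face(frame: bytes) -> str:
    await asyncio.sleep(0.05)
    return "confused"


async def reply(text: str, emotion: str) -> str:
    await asyncio.sleep(0.05)
    return f"You said '{text}' and look {emotion}; let's slow down."


async def synthesize(text: str) -> bytes:
    await asyncio.sleep(0.05)
    return text.encode()  # placeholder for generated audio


async def handle_turn(audio: bytes, frame: bytes) -> tuple[str, bytes]:
    # Transcription and facial analysis are independent, so run them concurrently.
    text, emotion = await asyncio.gather(transcribe(audio), analyze_face(frame))
    # The reply needs both results, and synthesis needs the reply.
    answer = await reply(text, emotion)
    return answer, await synthesize(answer)


if __name__ == "__main__":
    answer, audio_out = asyncio.run(handle_turn(b"...", b"..."))
    print(answer)
```

Running the independent calls concurrently means a turn's latency is bounded by the slowest input stage rather than by the sum of the two.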
- CPU: 4+ cores recommended for parallel processing
- GPU: NVIDIA GPU with CUDA support (8GB+ VRAM recommended)
- RAM: 16GB minimum
- Storage: 5GB for models and basic assets
- Webcam: Required for facial analysis
- Microphone: Required for speech input
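A small preflight check can compare a host against these requirements before the models are loaded. This is a sketch: `check_host` is a hypothetical helper, and the GPU check is omitted because it depends on the CUDA tooling in use. RAM and device availability are passed in by hand, since detecting them is platform-specific.

```python
import os
import shutil

# Thresholds from the requirements list; the 8 GB VRAM figure is only a recommendation.
MIN_CORES = 4
MIN_RAM_GB = 16
MIN_FREE_DISK_GB = 5


def check_host(cores: int, ram_gb: float, free_disk_gb: float,
               has_webcam: bool, has_mic: bool) -> list[str]:
    """Return human-readable problems; an empty list means the host looks OK."""
    problems = []
    if cores < MIN_CORES:
        problems.append(f"only {cores} CPU cores (4+ recommended)")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"{ram_gb:g} GB RAM (16 GB minimum)")
    if free_disk_gb < MIN_FREE_DISK_GB:
        problems.append(f"{free_disk_gb:.1f} GB free disk (5 GB needed)")
    if not has_webcam:
        problems.append("no webcam (required for facial analysis)")
    if not has_mic:
        problems.append("no microphone (required for speech input)")
    return problems


if __name__ == "__main__":
    free_gb = shutil.disk_usage(".").free / 1e9
    # RAM, webcam, and microphone values below are placeholders to fill in per machine.
    print(check_host(os.cpu_count() or 1, 16, free_gb, True, True))
```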
The system operates through a microservices architecture that coordinates multiple processes:
- Video Processing Service
  - Handles real-time facial analysis
  - Extracts emotional and pronunciation features
- Animation Service
  - Generates fluid 3D character movements
  - Synchronizes lip movements with speech
- Conversation Service
  - Manages AI dialogue flow
  - Processes language learning logic
- Integration Layer
  - Orchestrates all services
  - Maintains real-time performance
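An integration layer can orchestrate these services as a pipeline of queues, where each service reads from its inbox and publishes to the next stage. The sketch below runs the stages as asyncio tasks in a single process. Several details are illustrative assumptions: the per-service transforms, the ordering (video, then conversation, then animation), and the `None` shutdown sentinel. A real microservices deployment would replace the in-memory queues with a network transport.

```python
import asyncio
from typing import Any, Callable


# Hypothetical per-service transforms; real services would run models or call APIs.
def video_step(frame: str) -> dict:
    return {"frame": frame, "emotion": "neutral"}


def conversation_step(features: dict) -> dict:
    return {**features, "reply": f"reply to {features['frame']}"}


def animation_step(turn: dict) -> dict:
    return {**turn, "animated": True}


async def run_service(work: Callable[[Any], Any],
                      inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Transform each message and push it downstream.

    None is a shutdown sentinel; it is forwarded so the whole chain stops.
    """
    while (msg := await inbox.get()) is not None:
        await outbox.put(work(msg))
    await outbox.put(None)


async def orchestrate(inputs: list[str]) -> list[dict]:
    """Integration layer: wire the services into a queue pipeline and collect output."""
    steps = [video_step, conversation_step, animation_step]
    queues = [asyncio.Queue() for _ in range(len(steps) + 1)]
    tasks = [asyncio.create_task(run_service(step, queues[i], queues[i + 1]))
             for i, step in enumerate(steps)]
    for item in inputs:
        await queues[0].put(item)
    await queues[0].put(None)  # signal end of input
    results = []
    while (msg := await queues[-1].get()) is not None:
        results.append(msg)
    await asyncio.gather(*tasks)
    return results


if __name__ == "__main__":
    for result in asyncio.run(orchestrate(["frame-1", "frame-2"])):
        print(result)
```

Because each stage has a single worker and FIFO queues, outputs arrive in input order. Decoupling stages through queues also lets a slow stage buffer work instead of blocking the stages upstream of it.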