Apple Vision Pro Language-Learning PoC

What it is

A high-fidelity prototype for spatial language learning: users interact with an AI-driven barista in a virtual café, powered by a custom STT → LLM → TTS pipeline (speech-to-text, large language model, text-to-speech). Features include branching dialogue, subtitles with grammar feedback, and a lifelike avatar (CC4) with lip-sync and scene animations.
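As a rough illustration of how one conversational turn moves through an STT → LLM → TTS chain, here is a minimal Python sketch. It is not the project's code: the names Turn and run_turn, and the three stub stages, are hypothetical placeholders for whatever speech and language services the prototype actually used.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Turn:
    transcript: str  # what the learner said, as text
    reply: str       # the barista's answer, as text
    audio: bytes     # synthesized speech for the reply


def run_turn(
    audio_in: bytes,
    stt: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> Turn:
    """Run one turn: speech -> text -> reply text -> speech."""
    transcript = stt(audio_in)
    reply = llm(transcript)
    return Turn(transcript=transcript, reply=reply, audio=tts(reply))


# Stub stages (assumptions for illustration only).
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"Barista: you said '{text}'"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


turn = run_turn(b"un cafe, por favor", fake_stt, fake_llm, fake_tts)
```

Passing the stages in as plain callables keeps each one swappable, which is convenient when comparing speech or language providers during a proof of concept.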

My role & ownership

End-to-end solo build. Designed and implemented the dialogue pipeline, integrated lip-sync and avatar animation, converted and optimized an Asset Store café environment for URP, and tuned performance for stable frame times on Vision Pro. The LLM controlled not just the responses but the full application flow, with guards on state transitions and scene progression.
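One common way to let an LLM drive application flow while guarding state transitions is to have the model propose the next state and let the application accept it only if it appears in an allow-list. The Python sketch below shows that pattern under stated assumptions: the state names, the ALLOWED table, and the JSON field next_state are all hypothetical, not taken from the project.

```python
import json

# Hypothetical scene states and the transitions the app permits.
ALLOWED = {
    "greeting": {"ordering"},
    "ordering": {"ordering", "payment"},
    "payment": {"farewell"},
    "farewell": set(),
}


def next_state(current: str, llm_output: str) -> str:
    """Return the LLM's proposed state if it is permitted, else stay put."""
    try:
        proposed = json.loads(llm_output).get("next_state")
    except (json.JSONDecodeError, AttributeError):
        # Malformed or non-object output: keep the current state.
        return current
    if isinstance(proposed, str) and proposed in ALLOWED.get(current, set()):
        return proposed
    return current
```

With this guard in place, a well-formed proposal such as '{"next_state": "ordering"}' advances the scene from greeting, while an out-of-order or unparseable proposal leaves the state unchanged.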

Highlights

STT → LLM → TTS pipeline, with the LLM driving branching flow and error handling
Realistic CC4 avatar with lip-sync and animated sequences
URP conversion and optimization passes for stable performance on Vision Pro