Phase 4: Build Real Things
Multimodal AI
Text, images, audio, video — use all AI modalities together in real pipelines. Build agents that can see, hear, and understand anything.
8 lessons · advanced · multimodal
Course Lessons
What Multimodal AI Is — Beyond Text
Lesson 1 · Written
Vision Models — Reading Images, Docs & Screens
Lesson 2 · Written
Combining Vision + Text in Workflows
Lesson 3 · Written
Audio — Transcription, TTS & Voice Interfaces
Lesson 4 · Written
Video AI — Generation, Analysis & Editing
Lesson 5 · Written
Document Intelligence — PDFs, Forms & Invoices
Lesson 6 · Written
Building a Multimodal Pipeline
Lesson 7 · Written
Multimodal Agents — The Full Picture
Lesson 8 · Written