Phase 4: Build Real Things

Multimodal AI

Text, images, audio, video — use all AI modalities together in real pipelines. Build agents that can see, hear, and understand anything.

8 lessons · advanced · multimodal

Course Lessons

What Multimodal AI Is — Beyond Text
Lesson 1 · Written
Vision Models — Reading Images, Docs & Screens
Lesson 2 · Written
Combining Vision + Text in Workflows
Lesson 3 · Written
Audio — Transcription, TTS & Voice Interfaces
Lesson 4 · Written
Video AI — Generation, Analysis & Editing
Lesson 5 · Written
Document Intelligence — PDFs, Forms & Invoices
Lesson 6 · Written
Building a Multimodal Pipeline
Lesson 7 · Written
Multimodal Agents — The Full Picture
Lesson 8 · Written