Building Multimodal AI Agents From Scratch — Apoorva Joshi, MongoDB

116.8K views · Jun 27, 2025 · 36:58 min
Takeaway

Multimodal agents are built by combining LLM reasoning with perception, tools, and persistent memory — but use them only when simpler workflows can't handle the task.

Summary

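  • The workshop defines an AI agent as an LLM combined with perception, planning, tools, and memory, and contrasts simple prompting, RAG, and agents. Agents are recommended only when a task is complex, non-deterministic, and can tolerate latency.
  • Covers the ReAct (reason + act) prompting pattern, in which the results of each action feed back into the model's next reasoning step, and contrasts it with chain-of-thought prompting, which reasons without that feedback.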
  • Builds a multimodal (text+image) agent in Python using MongoDB as the vector store for memory plus a tool layer.
  • Cautions that agents add cost/latency and recommends starting simple, only escalating to agentic systems when structured workflows fail.
agents · multimodal · mongodb
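
The ReAct feedback loop mentioned in the summary can be sketched in plain Python. This is a minimal illustration, not the workshop's code: the model is replaced by a scripted stand-in (`scripted_llm`), and the `word_count` tool, the `Action: name[argument]` text format, and the `react_loop` function are all hypothetical names chosen for this example.

```python
import re
from typing import Callable, Dict, List


def word_count(text: str) -> str:
    """Hypothetical tool: count whitespace-separated words."""
    return str(len(text.split()))


# Tool registry (assumption: every tool takes and returns a string).
TOOLS: Dict[str, Callable[[str], str]] = {"word_count": word_count}

# Assumed text format for tool calls: "Action: tool_name[argument]".
ACTION_RE = re.compile(r"Action: (\w+)\[(.*)\]")


def scripted_llm(transcript: List[str]) -> str:
    """Stand-in for a real model call; no API is used here.

    It requests a tool first, then answers once an observation exists.
    """
    observations = [t for t in transcript if t.startswith("Observation:")]
    if not observations:
        return (
            "Thought: I should count the words.\n"
            "Action: word_count[agents use tools and memory]"
        )
    count = observations[-1].split(": ", 1)[1]
    return f"Thought: I have the count.\nFinal Answer: {count}"


def react_loop(question: str,
               llm: Callable[[List[str]], str] = scripted_llm,
               max_steps: int = 5) -> str:
    """Alternate reasoning and acting, feeding each tool result back in."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        reply = llm(transcript)
        transcript.append(reply)
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(reply)
        if match is None:
            transcript.append("Observation: no valid action found")
            continue
        name, arg = match.groups()
        tool = TOOLS.get(name)
        result = tool(arg) if tool else f"unknown tool {name}"
        # The observation is the feedback that chain-of-thought alone lacks.
        transcript.append(f"Observation: {result}")
    return "No answer within step limit"


answer = react_loop("How many words are in the phrase?")
print(answer)
```

The `max_steps` cap is one simple way to bound the extra cost and latency that the summary warns about; a real agent would swap `scripted_llm` for an actual model call.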
Original description
In this hands-on workshop, you will build a multimodal AI agent capable of processing mixed-media content—from analyzing charts and diagrams to extracting insights from documents with embedded visuals. Using MongoDB as a vector database and memory store, and Google's Gemini for multimodal reasoning, you will gain hands-on experience with multimodal data processing pipelines and agent orchestration patterns by implementing core components directly, using good ol' Python.


You will be provided with a GitHub repository consisting of learning materials and resources required to successfully execute the hands-on portions of the workshop.
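
The description mentions using a vector database as the agent's memory store. The idea behind that kind of memory, retrieving stored items by embedding similarity, can be sketched without any database. This is a toy in-memory stand-in, not MongoDB's vector search and not the workshop's code: the `ToyMemory` class is hypothetical, and the three-dimensional vectors are hand-made placeholders for embeddings a real model would produce.

```python
import math
from typing import List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyMemory:
    """Hypothetical in-memory stand-in for a vector store."""

    def __init__(self) -> None:
        self.items: List[Tuple[str, List[float]]] = []

    def add(self, text: str, embedding: List[float]) -> None:
        self.items.append((text, embedding))

    def search(self, query_embedding: List[float], k: int = 1) -> List[str]:
        # Rank every stored item by similarity to the query, best first.
        ranked = sorted(
            self.items,
            key=lambda item: cosine(item[1], query_embedding),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


memory = ToyMemory()
# Made-up embeddings; a real pipeline would embed text and images with a model.
memory.add("Chart shows revenue rising in Q2", [0.9, 0.1, 0.0])
memory.add("Diagram of the login flow", [0.0, 0.2, 0.9])

top = memory.search([0.8, 0.2, 0.1], k=1)
print(top)
```

A production vector store replaces the linear scan in `search` with an index, but the contract is the same: store text or image descriptions alongside embeddings, then fetch the nearest ones to give the model as context.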

---related links---

https://www.linkedin.com/in/apoorvajoshi95/