Building Multimodal AI Agents From Scratch — Apoorva Joshi, MongoDB
Takeaway
Multimodal agents are built by combining LLM reasoning with perception, tools, and persistent memory — but use them only when simpler workflows can't handle the task.
Summary
- Workshop defines AI agents as LLM + perception + planning + tools + memory and contrasts simple prompting, RAG, and agents — only use agents when tasks are complex, non-deterministic, and tolerate latency.
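- Covers the ReAct (reason + act) prompting pattern, which feeds the result of each action back into the model's next reasoning step, and contrasts it with chain-of-thought prompting, which reasons without that feedback.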
- Builds a multimodal (text+image) agent in Python using MongoDB as the vector store for memory plus a tool layer.
- Cautions that agents add cost/latency and recommends starting simple, only escalating to agentic systems when structured workflows fail.
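The ReAct feedback loop mentioned in the summary can be sketched in plain Python. This is a minimal illustration, not the workshop's code: the `react_loop` function, the `multiply` tool, the `Action: name[args]` text format, and the `scripted_llm` stand-in (used here in place of a real model call) are all assumptions made for the example.

```python
import re
from typing import Callable, Dict

# Hypothetical tool: multiplies two comma-separated integers.
def multiply(arg: str) -> str:
    a, b = (int(x) for x in arg.split(","))
    return str(a * b)

# Hypothetical tool registry mapping tool names to callables.
TOOLS: Dict[str, Callable[[str], str]] = {"multiply": multiply}

# Assumed text protocol: the model emits "Action: tool[args]" or "Final Answer: ...".
ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*?)\]")
FINAL_RE = re.compile(r"Final Answer:\s*(.*)", re.DOTALL)

def react_loop(llm: Callable[[str], str], question: str, max_steps: int = 5) -> str:
    """Alternate model reasoning with tool calls, feeding each observation back."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        reply = llm("\n".join(transcript))
        transcript.append(reply)

        final = FINAL_RE.search(reply)
        if final:
            return final.group(1).strip()

        action = ACTION_RE.search(reply)
        if action is None:
            transcript.append("Observation: no valid action found; try again.")
            continue

        name, arg = action.groups()
        tool = TOOLS.get(name)
        observation = tool(arg) if tool else f"unknown tool {name!r}"
        # The feedback step: the tool result becomes part of the next prompt.
        transcript.append(f"Observation: {observation}")
    return "Stopped: step limit reached."

# Stand-in for a real LLM call so the sketch runs offline.
def scripted_llm(prompt: str) -> str:
    last = prompt.splitlines()[-1]
    if last.startswith("Observation:"):
        value = last.split(":", 1)[1].strip()
        return f"Thought: the tool returned {value}.\nFinal Answer: {value}"
    return "Thought: I should use the calculator.\nAction: multiply[6, 7]"

print(react_loop(scripted_llm, "What is 6 times 7?"))
```

The `max_steps` cap reflects the cost and latency concern raised in the summary: every loop iteration is another model call, so a bounded loop keeps a confused agent from running indefinitely.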
Tags: agents, multimodal, mongodb
Original description
In this hands-on workshop, you will build a multimodal AI agent capable of processing mixed-media content, from analyzing charts and diagrams to extracting insights from documents with embedded visuals. Using MongoDB as a vector database and memory store, and Google's Gemini for multimodal reasoning, you will gain hands-on experience with multimodal data processing pipelines and agent orchestration patterns by implementing core components directly, using good ol' Python. You will be provided with a GitHub repository consisting of learning materials and resources required to successfully execute the hands-on portions of the workshop.
Related links: https://www.linkedin.com/in/apoorvajoshi95/