Automate Product Listings with Gemini + Vision Agents

Stefan Blos, Senior Developer Advocate at Stream, walks through what's possible with early access to the Gemini 3.1 Flash Live model: object detection, AI image polish with Nano Banana, web search, and a guided multi-step workflow, all driven by a single voice conversation.

What's covered: Setting up the Vision Agents SDK with the Gemini plugin, defining tools for image generation and product search, building a video processor to analyze live frames, orchestrating multi-step agent workflows with instruction following, and connecting everything to a Next.js frontend via WebSocket events.
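To make the "guided multi-step workflow" idea concrete, here is a minimal, SDK-independent sketch in plain Python. It does not use the Vision Agents SDK or the Gemini API; the step names, the `ListingWorkflow` class, and the `step_done` event shape are all hypothetical, chosen only to mirror the capabilities listed above. It shows one possible way to enforce step order across tool calls and to record JSON events that a frontend could receive over a WebSocket.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical step names, loosely mirroring the capabilities listed above.
STEPS = ["detect_object", "polish_image", "search_product", "confirm_listing"]

# A tool receives the accumulated workflow state and returns new fields for it.
Tool = Callable[[dict], dict]


@dataclass
class ListingWorkflow:
    """Runs registered tools strictly in the order given by STEPS."""

    tools: Dict[str, Tool] = field(default_factory=dict)
    state: dict = field(default_factory=dict)
    events: List[str] = field(default_factory=list)  # JSON strings for a frontend
    position: int = 0

    def register(self, name: str, tool: Tool) -> None:
        if name not in STEPS:
            raise ValueError(f"unknown step: {name}")
        self.tools[name] = tool

    def call(self, name: str) -> dict:
        if self.position >= len(STEPS):
            raise RuntimeError("workflow already finished")
        expected = STEPS[self.position]
        if name != expected:
            # Reject out-of-order tool calls so the conversation stays guided.
            raise RuntimeError(f"expected {expected!r}, got {name!r}")
        result = self.tools[name](self.state)
        self.state.update(result)
        self.position += 1
        # Assumed event shape; a real app would push this over its WebSocket.
        self.events.append(
            json.dumps({"type": "step_done", "step": name, "data": result})
        )
        return result


def build_demo_workflow() -> ListingWorkflow:
    """Wires in stub tools that stand in for real model and search calls."""
    wf = ListingWorkflow()
    wf.register("detect_object", lambda s: {"label": "ceramic mug"})
    wf.register("polish_image", lambda s: {"image": f"polished:{s['label']}"})
    wf.register("search_product", lambda s: {"query": s["label"], "matches": 0})
    wf.register("confirm_listing", lambda s: {"title": s["label"].title()})
    return wf


if __name__ == "__main__":
    wf = build_demo_workflow()
    for step in STEPS:
        wf.call(step)
    print(wf.state["title"])
```

In a real agent, each stub would be replaced by an actual tool (image generation, product search, frame analysis), and the ordering would more likely be steered by the model's instructions than by a hard check like this one.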

Grab your Gemini API key at Google AI Studio and explore the Vision Agents SDK from Stream to get started.

Provider: Google Cloud EMEA Limited   |   Language: English