NVIDIA Visual Insight Agent (VIA) Workflows

Build vision AI agents powered by Vision-Language Models.

What is VIA?

NVIDIA VIA is a collection of workflows for building AI agents that can process large volumes of live or archived video and images with Vision-Language Models (VLMs), whether deployed at the edge or in the cloud. This new generation of visual AI agents will help nearly every industry summarize, search, and extract actionable insights from video using natural language.

Transform Your Vision Applications With Generative AI

Leverage the Latest Models

Improve model accuracy through domain adaptation with NVIDIA NeMo and NVIDIA TAO, or seamlessly update to the newest state-of-the-art models with NVIDIA NIMs.

Build AI Agents to Summarize Video and Find Highlights

Process video more than 100x faster than the input video's duration to produce rich summaries in natural language.

Multi-modal Interactions

Experience multi-modal interactions powered by generative AI and easily integrate with business systems using standard APIs.

Watch VIA in Action

Example: Warehouse Management

Get rich summaries of nuanced activities in natural language, whether from long videos or images.

Example: Sports Analytics

Build agents with rich interactivity. Ask detailed questions, or make "show me" requests to find clips of specific activities, such as highlight reels or unique events.

Get Started Resources

Apply for Early Access

Discover the power of an AI agent for video summarization and search.

Watch the NVIDIA GTC Talk on Vision AI Agents

Learn how to harness generative AI and large language models with vision AI agents.
