Build vision AI agents powered by Vision-Language Models.
NVIDIA VIA is a collection of workflows for building AI agents that process large volumes of live or archived video and images with Vision-Language Models (VLMs), whether deployed at the edge or in the cloud. This new generation of visual AI agents will help nearly every industry summarize, search, and extract actionable insights from video using natural language.
Get rich summaries of nuanced activities in natural language, whether from long videos or images.
Build agents with rich interactivity. Ask detailed questions, or make "show me" requests to find specific clips of particular activities, such as highlight reels or unique events.