AI Inference
Jul 15, 2024
Power Your AI Projects with New NVIDIA NIMs for Mistral and Mixtral Models
Large language models (LLMs) are growing in adoption across enterprise organizations, with many building them into their AI applications. Foundation models are...
5 MIN READ
Jul 02, 2024
Achieving High Mixtral 8x7B Performance with NVIDIA H100 Tensor Core GPUs and TensorRT-LLM
As large language models (LLMs) continue to grow in size and complexity, the performance requirements for serving them quickly and cost-effectively continue to...
9 MIN READ
Jun 12, 2024
Demystifying AI Inference Deployments for Trillion Parameter Large Language Models
AI is transforming every industry, addressing grand human scientific challenges such as precision drug discovery and the development of autonomous vehicles, as...
14 MIN READ
Jun 11, 2024
Maximum Performance and Minimum Footprint for AI Apps with NVIDIA TensorRT Weight-Stripped Engines
NVIDIA TensorRT, an established inference library for data centers, has rapidly emerged as a desirable inference backend for NVIDIA GeForce RTX and NVIDIA RTX...
8 MIN READ
May 14, 2024
NVIDIA TensorRT 10.0 Upgrades Usability, Performance, and AI Model Support
NVIDIA today announced the latest release of NVIDIA TensorRT, an ecosystem of APIs for high-performance deep learning inference. TensorRT includes inference...
7 MIN READ
May 08, 2024
Accelerate Generative AI Inference Performance with NVIDIA TensorRT Model Optimizer, Now Publicly Available
In the fast-evolving landscape of generative AI, the demand for accelerated inference speed remains a pressing concern. With the exponential growth in model...
9 MIN READ
Apr 28, 2024
Turbocharging Meta Llama 3 Performance with NVIDIA TensorRT-LLM and NVIDIA Triton Inference Server
We're excited to announce support for the Meta Llama 3 family of models in NVIDIA TensorRT-LLM, accelerating and optimizing your LLM inference performance. You...
9 MIN READ
Apr 22, 2024
Mistral Large and Mixtral 8x22B LLMs Now Powered by NVIDIA NIM and NVIDIA API
This week’s model release features two new NVIDIA AI Foundation models, Mistral Large and Mixtral 8x22B, both developed by Mistral AI. These cutting-edge...
4 MIN READ
Apr 03, 2024
New Lab: Generative AI Inference with NVIDIA NIM
Get started with NVIDIA NIM for deploying large language models (LLMs). Request access to a free, hands-on lab today.
1 MIN READ
Apr 02, 2024
Tune and Deploy LoRA LLMs with NVIDIA TensorRT-LLM
Large language models (LLMs) have revolutionized natural language processing (NLP) with their ability to learn from massive amounts of text and generate fluent...
15 MIN READ
Mar 27, 2024
NVIDIA H200 Tensor Core GPUs and NVIDIA TensorRT-LLM Set MLPerf LLM Inference Records
Generative AI is unlocking new computing applications that greatly augment human capability, enabled by continued model innovation. Generative AI...
11 MIN READ
Mar 18, 2024
NVIDIA GB200 NVL72 Delivers Trillion-Parameter LLM Training and Real-Time Inference
What is the interest in trillion-parameter models? We know many of the use cases today and interest is growing due to the promise of an increased capacity for:...
9 MIN READ
Mar 07, 2024
NVIDIA TensorRT Accelerates Stable Diffusion Nearly 2x Faster with 8-bit Post-Training Quantization
In the dynamic realm of generative AI, diffusion models stand out as the most powerful architecture for generating high-quality images with text prompts. Models...
7 MIN READ
Feb 21, 2024
NVIDIA TensorRT-LLM Revs Up Inference for Google Gemma
NVIDIA is collaborating as a launch partner with Google in delivering Gemma, a newly optimized family of open models built from the same research and technology...
4 MIN READ
Feb 13, 2024
Top Inference for Large Language Models Sessions at NVIDIA GTC 2024
Learn how inference for LLMs is driving breakthrough performance for AI-enabled applications and services.
1 MIN READ
Feb 01, 2024
Deploy an AI Coding Assistant with NVIDIA TensorRT-LLM and NVIDIA Triton
Large language models (LLMs) have revolutionized the field of AI, creating entirely new ways of interacting with the digital world. While they provide a good...
12 MIN READ