July 11, 2024

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision

Attention, as a core layer of the ubiquitous Transformer architecture, is a bottleneck for large language models and long-context applications. FlashAttention (and FlashAttention-2) pioneered an approach to speed up attention on GPUs by minimizing memory reads/writes, and is now used by most libraries to accelerate Transformer training and inference. This has contributed to a massive increase in LLM context length in the last two years, from 2-4K (GPT-3, OPT) to 128K (GPT-4), or even 1M (Llam...
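
As a rough illustration (not the FlashAttention-3 API itself), the sketch below calls PyTorch's fused scaled_dot_product_attention, which on supported GPUs can dispatch to FlashAttention-style kernels that compute attention block-by-block instead of materializing the full attention score matrix in GPU memory:

```python
# Minimal sketch: invoking a fused, memory-efficient attention kernel in PyTorch.
# This is an illustration only, not the FlashAttention-3 interface; on supported
# GPUs, scaled_dot_product_attention can dispatch to FlashAttention kernels that
# avoid writing the (seq_len x seq_len) score matrix to HBM.
import torch
import torch.nn.functional as F

batch, heads, seq_len, head_dim = 2, 8, 4096, 64
q = torch.randn(batch, heads, seq_len, head_dim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Fused attention: softmax(q @ k^T / sqrt(head_dim)) @ v, computed in tiles so
# intermediate scores stay in fast on-chip memory rather than HBM.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```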

Read More

July 10, 2024

Learn how to develop Android applications with ExecuTorch and Llama models

This blog is courtesy of the PyTorch team at Arm. More details can be found here.

Read More

July 03, 2024

Announcing Hacker Cup AI Track at NeurIPS 2024

The PyTorch team, in partnership with Meta Hacker Cup and Microsoft Research, is excited to announce the Hacker Cup AI Track at NeurIPS 2024. This will be the first AI track for the popular Meta Hacker Cup programming competition, designed to assess the capabilities of Generative AI in performing autonomous code generation tasks. We aim to test the limits of AI in complex coding challenges and measure the performance gap between AI systems and human programmers. We will provide access to all ...

Read More

June 25, 2024

Powering the AI Revolution: The PyTorch Documentary

Now live: The official PyTorch Documentary! This film tells the authentic story of PyTorch’s inception and credits its existence to a dedicated group of unsung heroes driving technological innovation.

Read More

June 23, 2024

Training MoEs at Scale with PyTorch

Over the past year, Mixture of Experts (MoE) models have surged in popularity, fueled by powerful open-source models like DBRX, Mixtral, DeepSeek, and many more. At Databricks, we’ve worked closely with the PyTorch team to scale training of MoE models. In this blog post, we’ll talk about how we scale to over three thousand GPUs using PyTorch Distributed and MegaBlocks, an efficient open-source MoE implementation in PyTorch.
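
As a rough illustration of the MoE pattern (not the Databricks or MegaBlocks implementation), the sketch below shows a plain top-k routed MoE layer in PyTorch; MegaBlocks implements this routing with block-sparse kernels so experts can be computed efficiently and sharded across GPUs:

```python
# Minimal sketch of a top-k routed Mixture of Experts layer (illustration only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Each token is routed to its top-k experts and the
        # expert outputs are combined with renormalized router weights.
        logits = self.router(x)
        weights, expert_ids = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = expert_ids[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

# Example: 16 tokens with hidden size 128
moe = SimpleMoE(d_model=128, d_ff=512)
y = moe(torch.randn(16, 128))
```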

Read More