Instructor-Led Workshop
Model Parallelism: Building and Deploying Large Neural Networks

Large language models (LLMs) and deep neural networks (DNNs), whether applied to natural language processing (e.g., GPT-3), computer vision (e.g., large Vision Transformers), or speech AI (e.g., wav2vec 2.0), have certain properties that set them apart from their smaller counterparts. As these models grow larger and are trained on progressively larger datasets, they can adapt to new tasks with just a handful of training examples, accelerating progress toward artificial general intelligence. Training models that contain tens to hundreds of billions of parameters on vast datasets isn’t trivial and requires a unique combination of AI, high-performance computing (HPC), and systems knowledge. The goal of this course is to demonstrate how to train the largest of neural networks and deploy them to production.


Learning Objectives

By participating in this workshop, you’ll learn how to:
  • Train neural networks across multiple servers.
  • Use techniques such as activation checkpointing, gradient accumulation, and various forms of model parallelism to overcome the challenges associated with the memory footprint of large models (see the sketch after this list).
  • Capture and understand training performance characteristics to optimize model architecture.
  • Deploy very large multi-GPU models to production using NVIDIA® TensorRT™-LLM.
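Two of the memory-saving techniques above can be previewed in a few lines of PyTorch. Below is a minimal sketch (not the workshop's own code) combining gradient accumulation with activation checkpointing; the tiny stand-in model, random data, and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Stand-in model: a stack of linear layers plays the role of a transformer.
layers = nn.Sequential(*[nn.Linear(256, 256) for _ in range(8)])
head = nn.Linear(256, 10)
optimizer = torch.optim.AdamW(list(layers.parameters()) + list(head.parameters()), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

accum_steps = 4  # one optimizer step per 4 micro-batches

for step in range(16):
    inputs = torch.randn(8, 256)              # random micro-batch (placeholder data)
    labels = torch.randint(0, 10, (8,))

    # Activation checkpointing: keep activations only at segment boundaries
    # and recompute the rest during backward, trading compute for memory.
    hidden = checkpoint_sequential(layers, segments=2, input=inputs, use_reentrant=False)
    loss = loss_fn(head(hidden), labels)

    # Gradient accumulation: scale the loss so accum_steps micro-batches
    # average into one larger effective batch before the weight update.
    (loss / accum_steps).backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

Both techniques trade extra compute or delayed updates for lower peak memory, which is what makes room for larger models on the same hardware.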

Download workshop datasheet (PDF 47 KB)

Workshop Outline

Introduction
(15 mins)
Introduction to Training of Large Models
(120 mins)
  • Learn about the motivation behind and key challenges of training large models.
  • Get an overview of the basic techniques and tools needed for large-scale training.
  • Get an introduction to distributed training and the Slurm job scheduler.
  • Train a GPT model using data parallelism (see the sketch after this list).
  • Profile the training process and understand execution performance.
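As a taste of the data-parallel exercise, here is a hedged PyTorch DistributedDataParallel sketch; the workshop itself uses NeMo/Megatron-style tooling under Slurm, and the one-layer model and random data here are placeholders. It assumes a launch such as `torchrun --nproc_per_node=4 train_dp.py` on a multi-GPU node.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("nccl")              # one process per GPU
local_rank = int(os.environ["LOCAL_RANK"])   # set by torchrun
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(1024, 1024).cuda()   # placeholder for a GPT model
model = DDP(model, device_ids=[local_rank])  # replicate weights, all-reduce grads
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

for _ in range(10):
    x = torch.randn(32, 1024, device="cuda") # each rank would load its own data shard
    loss = model(x).pow(2).mean()            # dummy loss for illustration
    loss.backward()                          # gradients are averaged across ranks here
    optimizer.step()
    optimizer.zero_grad()

dist.destroy_process_group()
```

Data parallelism replicates the full model on every GPU, which is why the memory-saving and model-parallel techniques in the next session become necessary once a model outgrows a single device.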
Break (15 mins)
Model Parallelism: Advanced Topics
(120 mins)
  • Increase the model size using a range of memory-saving techniques.
  • Get an introduction to tensor and pipeline parallelism (see the sketch after this list).
  • Go beyond natural language processing and get an introduction to DeepSpeed.
  • Auto-tune model performance.
  • Learn about mixture-of-experts models.
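To make the tensor-parallel idea concrete before the session, here is a simplified, single-process illustration of a Megatron-style column split of one linear layer; in a real implementation each shard lives on its own GPU and the final concatenation is an all-gather. All names and sizes are illustrative.

```python
import torch

world_size = 2
weight = torch.randn(1024, 1024)            # full weight, conceptually sharded
shards = weight.chunk(world_size, dim=0)    # each "GPU" owns 512 output rows

x = torch.randn(8, 1024)                    # the same input is broadcast to all ranks
partials = [x @ w.T for w in shards]        # each rank computes its slice of the output
y = torch.cat(partials, dim=-1)             # all-gather reassembles the full output

assert torch.allclose(y, x @ weight.T)      # sharded result matches the unsharded one
```

Pipeline parallelism, by contrast, splits the model by layers rather than within a layer, so different GPUs work on different stages of the network.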
Break (15 mins)
Inference of Large Models
(120 mins)
  • Understand the challenges of deployment associated with large models.
  • Explore techniques for model reduction.
  • Learn how to use TensorRT-LLM.
  • Learn how to use Triton Inference Server (a client sketch follows this list).
  • Understand the process of deploying a GPT checkpoint to production.
  • See an example of prompt engineering.
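As a rough preview of the serving workflow, the sketch below queries a model hosted by Triton Inference Server using the official `tritonclient` package. The model name (`gpt`) and the tensor names (`text_input`, `text_output`) are hypothetical; real TensorRT-LLM deployments define their own names and shapes in the model repository's `config.pbtxt`.

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Hypothetical string-valued input tensor; BYTES is Triton's string datatype.
text = np.array([["What is model parallelism?"]], dtype=object)
inputs = [httpclient.InferInput("text_input", list(text.shape), "BYTES")]
inputs[0].set_data_from_numpy(text)
outputs = [httpclient.InferRequestedOutput("text_output")]

result = client.infer(model_name="gpt", inputs=inputs, outputs=outputs)
print(result.as_numpy("text_output"))       # the generated text, as a numpy array
```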
Final Review
(15 mins)
  • Review key learnings and answer questions.
  • Complete the assessment and earn a certificate.
  • Complete the workshop survey.
  • Learn how to set up your own AI application development environment.
Next Steps

Workshop Details

Duration: 8 hours

Price: Contact us for pricing.

Prerequisites:

Technologies: PyTorch, NVIDIA NeMo™ Framework, DeepSpeed, Slurm, TensorRT-LLM

Certificate: Upon successful completion of the assessment, participants will receive an NVIDIA DLI certificate to recognize their subject matter competency and support professional career growth.

Hardware Requirements: Desktop or laptop computer capable of running the latest version of Chrome or Firefox. Each participant will be provided with dedicated access to a fully configured, GPU-accelerated workstation in the cloud.

Languages: English

Upcoming Public Workshops

Europe / Middle East / Africa

Thursday, August 26, 2021
9:00 a.m.–5:00 p.m. CEST

If your organization is interested in boosting and developing key skills in AI, accelerated data science, or accelerated computing, you can request instructor-led training from the NVIDIA DLI.

Continue Your Learning with These DLI Trainings

Building Transformer-Based Natural Language Processing Applications

Questions?