Organizations analyze large amounts of tabular data to uncover insights, improve products and services, and achieve efficiency. For those enterprises that want to thrive in a rapidly changing environment, the ability to process big data quickly can often create the competitive edge needed to succeed. Because speed is of such critical importance, accelerating the data processing pipeline—and doing it in a way that maximizes hardware utility—can profoundly impact the productivity and outcomes of data science efforts.

This Deep Learning Institute (DLI) workshop will share how to create an end-to-end hardware-accelerated machine learning pipeline for large datasets. You’ll utilize NVIDIA RAPIDS™ and Dask to scale your data science workloads. This workshop will illustrate how the same process can be applied to other machine learning use cases. You’ll then learn how to speed up data engineering by avoiding hidden slowdowns and reduce model development time by maximizing hardware utility. Throughout the development process, you’ll use diagnostic tools to identify delays and learn to mitigate common pitfalls.

 

Learning Objectives


By participating in this workshop, you’ll:
  • Develop and deploy an accelerated end-to-end data processing pipeline for large datasets
  • Scale data science workflows using distributed computing
  • Perform DataFrame transformations that take advantage of hardware acceleration and avoid hidden slowdowns
  • Enhance machine learning solutions through feature engineering and rapid experimentation
  • Improve data processing pipeline performance by optimizing memory management and hardware utilization


Workshop Outline

Introduction
(15 mins)
  • Meet the instructor.
  • Create an account at courses.nvidia.com/join
Advanced Extract, Transform, and Load (ETL)
(150 mins)

    Learn to process large volumes of data efficiently for downstream analysis:

  • Discuss current challenges of growing data sizes.
  • Perform ETL efficiently on large datasets.
  • Discuss hidden slowdowns and perform DataFrame transformations properly.
  • Discuss diagnostic tools to monitor and optimize hardware utilization.
  • Persist data in a way that’s conducive to downstream analytics.
Break (60 mins)
Training on Multiple GPUs With PyTorch Distributed Data Parallel (DDP)
(120 mins)

    Learn how to improve data analysis on large datasets:

  • Build and compare classification models.
  • Perform feature selection based on predictive power of new and existing features.
  • Perform hyperparameter tuning.
  • Create embeddings using deep learning and perform clustering on those embeddings.
Break (15 mins)
Deployment
(75 mins)

    Learn how to deploy and measure the performance of an accelerated data processing pipeline:

  • Deploy a data processing pipeline with Triton Inference Server.
  • Discuss various tuning parameters to optimize performance.
Assessment and Q&A
(45 mins)
 

Workshop Details

Duration: 8 hours

Price: Contact us for pricing.

Prerequisites:
  • Basic knowledge of a standard data science workflow on tabular data. To gain an adequate understanding, we recommend this article.
  • Knowledge of distributed computing using Dask. To gain an adequate understanding, we recommend the “Get Started” guide from Dask.
  • Completion of the DLI’s Fundamentals of Accelerated Data Science course or an ability to manipulate data using cuDF and some experience building machine learning models using cuML.

Tools, libraries and frameworks: Python, cuDF, Dask, Plotly, NVTabular, cuML, Forest Inference Library, PyTorch, and NVIDIA Triton™ Inference Server

Assessment Type: Skills-based coding assessments evaluate learners’ ability to train deep learning models on multiple GPUs.

Certificate: Upon successful completion of the assessment, participants will receive an NVIDIA DLI certificate to recognize their subject matter competency and support professional career growth.

Hardware Requirements: Desktop or laptop computer capable of running the latest version of Chrome or Firefox. Each participant will be provided with dedicated access to a fully configured, GPU-accelerated workstation in the cloud.

Languages: English

Upcoming Workshops

Upcoming Public Workshops

Europe / Middle East / Africa

Thursday, September 16, 2021

9:00 a.m.–5:00 p.m. CEST

If your organization is interested in boosting and developing key skills in AI, accelerated data science, or accelerated computing, you can request instructor-led training from the NVIDIA DLI.

Continue Your Learning with These DLI Trainings

Getting Started with Image Segmentation

Building Transformer-Based Natural Language Processing Applications

Building Intelligent Recommender Systems

Digital Fingerprinting with Morpheus
