DL-Frameworks

Lucas-TY

HPC | Jan 30, 2024 | Last edited: Feb 1, 2024

Introduction to Distributed Deep Neural Network Training – Data Parallelism

  • BLAS Libraries - the heart of math operations (a minimal GEMM sketch follows this list)
    • ATLAS / OpenBLAS
    • NVIDIA cuBLAS
    • Intel Math Kernel Library (MKL)
  • The most compute-intensive layers are generally optimized for specific hardware
    • E.g. Convolution Layer, Pooling Layer, etc.
  • DNN Libraries - the heart of Convolutions!
    • NVIDIA cuDNN (already in its 8th iteration – cuDNN v8)
    • Intel MKL-DNN – a promising development for CPU-based ML/DL training
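
As a rough illustration of the role BLAS plays, the sketch below multiplies two matrices with NumPy, which typically dispatches the operation to whichever optimized BLAS it was built against (OpenBLAS, MKL, etc.). The matrix sizes are arbitrary choices for the example; np.show_config() just reports which backend this build is linked to.

```python
import numpy as np

# NumPy delegates dense matrix multiplication to the BLAS library it was
# built against (OpenBLAS, MKL, ...), typically via a *GEMM routine.
A = np.random.rand(1024, 512).astype(np.float32)
B = np.random.rand(512, 256).astype(np.float32)

C = A @ B  # dispatched to an optimized SGEMM under the hood

# Report which BLAS/LAPACK backend this NumPy build is linked against.
np.show_config()
print(C.shape)  # (1024, 256)
```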

Parallelization Strategies

  • Data Parallelism or Model Parallelism (contrasted in the sketch below)
  • Hybrid Parallelism
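
A minimal, NumPy-only way to contrast the two strategies (the layer shapes and worker count are hypothetical, not from the original post): in data parallelism every worker holds a full copy of the weights and processes a shard of the batch, while in model parallelism every worker holds only a slice of the weights and sees the whole batch.

```python
import numpy as np

batch = np.random.rand(8, 16)    # 8 samples, 16 features (arbitrary sizes)
weights = np.random.rand(16, 4)  # one dense layer with 4 outputs
num_workers = 2

# Data parallelism: every worker keeps the full weights, the batch is sharded.
data_shards = np.split(batch, num_workers, axis=0)
out_dp = np.concatenate([shard @ weights for shard in data_shards], axis=0)

# Model parallelism: the batch is shared, the weight matrix is sharded by columns.
weight_shards = np.split(weights, num_workers, axis=1)
out_mp = np.concatenate([batch @ w for w in weight_shards], axis=1)

# Both strategies reproduce the single-worker result.
assert np.allclose(out_dp, batch @ weights)
assert np.allclose(out_mp, batch @ weights)
```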

Allreduce Collective

  • Element-wise sums data from all processes and sends the result to all processes
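
A small sketch of the collective's semantics using mpi4py (assuming MPI and mpi4py are installed; launched with something like mpirun -np 4 python allreduce_demo.py): every rank contributes a buffer, and every rank receives the element-wise sum.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each rank contributes its own buffer ...
local = np.full(4, rank, dtype=np.float64)
result = np.empty_like(local)

# ... and every rank receives the element-wise sum of all buffers.
comm.Allreduce(local, result, op=MPI.SUM)

print(f"rank {rank}: {result}")  # identical on every rank
```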

Data Parallelism - AllReduce

  1. Gradient Aggregation (see the sketch below)
      • Call MPI_Allreduce to reduce the local gradients across all processes
      • Update parameters locally using the resulting global gradients
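
A hedged sketch of this step with mpi4py and NumPy (the toy model, loss, and learning rate are placeholders, not from the original post): each rank computes gradients on its own batch shard, MPI_Allreduce (here comm.Allreduce) sums them, and every rank then applies the same averaged update, keeping parameters in sync.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
size = comm.Get_size()
lr = 0.01  # placeholder learning rate

# Toy linear model; in practice these would be the DNN's parameters.
params = np.zeros(16)

# Each rank draws its own batch shard (seeded by rank for the demo).
rng = np.random.default_rng(comm.Get_rank())
x = rng.random((32, 16))
y = rng.random(32)

# 1. Compute local gradients on this rank's shard (least-squares loss here).
pred = x @ params
local_grad = x.T @ (pred - y) / len(y)

# 2. Gradient aggregation: element-wise sum of local gradients across ranks.
global_grad = np.empty_like(local_grad)
comm.Allreduce(local_grad, global_grad, op=MPI.SUM)
global_grad /= size  # average so the step matches single-process training

# 3. Update parameters locally using the global gradient; all ranks stay in sync.
params -= lr * global_grad
```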