TY-Blog

HPC

HPC-Introduction

by Lucas-TY | HPC

An introduction to High-Performance Computing (HPC). It covers the von Neumann computer architecture, Flynn's Classical Taxonomy, parallel computing terminology, Amdahl's Law, Moore's Law, scalability, shared memory, distributed memory, hybrid distributed-shared memory, parallel programming models, synchronous vs. asynchronous communication, collective communication, synchronization, and best practices for I/O.
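
Of these, Amdahl's Law is the one worth previewing here, since it sets the ceiling on what parallelism can buy. A minimal statement of the law (the symbols S, P, and N are the usual textbook ones, not notation taken from the post):

```latex
% Amdahl's Law: speedup S(N) on N processors, where P is the fraction of
% the runtime that parallelizes and 1 - P is the serial remainder.
S(N) = \frac{1}{(1 - P) + \frac{P}{N}}
% As N \to \infty, S(N) \to 1/(1 - P): with P = 0.95 the speedup
% can never exceed 20x, no matter how many processors are added.
```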

Flash-Attention

by Lucas-TY | HPC

An optimization of attention based on Online SoftMax, in the context of High-Performance Computing (HPC). It introduces Safe SoftMax and a three-pass algorithm for computing it, shows how fusing the loops improves I/O efficiency, and presents tiling. It closes with FlashAttention-2, an improved version that addresses the limitations of the original technique.
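
To make the three-pass vs. online distinction concrete, here is a minimal NumPy sketch of both variants (the function names and scalar loops are illustrative; the real kernels operate on tiles of the attention matrix):

```python
import numpy as np

def safe_softmax_three_pass(x):
    # Pass 1: global maximum, needed for numerical stability.
    m = -np.inf
    for v in x:
        m = max(m, v)
    # Pass 2: sum of shifted exponentials.
    d = 0.0
    for v in x:
        d += np.exp(v - m)
    # Pass 3: normalize.
    return np.exp(x - m) / d

def online_softmax(x):
    # Fuses passes 1 and 2: whenever the running max grows, the partial
    # sum is rescaled by exp(m_old - m_new) so it stays consistent.
    m, d = -np.inf, 0.0
    for v in x:
        m_new = max(m, v)
        d = d * np.exp(m - m_new) + np.exp(v - m_new)
        m = m_new
    return np.exp(x - m) / d

x = np.array([1.0, 3.0, 2.0])
assert np.allclose(safe_softmax_three_pass(x), online_softmax(x))
```

The same rescaling trick is what makes tiling possible: partial results can be combined block by block instead of re-reading the whole row, which is where the I/O savings come from.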

DL-Intro

by Lucas-TY | HPC

Provides an introduction to deep learning: the history of ImageNet and AlexNet; definitions of machine learning and of supervised vs. unsupervised learning; DNN training with the backward pass and activation functions; the difference between parameters and hyperparameters; stochastic gradient descent (SGD), learning rate, and batch size; accuracy and throughput, and the impact of model size and dataset size; overfitting and underfitting, and mitigating overfitting with regularization, dropout, data augmentation, and early stopping; the convolution operation; and transformer models, including the components of the encoder and decoder.
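
As a concrete anchor for the SGD pieces (parameters vs. hyperparameters, learning rate, batch size), here is a toy minibatch-SGD loop on synthetic linear-regression data; everything in it, data included, is illustrative rather than taken from the post:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic dataset: 256 samples, 4 features, known weights plus noise.
X = rng.normal(size=(256, 4))
true_w = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ true_w + 0.1 * rng.normal(size=256)

w = np.zeros(4)           # parameters: learned by training
lr, batch_size = 0.1, 32  # hyperparameters: chosen by hand

for epoch in range(20):
    idx = rng.permutation(len(X))  # stochastic: reshuffle every epoch
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]
        # Gradient of mean squared error on this minibatch.
        grad = 2.0 * X[b].T @ (X[b] @ w - y[b]) / len(b)
        w -= lr * grad             # the SGD update rule

print(w)  # approaches true_w; too large an lr would diverge instead
```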

DL-Frameworks

by Lucas-TY | HPC

Discusses distributed deep neural network training using data parallelism. It covers BLAS libraries, compute-intensive layers, DNN libraries, parallelization strategies (data parallelism and hybrid parallelism), and the allreduce collective. It then walks through the steps of data parallelism: computing local gradients, aggregating them with allreduce, and applying the resulting parameter updates.
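
A minimal sketch of that data-parallel step, written with mpi4py purely for illustration (the post discusses the allreduce collective in general, not this library specifically). Each rank computes a gradient on its own data shard, the gradients are summed and averaged with allreduce, and every rank applies the identical update:

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Every rank holds a full copy of the parameters but its own data shard
# (synthetic here, seeded by rank so the shards differ).
rng = np.random.default_rng(rank)
w = np.zeros(4)
X = rng.normal(size=(32, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0])

for step in range(100):
    # 1. Local forward/backward pass -> local gradient.
    grad_local = 2.0 * X.T @ (X @ w - y) / len(X)
    # 2. Allreduce: sum the local gradients across all ranks, then average.
    grad_global = np.empty_like(grad_local)
    comm.Allreduce(grad_local, grad_global, op=MPI.SUM)
    grad_global /= comm.Get_size()
    # 3. Identical update everywhere keeps the parameter replicas in sync.
    w -= 0.1 * grad_global
```

Launched with something like `mpirun -np 4 python data_parallel.py`, each process trains on its own quarter of the data while the parameter replicas never diverge, which is exactly the gradient-aggregation pattern the post describes.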