TY-Blog

Lucas-TY

#HPC

Vision

by Lucas-TY | AI

An overview of computer vision and the building blocks of CNNs, such as dense and convolutional layers. It highlights the key features and advantages of architectures like AlexNet, VGG, and ResNet, and explains the receptive field and its impact on network performance. The post also covers the challenges posed by sparsity in network structures and their solutions, and finally introduces residual learning and its benefits for optimizing deep networks.
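The residual-learning idea mentioned above can be sketched in a few lines of numpy: the block computes a residual function F(x) and adds the input back through an identity shortcut, so a near-zero residual branch leaves the block close to an identity mapping. The shapes and the zero-initialized second weight matrix here are illustrative choices, not taken from the post.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    # output = relu(F(x) + x), where F(x) = w2 @ relu(w1 @ x)
    # the "+ x" is the identity shortcut that makes this a residual block
    return relu(w2 @ relu(w1 @ x) + x)

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
w1 = rng.standard_normal((4, 4))
w2 = np.zeros((4, 4))  # zero residual branch: the block reduces to relu(x)
y = residual_block(x, w1, w2)
```

With `w2` at zero the block passes its input through almost unchanged, which is why residual networks are easier to optimize at depth: layers only need to learn a perturbation of the identity.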

HPC-Introduction

by Lucas-TY | HPC

An introduction to High-Performance Computing (HPC). It covers the Von Neumann computer architecture, Flynn's Classical Taxonomy, parallel computing terminology, Amdahl's Law, Moore's Law, scalability, shared, distributed, and hybrid distributed-shared memory, parallel programming models, synchronous vs. asynchronous communication, collective communication, synchronization, and best practices for I/O.
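Amdahl's Law, one of the topics listed above, bounds the speedup of a program by its serial fraction: S = 1 / ((1 - P) + P/N), where P is the parallelizable fraction and N the processor count. A quick sketch (the 0.95 figure is just a worked example, not from the post):

```python
def amdahl_speedup(parallel_fraction, n_procs):
    # S = 1 / ((1 - P) + P / N)
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_procs)

# a program that is 95% parallel never exceeds 1 / (1 - 0.95) = 20x speedup,
# no matter how many processors are added
s8 = amdahl_speedup(0.95, 8)
s_inf = amdahl_speedup(0.95, 10**9)
```

Even at 8 processors the speedup is already under 6x, which is why the serial fraction, not raw core count, dominates scalability.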

Flash-Attention

by Lucas-TY | HPC

An optimization of the attention computation built around online SoftMax, in the context of High-Performance Computing (HPC). It introduces Safe SoftMax and its three-pass algorithm, then Online SoftMax, which combines loops to improve I/O efficiency, and introduces the concept of tiling. The post also presents Flash Attention-2, an improved version of Flash Attention that addresses the limitations of the original technique.
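The three-pass Safe SoftMax and the loop-fused Online SoftMax mentioned above can be sketched as follows: the safe version makes one pass for the max, one for the normalizer, and one for the outputs, while the online version maintains a running max and a rescaled normalizer in a single pass (this is a generic sketch of the technique, not code from the post).

```python
import math

def safe_softmax_3pass(x):
    m = max(x)                                   # pass 1: global max (numerical stability)
    d = sum(math.exp(v - m) for v in x)          # pass 2: normalizer
    return [math.exp(v - m) / d for v in x]      # pass 3: normalized outputs

def online_softmax(x):
    # single pass over x: when the running max m grows, rescale the
    # partial normalizer d by exp(m_old - m_new) to keep it consistent
    m, d = float("-inf"), 0.0
    for v in x:
        m_new = max(m, v)
        d = d * math.exp(m - m_new) + math.exp(v - m_new)
        m = m_new
    return [math.exp(v - m) / d for v in x]

probs3 = safe_softmax_3pass([1.0, 2.0, 3.0])
probs1 = online_softmax([1.0, 2.0, 3.0])
```

Fusing the max and normalizer passes is exactly the I/O saving that Flash Attention exploits: the input row is read once instead of twice before the final normalization.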

DL-Intro

by Lucas-TY | HPC

Provides an introduction to deep learning: the history of ImageNet and AlexNet; definitions of machine learning and of supervised vs. unsupervised learning; DNN training with the backward pass and activation functions; the difference between parameters and hyperparameters; stochastic gradient descent (SGD) and the learning rate; batch size, model size, accuracy, and throughput; the impact of model size and dataset size; overfitting and underfitting, and how to combat overfitting with regularization, dropout, data augmentation, and early stopping; the convolution operation; and transformer models, including the components of the encoder and decoder.
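The SGD update with a learning rate, one of the training topics listed above, is just a repeated step against the gradient. A minimal sketch on a toy one-parameter objective (the objective and learning rate are illustrative, not from the post):

```python
def sgd_step(w, grad, lr):
    # one SGD update: w <- w - lr * grad
    return w - lr * grad

# toy objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w = 0.0
for _ in range(100):
    w = sgd_step(w, 2 * (w - 3), lr=0.1)
```

After a few dozen steps `w` converges to the minimizer 3; too large a learning rate would instead make the iterates oscillate or diverge, which is why the learning rate is the first hyperparameter to tune.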

DL-Frameworks

by Lucas-TY | HPC

Discusses distributed deep neural network training using data parallelism. It covers BLAS libraries, compute-intensive layers, DNN libraries, parallelization strategies (data parallelism and hybrid parallelism), and the allreduce collective. It then walks through the steps of data parallelism: each worker computes gradients on its own data shard, allreduce aggregates the gradients, and every worker applies the same parameter update.
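The data-parallel steps above can be sketched with a toy single-process simulation, where a plain mean over per-worker gradients stands in for the allreduce collective (the per-worker loss and the shard values are hypothetical, chosen only to make the arithmetic visible):

```python
def allreduce_mean(values):
    # stand-in for the allreduce collective: after it runs,
    # every worker holds the mean of all workers' contributions
    return sum(values) / len(values)

def data_parallel_step(w, shards, lr=0.1):
    # 1. each worker computes a local gradient on its own data shard
    #    (toy per-worker loss: (w - mean(shard))^2, so grad = 2*(w - mean))
    local_grads = [2 * (w - sum(s) / len(s)) for s in shards]
    # 2. gradients are aggregated across workers with allreduce
    g = allreduce_mean(local_grads)
    # 3. every worker applies the identical update, keeping replicas in sync
    return w - lr * g

w = data_parallel_step(0.0, [[1.0, 2.0], [3.0, 4.0]])
```

Because every replica sees the same averaged gradient, the parameter copies stay bit-identical without any central parameter server, which is the appeal of allreduce-based data parallelism.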