Lucas-TY
An overview of computer vision that discusses the building blocks of CNNs, such as dense layers and convolution layers. It highlights the key features and advantages of architectures like AlexNet, VGG, and ResNet, and explains the receptive field and its impact on the network's performance. The page also covers the challenges of sparsity in network structures and their solutions. Finally, it introduces residual learning and its benefits for optimizing deep networks.
An overview of the Transformer model, a powerful architecture in the field of AI. It explains the concept of self-attention, which allows the model to focus on relevant parts of a sequence. The page covers topics such as positional encoding, applications of self-attention in speech, image, and graph processing, as well as the use of Transformers in sequence-to-sequence tasks like chatbots and question answering. The page also mentions the encoder component of the Transformer model and discusses fully connected networks, layer normalization, and the differences between layer normalization and batch normalization.
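The difference between layer normalization and batch normalization mentioned above comes down to the axis over which statistics are computed. The following is a minimal sketch, assuming a batch stored as a list of rows (one row per sample); the helper names are hypothetical, and the learnable scale and shift parameters used in practice are left out.

```python
import math

def _normalize(values, eps=1e-5):
    """Shift to zero mean and scale to (approximately) unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def layer_norm(batch):
    # Layer normalization: statistics are computed per sample, across its features.
    return [_normalize(row) for row in batch]

def batch_norm(batch):
    # Batch normalization: statistics are computed per feature, across the batch.
    columns = [_normalize(list(col)) for col in zip(*batch)]
    return [list(row) for row in zip(*columns)]

batch = [[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]]
ln = layer_norm(batch)   # each row now has mean close to 0
bn = batch_norm(batch)   # each column now has mean close to 0
```

Because layer normalization depends only on a single sample, it behaves the same for any batch size, which is one reason it suits variable-length sequence models.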
Discusses similarity search techniques, including TF-IDF and BM25 for sparse search, and Sentence-BERT for dense search. TF-IDF scores a term by the product of its term frequency and inverse document frequency, while BM25 also weighs term frequency but normalizes for document length. Sentence-BERT encodes sentences with a Hugging Face transformer model and compares the resulting sentence embeddings using cosine similarity.
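As a rough illustration of the two ideas named above, the sketch below computes TF-IDF weights and compares documents by cosine similarity. It assumes one common variant (term frequency as count divided by document length, inverse document frequency as log(N / df)), non-empty pre-tokenized documents, and hypothetical function names; the post itself may use a different variant. In dense search the same cosine function would be applied to sentence embeddings rather than sparse term weights.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized, non-empty document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

docs = [["cat", "sat"], ["cat", "ran"], ["dog", "ran"]]
weights = tf_idf(docs)
score = cosine_similarity(weights[0], weights[1])  # documents 0 and 1 share "cat"
```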
An introduction to High-Performance Computing (HPC). It covers topics such as the Von Neumann computer architecture, Flynn's Classical Taxonomy, parallel computing terminology, Amdahl's Law, Moore's Law, scalability, shared memory, distributed memory, hybrid distributed-shared memory, parallel programming models, synchronous vs asynchronous communications, collective communication, synchronization, and best practices for I/O.
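Amdahl's Law, listed above, bounds the speedup of a program when only part of it can be parallelized. A small sketch, assuming `p` is the parallelizable fraction of the runtime and `n` is the number of processors (both parameter names are chosen here for illustration):

```python
def amdahl_speedup(p, n):
    """Speedup predicted by Amdahl's Law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a fraction between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)

# With 90% parallel code, 8 processors give a speedup of about 4.7,
# and no processor count can push the speedup past 1 / (1 - 0.9) = 10.
speedup = amdahl_speedup(0.9, 8)
```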
Covers the optimization of SoftMax computation in the context of High-Performance Computing (HPC). It introduces Safe SoftMax and Online SoftMax, and describes a three-pass algorithm for Safe SoftMax. The document also presents the idea of combining loops to improve I/O efficiency and introduces the concept of tiling. Additionally, it introduces FlashAttention-2 as an improved version of FlashAttention that addresses the limitations of the original technique.
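To make the loop-combining idea concrete, here is a minimal sketch of the two variants on a plain Python list. The three-pass version reads the input once for the maximum, once for the denominator, and once for the outputs; the online version fuses the first two passes by rescaling the running denominator whenever the running maximum changes. Function names are hypothetical, and this is an illustration rather than the post's own code.

```python
import math

def safe_softmax(xs):
    """Three-pass Safe SoftMax: max, then denominator, then outputs."""
    m = max(xs)                                   # pass 1: maximum
    d = sum(math.exp(x - m) for x in xs)          # pass 2: denominator
    return [math.exp(x - m) / d for x in xs]      # pass 3: normalized outputs

def online_softmax(xs):
    """Two-pass variant: maximum and denominator are computed in one loop."""
    m = float("-inf")
    d = 0.0
    for x in xs:
        m_new = max(m, x)
        # Rescale the old denominator to the new maximum before adding the term.
        d = d * math.exp(m - m_new) + math.exp(x - m_new)
        m = m_new
    return [math.exp(x - m) / d for x in xs]

values = [1.0, 2.0, 3.0, 1000.0]   # a naive exp(1000.0) would overflow
probs = online_softmax(values)
```

Subtracting the maximum keeps every exponent at or below zero, which is what makes both versions safe against overflow.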
Provides an introduction to deep learning. It covers the history of ImageNet and AlexNet; definitions of machine learning and of supervised and unsupervised learning; DNN training with the backward pass and activation functions; the difference between parameters and hyperparameters; stochastic gradient descent (SGD) and the learning rate; batch size and model size; accuracy and throughput; and the impact of model size and dataset size. It also covers overfitting and underfitting, ways of dealing with overfitting (regularization, dropout, data augmentation, and early stopping), the convolution operation, and transformer models with their encoder and decoder components.
Discusses distributed deep neural network training using data parallelism. It covers BLAS libraries, compute-intensive layers, DNN libraries, parallelization strategies (data parallelism and hybrid parallelism), and the allreduce collective method. It also explains the steps involved in data parallelism using allreduce for gradient aggregation and parameter updates.
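The data-parallel steps described above can be sketched in a single process. The example below is a simulation under stated assumptions: each "worker" is just a list entry, the model is a one-parameter fit of y ≈ w * x, and `allreduce_sum` is a hypothetical stand-in for a real collective such as the one provided by MPI or NCCL.

```python
def allreduce_sum(vectors):
    """Simulated allreduce: every worker receives the element-wise sum."""
    total = [sum(vals) for vals in zip(*vectors)]
    return [list(total) for _ in vectors]

def local_gradient(w, shard):
    # Gradient of the mean squared error for y ≈ w[0] * x on this worker's shard.
    return [sum(2 * (w[0] * x - y) * x for x, y in shard) / len(shard)]

def data_parallel_step(replicas, shards, lr=0.01):
    # 1. Each worker computes a gradient on its own shard of the data.
    grads = [local_gradient(w, shard) for w, shard in zip(replicas, shards)]
    # 2. Allreduce aggregates the gradients so every worker holds the same sum.
    summed = allreduce_sum(grads)
    # 3. Each worker applies the same averaged update to its model replica.
    n = len(replicas)
    return [[wi - lr * gi / n for wi, gi in zip(w, g)]
            for w, g in zip(replicas, summed)]

replicas = [[0.0], [0.0]]                       # identical initial parameters
shards = [[(1.0, 2.0), (2.0, 4.0)],             # worker 0 data, y = 2x
          [(3.0, 6.0), (4.0, 8.0)]]             # worker 1 data, y = 2x
for _ in range(200):
    replicas = data_parallel_step(replicas, shards)
```

Because every replica starts from the same parameters and applies the same aggregated gradient, the replicas stay identical without ever exchanging the parameters themselves.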
Provides an introduction to random variables (RVs) and their distributions, both continuous and discrete. It discusses independence, the chain rule of probability, Bayes net inference, and Bayes net learning. Additionally, it covers unsupervised learning, specifically clustering with K-means, and supervised learning, including linear regression and Naive Bayes for classification.
Introduces linear functions and linear regression in the context of AI. It discusses data correlation, affine functions, and feature transforms, and explains the definition and purpose of loss functions in evaluating model fit.
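As a small illustration of fitting an affine function and scoring it with a loss, the sketch below uses the closed-form least-squares solution for a single input feature and mean squared error as the loss. The choice of loss and the function names are assumptions made here for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y ≈ slope * x + intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)   # must be non-zero
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

def mse_loss(xs, ys, slope, intercept):
    """Mean squared error: lower values mean the line fits the data better."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]            # generated from y = 2x + 1
slope, intercept = fit_line(xs, ys)
loss = mse_loss(xs, ys, slope, intercept)
```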
Provides an introduction to different data encoding techniques in AI, including "one-hot encoding" and "bag of words" (BoW). It explains the advantages and limitations of BoW and provides example code for implementing it. The document also discusses BoW frequency and normalization techniques.
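A bag-of-words encoding with frequency normalization might look like the sketch below. This is not the example code from the post; it assumes whitespace tokenization, lowercasing, and an alphabetically sorted vocabulary, and the function names are hypothetical.

```python
from collections import Counter

def build_vocabulary(texts):
    """Sorted list of every distinct lowercase token in the corpus."""
    return sorted({token for text in texts for token in text.lower().split()})

def bag_of_words(text, vocabulary):
    """Count vector: one entry per vocabulary term, word order is discarded."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]

def normalize(vector):
    """Turn raw counts into relative frequencies that sum to 1."""
    total = sum(vector)
    return [v / total for v in vector] if total else list(vector)

texts = ["the cat sat", "the cat the dog"]
vocab = build_vocabulary(texts)                  # ['cat', 'dog', 'sat', 'the']
vectors = [bag_of_words(t, vocab) for t in texts]
frequencies = [normalize(v) for v in vectors]
```

Normalizing by the total count keeps long and short documents comparable, while the loss of word order remains the main limitation of the representation.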
Focuses on learning agents and data. It discusses the reasons for using machine learning, the representation of data, different models for supervised and unsupervised learning, and the challenges associated with different types of data.