Vision

Lucas-TY

AI | Jan 30, 2024 | Last edited: Sep 21, 2024

Computer Vision

CNN Architectures

  • Dense Layers (used as classifier)
  • Convolution Layer (used as feature extraction layer)
    • Convolution operation
    • Activation function
    • Pooling
      • used to reduce the spatial size of the input
      • summarizes the information in a local region of the image
    • Multi-Channel Convolution Operation (see the sketch after this list)
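
A minimal PyTorch sketch of the layer roles listed above (the framework choice and all channel counts and image sizes are illustrative assumptions, not from the source): a multi-channel convolution extracts features, ReLU applies the activation, pooling shrinks the spatial size, and a dense layer acts as the classifier.

```python
import torch
import torch.nn as nn

# Sketch of the layer roles above; channel counts and image size are illustrative.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # multi-channel convolution: 3 -> 16 channels
    nn.ReLU(),                                   # activation function
    nn.MaxPool2d(2),                             # pooling: halves spatial size, summarizing regions
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 10),                 # dense layer used as the classifier
)

x = torch.randn(1, 3, 32, 32)  # one 32x32 RGB image
print(model(x).shape)          # torch.Size([1, 10])
```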

AlexNet - Two Tower Design

  • Convolutional Neural Network (CNN) based architecture
  • One of the first architectures to demonstrate the potential of CNNs on image-related tasks such as object recognition and detection

VGG

  • Increased the model depth and reduced the total number of weights

Receptive field

  • Two stacked 3×3 convolution operations have the same receptive field as one 5×5 convolution operation, as the sketch below shows
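
A quick check of this equivalence (a sketch, assuming PyTorch): on a 5×5 input, two stacked 3×3 convolutions and a single 5×5 convolution both collapse to a 1×1 output, i.e. each output unit sees the same 5×5 neighborhood.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 5, 5)  # single-channel 5x5 input

two_3x3 = nn.Sequential(nn.Conv2d(1, 1, 3), nn.Conv2d(1, 1, 3))
one_5x5 = nn.Conv2d(1, 1, 5)

# Both reduce the 5x5 input to a single value: identical receptive fields.
print(two_3x3(x).shape)  # torch.Size([1, 1, 1, 1])
print(one_5x5(x).shape)  # torch.Size([1, 1, 1, 1])
```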

Advantages of replacing one large filter with stacked small filters

  • Increased number of non-linear rectification layers (three ReLU layers instead of one)
    • Makes decision function more discriminative
  • Reduces the number of trainable parameters (see the sketch after this list)
    • C: number of channels in input and output
    • One 7×7 filter: 7² · C² = 49C² parameters
    • Three stacked 3×3 filters: 3 · 3² · C² = 27C² parameters
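
The same counts can be verified with a short PyTorch sketch (C = 64 is an arbitrary choice; bias terms are disabled so only the filter weights are counted):

```python
import torch.nn as nn

C = 64  # arbitrary channel count for illustration

one_7x7 = nn.Conv2d(C, C, kernel_size=7, padding=3, bias=False)
three_3x3 = nn.Sequential(
    nn.Conv2d(C, C, 3, padding=1, bias=False), nn.ReLU(),
    nn.Conv2d(C, C, 3, padding=1, bias=False), nn.ReLU(),
    nn.Conv2d(C, C, 3, padding=1, bias=False),
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(one_7x7))    # 49 * C^2 = 200704
print(count(three_3x3))  # 27 * C^2 = 110592
```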

Sparsifying the Network

Challenge

  • Modern architectures are very inefficient for sparse data structures
    • Even though the number of operations decreases, the overhead of lookups increases

Solution

  • Translation invariance can be achieved using convolution blocks
  • We need to find the optimal local construction and repeat it spatially

Problem with Naïve Module

  • Problem: Pooling layers do not reduce the channel dimension, so concatenating the pooling branch with the convolution branches keeps inflating the channel count
  • Solution: Use 1×1 convolutions to reduce the channel dimension (see the sketch below)
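
A minimal sketch of a GoogLeNet-style Inception block in PyTorch: 1×1 convolutions shrink the channel dimension before the expensive 3×3 and 5×5 branches and after the pooling branch, so the concatenated output stays manageable. The branch widths below follow GoogLeNet's first Inception block; any similar values would do.

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Inception-style module: 1x1 convs reduce channels around the costly branches."""
    def __init__(self, c_in, c1, c3_red, c3, c5_red, c5, c_pool):
        super().__init__()
        self.branch1 = nn.Conv2d(c_in, c1, kernel_size=1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(c_in, c3_red, 1), nn.ReLU(),   # 1x1 reduction before 3x3
            nn.Conv2d(c3_red, c3, 3, padding=1))
        self.branch5 = nn.Sequential(
            nn.Conv2d(c_in, c5_red, 1), nn.ReLU(),   # 1x1 reduction before 5x5
            nn.Conv2d(c5_red, c5, 5, padding=2))
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1),    # pooling keeps the channel count...
            nn.Conv2d(c_in, c_pool, 1))              # ...so project it down with a 1x1 conv

    def forward(self, x):
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)

block = InceptionBlock(192, 64, 96, 128, 16, 32, 32)
y = block(torch.randn(1, 192, 28, 28))
print(y.shape)  # torch.Size([1, 256, 28, 28]): 64 + 128 + 32 + 32 channels
```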

ResNet

Residual Learning

  • Let H be the mapping the hidden layer wants to learn
    • y = H(x)
  • Instead of learning the direct mapping, the hidden layer can learn the residual mapping of y with respect to x
    • F(x) = H(x) − x
    • Can be reformulated as F(x) + x = H(x)
    • where F(x) is the new mapping the hidden layer will learn
  • The above formulation is easier to optimize; in the extreme case, F(x) can be driven to zero so the block learns an identity mapping
    • An identity mapping is very difficult to learn directly through a stack of non-linear layers (see the sketch below)
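
A minimal residual block sketch (assuming PyTorch; the layer choices mirror the basic ResNet block, and the channel count is illustrative): the stacked layers compute F(x), and the skip connection adds x back, so driving F(x) to zero yields the identity mapping.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: output is F(x) + x = H(x)."""
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Sequential(  # F(x): the residual mapping the layers learn
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.f(x) + x)  # skip connection: F(x) + x

block = ResidualBlock(64)
x = torch.randn(1, 64, 32, 32)
print(block(x).shape)  # torch.Size([1, 64, 32, 32]): same shape, so x can be added
```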