AI-Introduction-3

Lucas-TY

AI|Jan 30, 2024|Last edited: Feb 1, 2024|

Data to features

"One-hot encoding"

  • Avoids imposing an artificial order on unordered (categorical) data
  • Vector length grows linearly with the number of categories
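
A minimal sketch of one-hot encoding, to make the two points above concrete. The function name `one_hot` and the `colors` category list are hypothetical, chosen only for illustration.

```python
def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if c == value else 0 for c in categories]


# Hypothetical unordered categories: no color is "greater" than another.
colors = ["red", "green", "blue"]

encoded = [one_hot(c, colors) for c in ["green", "blue", "red"]]
print(encoded)  # [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

Each vector has one entry per category, so its length grows linearly with the number of categories, and any two distinct categories are equally far apart.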

"Bag of words"

  • Pros: Easy to index for search
  • Cons:
    • All words are treated as equally different (no notion of similar words)
    • Representation can be large; its size can be limited to the k most common words
    • Word order is lost
  • Code for BoW
    • for each word in document:
    • for each element in observation:
  • Example: BoW for classification
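    • Euclidean ($L_2$) distance: $d_2(x, y) = \sqrt{\sum_i (x_i - y_i)^2}$
    • $L_1$ distance: $d_1(x, y) = \sum_i |x_i - y_i|$
    • $L_p$ norm: $\|x\|_p = \left(\sum_i |x_i|^p\right)^{1/p}$, so the $L_p$ distance is $\|x - y\|_p$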
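
The bag-of-words steps and the distance-based classification example above can be sketched roughly as follows. This is an illustration under stated assumptions, not the notes' original code: the helper names (`build_vocab`, `bow_vector`, `lp_distance`, `nearest_label`), the whitespace tokenization, and the toy documents are all hypothetical, and nearest-neighbor is used here as one simple way to classify with a distance.

```python
from collections import Counter


def build_vocab(documents, k=None):
    """Vocabulary of all words, or only the k most common when k is given."""
    counts = Counter(w for doc in documents for w in doc.lower().split())
    return [word for word, _ in counts.most_common(k)]


def bow_vector(document, vocab):
    """Count how often each vocabulary word occurs in the document."""
    counts = Counter(document.lower().split())
    return [counts[word] for word in vocab]


def lp_distance(x, y, p=2):
    """L_p distance; p=2 is Euclidean, p=1 is the L_1 distance."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


def nearest_label(query, train_docs, train_labels, vocab, p=2):
    """Label of the training document closest to the query (assumed 1-NN)."""
    q = bow_vector(query, vocab)
    dists = [lp_distance(q, bow_vector(d, vocab), p) for d in train_docs]
    return train_labels[dists.index(min(dists))]


# Toy data, made up for illustration only.
train_docs = ["good great film", "bad awful film"]
train_labels = ["pos", "neg"]
vocab = build_vocab(train_docs)

print(bow_vector("great good good movie", vocab))
print(nearest_label("great good good movie", train_docs, train_labels, vocab))
```

Words outside the vocabulary (here "movie") are dropped, and shuffling the words of a document leaves its vector unchanged, which is the "no order" limitation listed above.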

BoW frequency

  • "Norming"
  • "Normalization"
    • After normalization,
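
These notes do not define the two terms above, so the following is only a sketch of two common ways to rescale a bag-of-words count vector. Which one corresponds to "norming" and which to "normalization" is an assumption here, and the function names `to_frequencies` and `to_unit_length` are hypothetical.

```python
import math


def to_frequencies(counts):
    """Divide by the total count so the entries sum to 1 (relative frequency)."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [c / total for c in counts]


def to_unit_length(counts):
    """Divide by the L_2 norm so the vector has Euclidean length 1."""
    norm = math.sqrt(sum(c * c for c in counts))
    if norm == 0:
        return [0.0 for _ in counts]
    return [c / norm for c in counts]


short_doc = [1, 1, 2]   # made-up word counts
long_doc = [2, 2, 4]    # same proportions, twice as long

print(to_frequencies(short_doc), to_frequencies(long_doc))
print(to_unit_length(short_doc), to_unit_length(long_doc))
```

With either rescaling, two documents with the same word proportions but different lengths map to the same vector, so distances between them no longer depend on document length.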

Direct insertion for numerical data