Data to features
"One-hot encoding"
- Avoids imposing an artificial order on unordered (categorical) data
- Vector length is linear in the number of categories
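A minimal sketch of one-hot encoding in plain Python. The function name `one_hot` and the example categories are hypothetical. Each category gets its own slot, so no category is "greater" than another, and the vector length equals the number of categories.

```python
def one_hot(value, categories):
    """Return a one-hot list for `value`, given an ordered list of `categories`."""
    vec = [0] * len(categories)       # one slot per category (linear in their number)
    vec[categories.index(value)] = 1  # raises ValueError for an unknown category
    return vec

colors = ["red", "green", "blue"]
print(one_hot("green", colors))  # [0, 1, 0]
```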
"Bag of words"
- Pros: Easy to index for search
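- Cons:
  - All words are equally different from each other (e.g., "cat" is as far from "kitten" as from "car")
  - Large vocabulary; representation size can be limited to the k most common words
  - No word order: "dog bites man" and "man bites dog" get the same vector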
- Code for BoW (count word occurrences):
- for each word in document: add 1 to that word's entry in the vector
- for each element in observation:
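The counting loop above can be sketched as follows. Tokenizing by lowercasing and splitting on whitespace is an assumption, as are the hypothetical names `build_vocab` and `bag_of_words`. The optional `k` keeps only the k most common words, and words outside the vocabulary are simply dropped.

```python
from collections import Counter

def build_vocab(documents, k=None):
    """Map words to indices, optionally keeping only the k most common words."""
    counts = Counter(word for doc in documents for word in doc.lower().split())
    words = [w for w, _ in counts.most_common(k)]  # most_common(None) keeps all words
    return {w: i for i, w in enumerate(words)}

def bag_of_words(document, vocab):
    """Count occurrences of each vocabulary word in `document`; word order is discarded."""
    vec = [0] * len(vocab)
    for word in document.lower().split():  # for each word in document
        if word in vocab:                  # out-of-vocabulary words are ignored
            vec[vocab[word]] += 1
    return vec

docs = ["the dog bites the man", "the man bites the dog", "a cat"]
vocab = build_vocab(docs)
print(bag_of_words(docs[0], vocab) == bag_of_words(docs[1], vocab))  # True
```

The final line shows the "no order" drawback: two sentences with opposite meanings map to the same vector.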
- Example: BoW for classification
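- Euclidean ($L_2$) distance: $d_2(x, y) = \sqrt{\sum_i (x_i - y_i)^2}$
- $L_1$ distance: $d_1(x, y) = \sum_i |x_i - y_i|$
- $L_p$ norm: $\|x\|_p = \left(\sum_i |x_i|^p\right)^{1/p}$, giving the distance $d_p(x, y) = \|x - y\|_p$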
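The distances listed above can be written as functions over equal-length count vectors. Using them for classification by returning the label of the closest training document (1-nearest neighbor) is an assumption about what the BoW classification example refers to; the function names and toy vectors below are hypothetical.

```python
import math

def lp_distance(x, y, p):
    """L_p distance: (sum_i |x_i - y_i|^p)^(1/p), i.e. the L_p norm of x - y."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def l1_distance(x, y):
    """L_1 distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def l2_distance(x, y):
    """Euclidean (L_2) distance: square root of the sum of squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def nearest_neighbor_label(query, examples, distance=l2_distance):
    """Return the label of the (vector, label) example closest to `query`."""
    _, best_label = min(examples, key=lambda ex: distance(query, ex[0]))
    return best_label

# Hypothetical BoW vectors over the vocabulary ["goal", "match", "vote", "election"]
train = [([2, 1, 0, 0], "sports"), ([0, 0, 3, 1], "politics")]
print(l1_distance([1, 2], [4, 6]))                  # 7
print(l2_distance([1, 2], [4, 6]))                  # 5.0
print(nearest_neighbor_label([1, 1, 0, 0], train))  # sports
```

Setting `p=1` or `p=2` in `lp_distance` recovers the $L_1$ and Euclidean distances (up to floating-point rounding).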
BoW frequency
- "Norming"
- "Normalization"
- After normalization,
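The notes do not spell out the difference between "norming" and "normalization", so the sketch below only shows two common ways to rescale BoW counts so that document length matters less: dividing by the total word count (relative frequencies that sum to 1) and dividing by the Euclidean norm (unit-length vectors). Which of these each term refers to is an assumption, and the function names are hypothetical.

```python
import math

def to_frequencies(counts):
    """Divide each count by the total number of words, so entries sum to 1."""
    total = sum(counts)
    return [c / total for c in counts] if total else list(counts)

def l2_normalize(counts):
    """Divide by the Euclidean (L_2) norm, so the vector has unit length."""
    norm = math.sqrt(sum(c * c for c in counts))
    return [c / norm for c in counts] if norm else list(counts)

short_doc = [1, 1, 0]   # hypothetical word counts
long_doc = [10, 10, 0]  # same proportions, ten times longer
print(to_frequencies(short_doc) == to_frequencies(long_doc))  # True
print(l2_normalize([3, 4]))                                   # [0.6, 0.8]
```

Both functions return all-zero vectors unchanged to avoid dividing by zero for an empty document.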