An **embedding** is a relatively low-dimensional space into which you can
translate high-dimensional vectors. Embeddings make it easier to do machine
learning on large inputs like sparse vectors representing words. Ideally, an
embedding captures some of the semantics of the input by placing semantically
similar inputs close together in the embedding space. An embedding can be
learned and reused across models.

# Embeddings

## Motivation From Collaborative Filtering

**Input:** 1,000,000 movies that 500,000 users have chosen to watch

**Task:** Recommend movies to users

To solve this problem, we need some way to determine which movies are similar to each other.

## Organizing Movies by Similarity (1d)

## Organizing Movies by Similarity (2d)

## Two-Dimensional Embedding

## d-Dimensional Embeddings

- Assumes user interest in movies can be roughly explained by *d* aspects
- Each movie becomes a *d*-dimensional point, where the value in each dimension represents how strongly the movie exhibits that aspect
- Embeddings can be learned from data
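To make the geometric idea concrete, here is a minimal sketch of movies as points in a hypothetical 2-d embedding space, where "similar" simply means "nearby". The movie titles, coordinate values, and the meaning of each dimension are all illustrative assumptions, not learned values:

```python
import math

# Hypothetical 2-d embedding: dimension 0 ≈ "children vs. adult",
# dimension 1 ≈ "blockbuster vs. arthouse" (made-up values for illustration).
embeddings = {
    "Shrek":           (-1.0, -0.9),
    "The Incredibles": (-0.8, -1.0),
    "Bleu":            ( 0.9,  1.0),
    "Memento":         ( 0.7,  0.8),
}

def nearest(movie, k=1):
    """Return the k movies closest to `movie` in embedding space."""
    point = embeddings[movie]
    others = [(math.dist(point, p), name)
              for name, p in embeddings.items() if name != movie]
    return [name for _, name in sorted(others)[:k]]

print(nearest("Shrek"))  # -> ['The Incredibles'], the nearby family film
```

With real learned embeddings, the same nearest-neighbor lookup is how "movies similar to this one" would be computed.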

## Learning Embeddings in a Deep Network

- No separate training process is needed -- the embedding layer is just a hidden layer with one unit per dimension
- Supervised information (e.g., that users watched the same two movies) tailors the learned embeddings to the desired task
- Intuitively, the hidden units discover how to organize the items in the d-dimensional space in a way that best optimizes the final objective
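The claim that the embedding layer "is just a hidden layer" can be checked directly: multiplying a one-hot input by the layer's weight matrix selects one row of it, so frameworks implement the layer as a plain row lookup. A small NumPy sketch (with random stand-ins for learned weights):

```python
import numpy as np

rng = np.random.default_rng(0)
num_movies, d = 8, 3  # toy sizes

# The embedding layer's weights: one d-dimensional row per movie.
# In a real network these values are learned by backpropagation.
W = rng.normal(size=(num_movies, d))

movie_id = 5
one_hot = np.zeros(num_movies)
one_hot[movie_id] = 1.0

# A matrix multiply with a one-hot input selects a single row of W...
via_matmul = one_hot @ W
# ...so the embedding of movie 5 is simply row 5 of the weight matrix.
via_lookup = W[movie_id]

assert np.allclose(via_matmul, via_lookup)
```

This equivalence is why an embedding lookup costs O(d) rather than a full O(vocabulary × d) matrix multiply.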

## Input Representation

- Each example (a row in this matrix) is a sparse vector of features (movies) that the user has watched
- A dense representation of this example, such as (0, 1, 0, 1, 0, 0, 0, 1), is not efficient in terms of space or time

## Input Representation

- Build a dictionary mapping each feature to an integer from 0 to (# movies − 1)
- Efficiently represent the sparse vector by storing only the ids of the movies the user watched
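The two steps above can be sketched in a few lines. The movie titles here are hypothetical placeholders; the point is the dictionary-plus-id-list representation:

```python
# A small illustrative vocabulary (real problem: ~1,000,000 movies).
movies = ["Shrek", "Memento", "Bleu", "The Incredibles", "Harry Potter"]

# Step 1: dictionary mapping each movie to an integer id 0 .. len(movies)-1.
movie_to_id = {title: i for i, title in enumerate(movies)}

watched = ["Shrek", "The Incredibles"]

# Step 2: sparse representation -- just the ids of the watched movies,
# instead of a dense 0/1 vector over the whole vocabulary.
sparse = sorted(movie_to_id[t] for t in watched)
print(sparse)  # -> [0, 3]
```

Storage is now proportional to the number of movies actually watched, not to the size of the full vocabulary.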

## An Embedding Layer in a Deep Network

Regression problem to predict home sales prices:
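As a sketch of the architecture this slide describes: a sparse categorical feature (say, a postal code) goes through an embedding lookup, is concatenated with dense features, and feeds a single linear output unit that predicts the price. The feature names, sizes, and random weights below are illustrative assumptions, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(1)
num_zip_codes, d = 100, 4  # hypothetical vocabulary size and embedding dims

# Embedding table for the sparse feature; random stand-in for learned weights.
zip_embeddings = rng.normal(size=(num_zip_codes, d))

w = rng.normal(size=(d + 2,))  # weights for [zip embedding | beds, baths]
b = 0.0

def predict_price(zip_id, beds, baths):
    """Embedding lookup + dense features -> one linear regression unit."""
    x = np.concatenate([zip_embeddings[zip_id], [beds, baths]])
    return x @ w + b

print(float(predict_price(zip_id=42, beds=3, baths=2)))
```

In training, the embedding table and the output weights would be learned jointly, so the postal-code embedding organizes itself around whatever distinctions help predict price.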

## An Embedding Layer in a Deep Network

Multiclass Classification to predict a handwritten digit:
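For the digit-classification variant, the network's final layer is a softmax over the 10 digit classes. A minimal NumPy sketch of that output stage, with a random stand-in for the hidden activation and output weights:

```python
import numpy as np

rng = np.random.default_rng(2)
d, num_classes = 16, 10  # hypothetical hidden size; digits 0-9

hidden = rng.normal(size=(d,))             # stand-in for a hidden activation
W_out = rng.normal(size=(d, num_classes))  # output-layer weights

logits = hidden @ W_out
probs = np.exp(logits - logits.max())      # subtract max for stability
probs /= probs.sum()                       # softmax over the 10 classes

print(int(probs.argmax()))  # index of the predicted digit
```

The same embedding idea applies on the output side too: each column of `W_out` acts as a learned 16-d representation of one digit class.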

## An Embedding Layer in a Deep Network

Collaborative Filtering to predict movies to recommend:
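Once users and movies live in the same d-dimensional space, recommendation reduces to scoring every movie against the user's embedding and taking the top results. A sketch with toy sizes and random stand-in tables (in training, both tables would be learned jointly from co-watch data):

```python
import numpy as np

rng = np.random.default_rng(3)
num_users, num_movies, d = 500, 1000, 8  # toy sizes, not the full problem

user_emb = rng.normal(size=(num_users, d))
movie_emb = rng.normal(size=(num_movies, d))

def recommend(user_id, k=3):
    """Score every movie by dot product with the user's embedding."""
    scores = movie_emb @ user_emb[user_id]
    return np.argsort(-scores)[:k]  # ids of the k highest-scoring movies

print(recommend(user_id=7))
```

In production, scoring all movies is replaced by an approximate nearest-neighbor search, but the dot-product scoring rule is the same.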

## Correspondence to Geometric View

**Deep Network**

- Each hidden unit corresponds to a dimension (latent feature)
- The edge weights between a movie and the hidden layer are its coordinate values

**Geometric view of a single movie embedding**

## Selecting How Many Embeddings Dims

- Higher-dimensional embeddings can more accurately represent the relationships between input values
- But more dimensions increase the chance of overfitting and slow down training
- Empirical rule of thumb (a good starting point, but it should be tuned on validation data): $$ \text{dimensions} \approx \sqrt[4]{\text{possible values}} $$
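Applying the rule of thumb to this document's movie problem:

```python
# Rule-of-thumb starting point: dimensions ≈ fourth root of the number
# of possible values (to be tuned on validation data).
def suggested_dims(possible_values: int) -> int:
    return round(possible_values ** 0.25)

print(suggested_dims(1_000_000))  # 1,000,000 movies -> 32 dimensions
```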

## Embeddings as a Tool

- Embeddings map items (e.g., movies, text, ...) to low-dimensional real vectors such that similar items end up close to each other
- Embeddings can also be applied to dense data (e.g., audio) to create a meaningful similarity metric
- Jointly embedding diverse data types (e.g., text, images, audio, ...) defines a similarity between them