Table of Contents

Page 1

Today’s Topic: Clustering 1

Restaurant recommendations

Input

Algorithm 0

Another look at the input - a matrix

Now that we have a matrix

Similarity between two people

Algorithm 1.1

Algorithm 1.k

Slightly more sophisticated attempt

How do you cluster?

Why cluster documents?

Improving search recall

Speeding up vector space retrieval

Page 16

Clustering for UI (1) Corpus analysis/navigation

Clustering for UI (2) Navigating search results

Results list clustering example

Search Engine Example: Vivisimo

Representation for Clustering

What makes docs “related”?

Recall doc as vector

Intuition

Cosine similarity

How Many Clusters?

Clustering Algorithms

Dendrogram: Example

Dendrogram: Document Example

Agglomerative clustering

“Closest pair” of clusters

Definition of Cluster Similarity

Key notion: cluster representative

Centroid

Page 35

Outliers in centroid computation

Medoid As Cluster Representative

Example: n=6, k=3, closest pair of centroids

Issues

Exercise

“Using approximations”

Different algorithm: k-means

Basic iteration

Iteration example

Page 45

k-Means Clustering: Initialization

Termination conditions

Convergence

Page 49

Convergence of K-Means

k not specified in advance

Page 52

Penalize lots of clusters

Back to agglomerative clustering

Page 55

Clustering vs Classification

Page 57

Page 58

Decision boundaries

Deciding what a new doc is about

Setup for Classification

Supervised vs. unsupervised learning

Which is better?

Summary

Page 65