Table of Contents
Page 1
Today’s Topic: Clustering 1
Restaurant recommendations
Input
Algorithm 0
Another look at the input - a matrix
Now that we have a matrix
Similarity between two people
Algorithm 1.1
Algorithm 1.k
Slightly more sophisticated attempt
How do you cluster?
Why cluster documents?
Improving search recall
Speeding up vector space retrieval
Page 16
Clustering for UI (1) Corpus analysis/navigation
Clustering for UI (2) Navigating search results
Results list clustering example
Search Engine Example: Vivisimo
Representation for Clustering
What makes docs “related”?
Recall doc as vector
Intuition
Cosine similarity
How Many Clusters?
Clustering Algorithms
Dendrogram: Example
Dendrogram: Document Example
Agglomerative clustering
“Closest pair” of clusters
Definition of Cluster Similarity
Key notion: cluster representative
Centroid
Page 35
Outliers in centroid computation
Medoid As Cluster Representative
Example: n=6, k=3, closest pair of centroids
Issues
Exercise
“Using approximations”
Different algorithm: k-means
Basic iteration
Iteration example
Page 45
k-Means Clustering: Initialization
Termination conditions
Convergence
Page 49
Convergence of K-Means
k not specified in advance
Page 52
Penalize lots of clusters
Back to agglomerative clustering
Page 55
Clustering vs Classification
Page 57
Page 58
Decision boundaries
Deciding what a new doc is about
Setup for Classification
Supervised vs. unsupervised learning
Which is better?
Summary
Page 65