| | date | topic | chapter | slides for ... | resources |
| IIR 01 | 4/21 | Boolean retrieval | pdf, html | students, instructors | |
| | | | | | information retrieval links |
| | | | | | search Shakespeare |
| IIR 02 | 4/27 | Term vocabulary & postings lists | pdf, html | students, instructors | |
| | | | | | Porter stemmer |
| | | | | | credit card number searches disabled |
| IIR 03 | 4/28 | Dictionaries & tolerant retrieval | pdf, html | students, instructors | |
| | | | | | trie vs hash vs ternary tree |
| | | | | | soundex demo |
| | | | | | edit distance example |
| | | | | | edit distance demo |
| | | | | | P. Norvig's spell corrector |
| | | | | | wild card search on Google |
| | | | | | spelling correction gone wrong |
| IIR 04 | 5/5 | Index construction | pdf, html | students, instructors | |
| | | | | | MapReduce paper |
| | | | | | SPIMI paper |
| | | | | | Google data center tour |
| | | Assignment 1 | | | |
| IIR 05 | 5/11 | Index compression | pdf, html | students, instructors | |
| | | | | | variable byte codes |
| | | | | | word-aligned binary codes |
| | | | | | pos/freq compression |
| IIR 06 | 5/12 | Scores, weights, vector spaces | pdf, html | students, instructors | |
| | | | | | vector space for dummies |
| | | | | | exploring the similarity space |
| | | | | | Okapi BM25 |
| | | Solution assignment 1 | | | |
| IIR 07 | 5/19 | Computing scores | pdf, html | students, instructors | |
| | | | | | how Google tweaks ranking |
| | | | | | interview with Google's Udi Manber |
| | | | | | Amit Singhal on Google ranking |
| | | | | | SEO perspective: ranking factors |
| | | | | | Yahoo BOSS: opening up search |
| | | | | | Compare Google/Yahoo rankings |
| | | | | | eye tracking at Google |
| | | Assignment 2 | | | |
| IIR 08 | 5/25 | Evaluation & result summaries | pdf, html | students, instructors | |
| | | | | | TREC at NIST |
| | | | | | v. Rijsbergen's definition of F |
| | | | | | A/B testing |
| | | | | | too much A/B testing? |
| | | | | | early paper on dynamic summaries |
| | | | | | search quality evaluation at Google |
| IIR 09 | 5/26 | Rel. feedback, query expansion | pdf, html | students, instructors | |
| | | | | | original relevance feedback paper |
| | | | | | relevance feedback at Excite |
| | | | | | automatic word sense discrimination |
| | | Solution assignment 2 | | | |
| | | Assignment 3 | | | |
| IIR 13 | 6/9 | Text classification, Naive Bayes | pdf, html | students, instructors | |
| | | | | | Calais 2.0: Semantic tagging |
| | | | | | Weka (includes Naive Bayes) |
| | | | | | Reuters-21578 |
| | | Solution assignment 3 | | | |
| IIR 14 | 6/15 | Vector space classification | pdf, html | students, instructors | |
| | | | | | perceptron example |
| | | | | | TC overview by Sebastiani |
| | | | | | FSNLP (decision trees, perceptrons) |
| | | | | | The elements of statistical learning |
| | | Assignment 4 | | | |
| IIR 16 | 6/16 | Flat clustering | pdf, html | students, instructors | |
| | | | | | K-means example |
| | | | | | v. Rijsbergen: the Cluster Hypothesis |
| | | | | | SR clustering: Clusty |
| | | | | | SR clustering: Carrot2 |
| | | | | | SR clustering: Bing |
| | | | | | # clusterings: Stirling number |
| IIR 17 | 6/22 | Hierarchical clustering | pdf, html | students, instructors | |
| | | | | | GoogleNews precursor: Newsblaster |
| | | | | | Bisecting K-means |
| | | | | | PDDP algorithm |
| IIR 21 | 6/23 | Link analysis | pdf, html | students, instructors | |
| | | | | | more on PageRank math |
| | | | | | Jon Kleinberg (inventor of HITS) |
| | | | | | PageRank according to Google |
| | | | | | Google bomb (January 2008) |
| | | | | | defused Google bomb (June 2009) |
| | | Solution assignment 4 | | | |
| IIR 12 | 6/30 | Language models for IR | pdf, html | students, instructors | |
| | | | | | Ponte & Croft paper on LMs in IR |
| | | | | | Lemur Toolkit |
| IIR 19 | 7/7 | Web search basics | pdf, html | students, instructors | |
| | | | | | Hal Varian explains Google auctions |
| | | | | | how ads are priced |
| | | | | | Geico search ca. 2004 |
| | | | | | Louis Vuitton counterfeiting case |
| | | | | | geo-targeted ad |
| | | | | | size of the web in 2007 |
| | | | | | size of the web in 2008 |
| | | | | | ad monitoring at Google |
| | | | | | fighting webspam |
| | | | | | adversarial IR |
| | | Assignment 5 | | | |
| IIR 20 | 7/14 | Crawling | pdf, html | students, instructors | |
| | | | | | Mercator web crawler |
| | | | | | robots.txt standard |
| | | | | | Google data centers |
| | | Solution assignment 5 | | | |
| IIR 18 | 7/21 | Latent semantic indexing | pdf, html | students, instructors | |
| | | | | | Original LSI paper |
| | | | | | Probabilistic LSI |
| | | | | | Dimensions of meaning: LSI for words |