Introduction to Information Retrieval

Summer semester 2009, Hinrich Schütze, Mo 15:45-17:15 (0.108), Tu 15:45-17:15 (V38.03)

Textbook

IIR: Introduction to Information Retrieval. Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze. Cambridge University Press, 2008. Web publication at http://informationretrieval.org

Assignments

assignment 1 solutions
assignment 2 solutions
assignment 3 solutions
assignment 4 solutions
assignment 5 solutions

Slides

beamer style files

IIR 01 Boolean retrieval: slide presentation, slides for printing, slide source

IIR 02 The term vocabulary & postings lists: slide presentation, slides for printing, slide source

IIR 03 Dictionaries and tolerant retrieval: slide presentation, slides for printing, slide source

IIR 04 Index construction: slide presentation, slides for printing, slide source

IIR 05 Index compression: slide presentation, slides for printing, slide source

IIR 06 Scoring, weighting, vector spaces: slide presentation, slides for printing, slide source

IIR 07 Computing scores in a complete search system: slide presentation, slides for printing, slide source

IIR 08 Evaluation & result summaries: slide presentation, slides for printing, slide source

IIR 09 Relevance feedback & query expansion: slide presentation, slides for printing, slide source

IIR 12 Language models: slide presentation, slides for printing, slide source

IIR 13 Text classification & Naive Bayes: slide presentation, slides for printing, slide source

IIR 14 Vector space classification: slide presentation, slides for printing, slide source

IIR 16 Flat clustering: slide presentation, slides for printing, slide source

IIR 17 Hierarchical clustering: slide presentation, slides for printing, slide source

IIR 18 Latent semantic indexing: slide presentation, slides for printing, slide source

IIR 19 Web search basics: slide presentation, slides for printing, slide source

IIR 20 Crawling: slide presentation, slides for printing, slide source

IIR 21 Link analysis: slide presentation, slides for printing, slide source

Schedule

 
chapter   date   topic   slides for ...   resources
IIR 01   4/21   Boolean retrieval   pdf html students instructors
information retrieval links
search Shakespeare
 
IIR 02   4/27   Term vocabulary & postings lists   pdf html students instructors
Porter stemmer
credit card number searches disabled
 
IIR 03   4/28   Dictionaries & tolerant retrieval   pdf html students instructors
trie vs hash vs ternary tree
soundex demo
edit distance example
edit distance demo
P. Norvig's spell corrector
wildcard search on Google
spelling correction gone wrong
 
IIR 04   5/5   Index construction   pdf html students instructors
MapReduce paper
SPIMI paper
Google data center tour
Assignment 1
 
IIR 05   5/11   Index compression   pdf html students instructors
variable byte codes
word-aligned binary codes
pos/freq compression
 
IIR 06   5/12   Scores, weights, vector spaces   pdf html students instructors
vector space for dummies
exploring the similarity space
Okapi BM25
Solution assignment 1
 
IIR 07   5/19   Computing scores   pdf html students instructors
how Google tweaks ranking
interview with Google's Udi Manber
Amit Singhal on Google ranking
SEO perspective: ranking factors
Yahoo BOSS: opening up search
Compare Google/Yahoo rankings
eye tracking at Google
Assignment 2
 
IIR 08   5/25   Evaluation & result summaries   pdf html students instructors
TREC at NIST
v. Rijsbergen's definition of F
A/B testing
too much A/B testing?
early paper on dynamic summaries
search quality evaluation at Google
 
IIR 09   5/26   Rel. feedback, query expansion   pdf html students instructors
original relevance feedback paper
relevance feedback at Excite
automatic word sense discrimination
Solution assignment 2
Assignment 3
 
IIR 13   6/9   Text classification, Naive Bayes   pdf html students instructors
Calais 2.0: Semantic tagging
Weka (includes Naive Bayes)
Reuters-21578
Solution assignment 3
 
IIR 14   6/15   Vector space classification   pdf html students instructors
perceptron example
TC overview by Sebastiani
FSNLP (decision trees, perceptrons)
The Elements of Statistical Learning
Assignment 4
 
IIR 16   6/16   Flat clustering   pdf html students instructors
K-means example
v. Rijsbergen: the Cluster Hypothesis
SR clustering: Clusty
SR clustering: Carrot2
SR clustering: Bing
# clusterings: Stirling number
 
IIR 17   6/22   Hierarchical clustering   pdf html students instructors
Google News precursor: Newsblaster
Bisecting K-means
PDDP algorithm
 
IIR 21   6/23   Link analysis   pdf html students instructors
more on PageRank math
Jon Kleinberg (inventor of HITS)
PageRank according to Google
Google bomb (January 2008)
defused Google bomb (June 2009)
Solution assignment 4
 
IIR 12   6/30   Language models for IR   pdf html students instructors
Ponte & Croft paper on LMs in IR
Lemur Toolkit
 
IIR 19   7/7   Web search basics   pdf html students instructors
Hal Varian explains Google auctions
how ads are priced
Geico search ca. 2004
Louis Vuitton counterfeiting case
geo-targeted ad
size of the web in 2007
size of the web in 2008
ad monitoring at Google
fighting webspam
adversarial IR
Assignment 5
 
IIR 20   7/14   Crawling   pdf html students instructors
Mercator web crawler
robots.txt standard
Google data centers
Solution assignment 5
 
IIR 18   7/21   Latent semantic indexing   pdf html students instructors
Original LSI paper
Probabilistic LSI
Dimensions of meaning: LSI for words

Administrative matters