Introduction to Information Retrieval

Summer semester 2011
Hinrich Schütze, Thomas Müller
Mo 15:45-17:15, 0.108 (check schedule below)
Tu 15:45-17:15, V38.03 (weekly, except for 6/21)

Textbook

IIR: Introduction to Information Retrieval. Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze. Cambridge University Press, 2008. Web publication at http://informationretrieval.org

2011 Assignments

assignment 1   solution 1
assignment 2   solution 2
assignment 3   solution 3
assignment 4   solution 4
assignment 5   solution 5
assignment 6   solution 6

2010 Assignments

assignment 1 solutions
assignment 2 solutions
assignment 3 solutions
assignment 4 solutions
assignment 5 solutions
assignment 6 solutions

2009 Assignments

assignment 1 solutions
assignment 2 solutions
assignment 3 solutions
assignment 4 solutions
assignment 5 solutions

Slides

You will need these style files to compile the LaTeX sources.

IIR 01 Boolean retrieval: slide presentation, slides for printing, slide source

IIR 02 The term vocabulary & postings lists: slide presentation, slides for printing, slide source

IIR 03 Dictionaries and tolerant retrieval: slide presentation, slides for printing, slide source

IIR 04 Index construction: slide presentation, slides for printing, slide source

IIR 05 Index compression: slide presentation, slides for printing, slide source

IIR 06 Scoring, weighting, vector spaces: slide presentation, slides for printing, slide source

IIR 07 Computing scores in a complete search system: slide presentation, slides for printing, slide source

IIR 08 Evaluation & result summaries: slide presentation, slides for printing, slide source

IIR 09 Relevance feedback & query expansion: slide presentation, slides for printing, slide source

IIR 10 XML retrieval: slide presentation, slides for printing, slide source

IIR 11 Probabilistic IR: slide presentation, slides for printing, slide source

IIR 12 Language models: slide presentation, slides for printing, slide source

IIR 13 Text classification & Naive Bayes: slide presentation, slides for printing, slide source

IIR 14 Vector space classification: slide presentation, slides for printing, slide source

IIR 15-1 Support vector machines: slide presentation, slides for printing, slide source

IIR 15-2 Learning to rank: slide presentation, slides for printing, slide source

IIR 16 Flat clustering: slide presentation, slides for printing, slide source

IIR 17 Hierarchical clustering: slide presentation, slides for printing, slide source

IIR 18 Latent semantic indexing: slide presentation, slides for printing, slide source

IIR 19 Web search: slide presentation, slides for printing, slide source

IIR 20 Crawling: slide presentation, slides for printing, slide source

IIR 21 Link analysis: slide presentation, slides for printing, slide source

2011 Schedule

Administrativa

 
             date   topic       resources
IIR 00 4/26 Course overview Administrativa
 
             date   topic chapter      slides for ... resources
IIR 01 5/3 Boolean retrieval pdf html students instructors
information retrieval links
search Shakespeare
 
IIR 02 5/3 Term vocabulary & postings lists pdf html instructors
Porter stemmer
credit card number searches disabled
 
IIR 06 5/9 Scores, weights, vector spaces pdf html students instructors
vector space for dummies
exploring the similarity space
Okapi BM25
 
IIR 07 5/10 Computing scores pdf html students instructors
how Google tweaks ranking
interview with Google's Udi Manber
Amit Singhal on Google ranking
SEO perspective: ranking factors
Yahoo BOSS: opening up search
compare Google/Yahoo rankings
eye tracking at Google
 
IIR 09 5/17 Rel. feedback, query expansion pdf html students instructors
original relevance feedback paper
relevance feedback at Excite
related searches fail
WordSpace
automatic word sense discrimination
 
IIR 11 5/23 Probabilistic information retrieval pdf html students instructors
solution to in-class problem
 
5/24 Assignments
 
IIR 13 5/31 Text classification, Naive Bayes pdf html students instructors
OpenCalais: Semantic tagging
Weka (includes Naive Bayes)
Reuters-21578
vulgarity text classifier fail
 
IIR 12 6/6 Language models for IR pdf html students instructors
Ponte & Croft paper on LMs in IR
Zhai & Lafferty
Lemur Toolkit
 
IIR 15-1 6/7 Support vector machines pdf html students instructors
 
6/27 Assignments
 
IIR 15-2 6/28 Learning to rank (LTR) pdf html students instructors
Learning to rank references
Microsoft LTR datasets
 
IIR 16 7/4 Flat clustering pdf html students instructors
van Rijsbergen: Cluster Hypothesis
search result clustering: Yippy
search result clustering: Carrot2
search result clustering: Bing
number of clusterings: Stirling number
 
IIR 17 7/5 Hierarchical clustering pdf html students instructors
GoogleNews precursor: Newsblaster
Bisecting K-means
PDDP algorithm
 
IIR 18 7/12 Latent semantic indexing pdf html students instructors
Original LSI paper
Probabilistic LSI
Dimensions of meaning: LSI for words
 
7/18 Assignments
 
IIR 21 7/25 Link analysis pdf html students instructors
more on PageRank math
Jon Kleinberg (inventor of HITS)
PageRank according to Google
Google bomb (January 2008)
defused Google bomb (June 2009)
 
IIR 19 7/26 Web information retrieval pdf html students instructors
Hal Varian explains Google auctions
how ads are priced
Geico search ca. 2004
Louis Vuitton counterfeiting case
geo-targeted ad
size of the web in 2007
size of the web in 2008
ad monitoring at Google
fighting webspam
Assignment 6
 
8/9 Exam (M18.11, Azenbergstr. 18)

2010 Schedule

 
             date   topic chapter      slides for ... resources
IIR 03 5/3 Dictionaries & tolerant retrieval pdf html students instructors
trie vs hash vs ternary tree
soundex demo
edit distance demo
P. Norvig's spell corrector
wild card search on Google
spelling correction gone wrong
freq(misspelling)>freq(correct)
 
IIR 04 5/4 Index construction pdf html students instructors
MapReduce paper
SPIMI paper
Google data center tour
 
IIR 05 5/10 Index compression pdf html students instructors
variable byte codes
word-aligned binary codes
pos/freq compression
 
7/5 Cross-Language IR students instructors LaTeX source
 
IIR 10 7/12 XML retrieval pdf html students instructors
 
IIR 14 6/28 Vector space classification pdf html students instructors
perceptron example
TC overview by Sebastiani
FSNLP (decision trees, perceptrons)
The elements of statistical learning

2009 Schedule

 
             date   topic chapter      slides for ... resources
IIR 08 5/25 Evaluation & result summaries pdf html students instructors
TREC at NIST
van Rijsbergen's definition of F
A/B testing
too much A/B testing?
early paper on dynamic summaries
search quality evaluation at Google
 
IIR 20 7/14 Crawling pdf html students instructors
Mercator web crawler
robots.txt standard
Google data centers