Introduction to Information Retrieval

Summer semester 2008, Hinrich Schütze, Mon 15:45-17:15 & Tue 15:45-17:15, V38.03

Textbook

IIR: Introduction to Information Retrieval. Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze. Cambridge University Press, 2008. Web publication at http://informationretrieval.org

Assignments

assignment 1 solutions
assignment 2 solutions
assignment 3 solutions
assignment 4 solutions
assignment 5 solutions

Slides

beamer style files

IIR 01 Boolean retrieval: slide presentation, slides for printing, slide source

IIR 02 The term vocabulary & postings lists: slide presentation, slides for printing, slide source

IIR 03 Dictionaries and tolerant retrieval: slide presentation, slides for printing, slide source

IIR 04 Index construction: slide presentation, slides for printing, slide source

IIR 05 Index compression: slide presentation, slides for printing, slide source

IIR 06 Scoring, weighting, vector spaces: slide presentation, slides for printing, slide source

IIR 07 Computing scores in a complete search system: slide presentation, slides for printing, slide source

IIR 08 Evaluation & result summaries: slide presentation, slides for printing, slide source

IIR 09 Relevance feedback, query expansion: slide presentation, slides for printing, slide source

IIR 13 Text classification, Naive Bayes: slide presentation, slides for printing, slide source

IIR 14 Vector space classification: slide presentation, slides for printing, slide source

IIR 16 Flat clustering: slide presentation, slides for printing, slide source

IIR 17 Hierarchical clustering: slide presentation, slides for printing, slide source

IIR 19 Web search basics: slide presentation, slides for printing, slide source

IIR 19 II Web search basics II: slide presentation, slides for printing

IIR 20 Crawling: slide presentation, slides for printing, slide source

IIR 21 Link analysis: slide presentation, slides for printing, slide source

Schedule

chapter date topic chapter slides for ... resources
IIR 01 4/22 Boolean retrieval pdf html students instructors
information retrieval links
IIR 02 4/28 The term vocabulary & postings lists pdf html students instructors
Porter stemmer
IIR 03 4/29 Dictionaries and tolerant retrieval pdf html students instructors
soundex demo
edit distance example
edit distance demo
P. Norvig's spell corrector
IIR 04 5/5 Index construction pdf html students instructors
MapReduce paper
SPIMI paper
IIR 05 5/6 Index compression pdf html students instructors
variable byte codes
word-aligned binary codes
pos/freq compression
IIR 06 5/20 Scoring, weighting, vector spaces pdf html students instructors
vector space for dummies
exploring the similarity space
Okapi BM25
IIR 07 5/27 Computing scores pdf html students instructors
how Google tweaks ranking
interview with Google's Udi Manber
Yahoo: opening up the search engine
Compare Google/Yahoo rankings
IIR 08 6/2 Evaluation & result summaries pdf html students instructors
TREC at NIST
v. Rijsbergen's definition of F
A/B testing
early paper on dynamic summaries
search quality evaluation at Google
IIR 09 6/3 Relevance feedback, query expansion pdf html students instructors
original relevance feedback paper
relevance feedback at Excite
automatic word sense discrimination
6/9 Assignment 3, rest of IIR 09
IIR 13 6/10 Text classification, Naive Bayes pdf html students instructors
Calais 2.0: Semantic tagging
Weka (includes Naive Bayes)
Reuters-21578
IIR 14 6/17 Vector space classification pdf html students instructors
perceptron example
TC overview by Sebastiani
FSNLP (decision trees, perceptrons)
The elements of statistical learning
IIR 16 6/24 Flat clustering pdf html students instructors
K-means example
v. Rijsbergen: the Cluster Hypothesis
Clusty (new name of Vivisimo)
IIR 17 7/1 Hierarchical clustering pdf html students instructors
GoogleNews precursor: Newsblaster
Bisecting K-means
PDDP algorithm
IIR 21 Link analysis pdf html students instructors
more on PageRank math
Jon Kleinberg (inventor of HITS)
PageRank according to Google
IIR 19 7/7 Web search basics pdf html students instructors
Duplicate elimination/size estimation students instructors
how ads are priced
Geico search ca. 2004
Louis Vuitton counterfeiting case
geo-targeted ad
size of the web in 2007
size of the web in 2008
ad monitoring at Google
fighting webspam
adversarial IR
IIR 20 7/8 Crawling pdf html students instructors
Mercator web crawler
robots.txt standard
Google data centers