Foundational Course
Departament de Traducció i Filologia
Universitat Pompeu Fabra
April 16-20, 2007

 

Introduction to Corpus Resources, Annotation and Access

 


References

Corpora and Annotation


Tokenisation


Type/Token Frequency Distributions


Part-of-Speech Tagging


Morphological Annotation


Word Distributions


Syntactic Annotation


Semantic Annotation


Word Senses:
WordNet:
Prague Dependency Treebank:
FrameNet:

* general and English:
* German:
* Spanish:
* Japanese:
* FrameNet online:
PropBank:
OntoBank / OntoNotes:
Word Sense Disambiguation and Role Labeling:

Evaluation


More Levels of Corpus Annotation


The Prague Treebank:
Rhetorical Structure Theory and the RST Discourse Treebank:
The Penn Discourse TreeBank:
Anaphora and Coreference:
Kiel Corpus of Read Speech:
MATE:
NITE:

Web as Corpus