Probabilistic Linguistic Models
Hinrich Schütze
Date: 2005-02-22
Much has been written about the need to combine symbolic and
statistical methods in linguistics. We wholeheartedly agree
with this view. However, a change of perspective is
suggested in this proposal. Instead of viewing a theory with
both symbolic and statistical elements as a combination, it
is of interest to point out that every probabilistic model
has a symbolic core. This symbolic core can be simple
as in the case of Markov models where it corresponds to a
directed graph with n × n edges, where n is the number of
nodes. It is more complex for a probabilistic context-free
grammar (PCFG) - in that case it is a (symbolic) context-free
grammar. But all probabilistic models have such a symbolic
core. The alternative view then is that we should
consider linguistic models that ``probabilize'' some of their symbolic
components. This would mean that linguistic models come in two
guises: traditional symbolic models and ``enhanced'' models
that probabilize some of their components. This proposal
argues that this is a promising area of linguistic research
and offers a rich set of topics to prospective graduate
students.
For simplicity, I will refer to models that are partially
probabilized as ``probabilistic models''. It would be
clearer (but too cumbersome) to call them ``symbolic models
which probabilize at least one part of their symbolic core.''
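To make the notion of a symbolic core concrete, the following minimal Python sketch starts from a toy context-free grammar and ``probabilizes'' it by attaching a probability to each rule. The grammar, the rule probabilities, and the names CFG, PCFG and tree_probability are illustrative assumptions and are not taken from any of the models cited in this proposal.

```python
# Toy illustration: a symbolic core (a CFG) and its "probabilized" version (a PCFG).
# All rules and probabilities are invented for illustration.

# Symbolic core: which right-hand sides each nonterminal may expand to.
CFG = {
    "S": [("NP", "VP")],
    "NP": [("she",), ("the", "N")],
    "N": [("game",)],
    "VP": [("V", "NP"), ("V",)],
    "V": [("explained",), ("slept",)],
}

# Probabilized version: the same rules, plus one probability per rule.
# The probabilities of all rules with the same left-hand side sum to 1.
PCFG = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("she",)): 0.6,
    ("NP", ("the", "N")): 0.4,
    ("N", ("game",)): 1.0,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("V",)): 0.3,
    ("V", ("explained",)): 0.5,
    ("V", ("slept",)): 0.5,
}


def tree_probability(tree):
    """Probability of a parse tree: the product of the probabilities of its rules.

    A tree is a tuple (label, child, child, ...); a leaf is a plain string.
    """
    if isinstance(tree, str):
        return 1.0
    label, *children = tree
    # The right-hand side is the sequence of child labels (or words).
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    prob = PCFG[(label, rhs)]
    for child in children:
        prob *= tree_probability(child)
    return prob


tree = ("S",
        ("NP", "she"),
        ("VP", ("V", "explained"), ("NP", "the", ("N", "game"))))
p = tree_probability(tree)
print(p)
```

Removing the numbers from PCFG leaves exactly the rules in CFG, which is the sense in which the probabilistic model has a symbolic core.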
Historically, the only linguistic subdiscipline that took
probabilistic models seriously was quantitative
sociolinguistics (Labov 1994). Explanations were probabilistic
(for example, the deletion rates of certain consonants in
the pronunciation of English words) and were empirically
validated against data sets compiled in extensive field
work. It is true that the models used were mathematically
simplistic and limited to mostly phonetic
variation. But there is much linguistic insight in this
early work and kudos must be paid to a group of researchers
that stuck to its methodology even though it was frowned upon
by the mainstream.
Language variation was resistant to the
anti-probabilistic bias of mainstream linguistics because
variation is hard to explain with a purely symbolic
theory. To some extent, variation shares this characteristic
with three other subdisciplines: language acquisition,
historical change and typology.
In these subdisciplines
language often moves gradually from one state to another - or
related states coexist. Symbolic theories posit two
distinct states to explain these phenomena.
An example is
that at some
point English (or the English of individual speakers)
switched from a state of analyzing ``he is going to drink a
beer'' as a statement about a walking action to a state
where it was an
expression of future tense. Or children at some point switch
from a state in which ``I explained him the game'' is
grammatically correct
to a state where it is not.
Still another example is
the constraint that
topics precede foci. It is
grammatically mandatory in some languages, but coexists as a weaker regularity
with other "laws"
in languages like English (where it surfaces, for example, in the locative construction).
It is descriptively adequate to posit
two distinct states and a mechanism that switches between
them. But there is a potential here for more explanatory
linguistic models - probabilistic models that unify the two
states and explain a state transition as a change of
parameters within the same model. Four examples of linguistic explanations along these lines, selected
more or less arbitrarily, follow.
Bybee (2001) explains several observations about
liaison in French using exemplar theory. In exemplar theory,
all linguistic material is stored
permanently in the form of exemplars
after it has been either produced or
perceived. Production and perception then use exemplars to
process language. For example, a word that has never been
produced before (e.g., ``flume'') will be produced in analogy to the exemplars
of previously produced similar words (``gloom'', ``room'',
``floor''). Exemplar theory can model many phenomena
in French liaison. Liaisons between frequent words have
persisted longer than liaisons between infrequent words. If
one of two variants of a word becomes extinct, then it is
the one that is less frequent (word initial ``z'' must be pronounced for
``(z)yeux'', word final ``t'' must be omitted for
``apparaissant'').
Exemplar theory formalizes the tension between
(1) generalizations that have a tendency to spread (the number
of silent word-final consonants increases over time) and
(2) cases that resist the generalization either permanently (the
``z'' in ``(z)yeux'') or temporarily (frequent constructions
(e.g., with object pronouns) maintain liaison).
This resistance can be explained by the basic mechanism of
exemplar theory: a dense cloud of similar exemplars protects
against the encroachment of spreading change. If there is no
such dense cloud, the change is predicted to take effect.
Haspelmath (2004) explains the universal that
extroverted verbs express reflexives in a more
complex way than introverted verbs by the principle of
economic motivation. If one applies shaving (an introverted verb) to oneself, then
this is expected and can be expressed with an unstressed particle
if reflexivity is expressed at all. On the other hand, hating (an extroverted verb) is
usually not ``self-directed'', so reflexivity needs to be
expressed clearly and with more phonological material.
Haspelmath suggests that this type of expectation is a fact about the world
and proposes a frequentist account (a particular type of probabilistic model).
In the tradition of Rumelhart and McClelland (1986),
Schütze (1997) shows that probabilistic linguistic models
can explain the acquisition of complex English
subcategorization frames. He gives evidence that
non-probabilistic models such as the one proposed by
Pinker (1989) are not explanatorily adequate. The idea
behind the learning model is that learning captures broad
generalizations first, which may be initially applied too
broadly, resulting in overgeneralization. Exceptions are
learned subsequently. Negative evidence is implicit in a
combination of frequency and model fit.
Simplifying somewhat, there are two types of negative
evidence. Negative evidence for frequent items consists of
simple absence. The frequent verb ``explain'' does not
participate in the dative alternation because we don't
experience it in this construction. Negative evidence for
infrequent items consists of absence in the class. The
infrequent verb ``bond'' does not form the past tense ``bont''
in analogy to ``sent'' because this pattern is absent from
the class of English verbs that ``bond'' is a member of.
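A back-of-the-envelope calculation suggests why simple absence can count as negative evidence for frequent items but not for infrequent ones. The sketch below is a deliberate simplification with invented numbers; it is not the learning model of Schütze (1997), and the function name and the assumed rate are hypothetical.

```python
def probability_of_absence(rate, occurrences):
    """Chance of never observing a construction in `occurrences` uses of a verb,
    if the verb actually allowed the construction at the given per-use `rate`."""
    return (1.0 - rate) ** occurrences


# Hypothetical assumption: verbs of the relevant class appear in the
# double-object construction in 20% of their uses.
CLASS_RATE = 0.2

frequent = probability_of_absence(CLASS_RATE, 200)   # a verb heard 200 times
infrequent = probability_of_absence(CLASS_RATE, 3)   # a verb heard 3 times

print(frequent)    # vanishingly small: absence is strong evidence
print(infrequent)  # about one half: absence tells the learner very little
```

Under these assumptions, never hearing a frequent verb in the construction is very unlikely to be an accident, whereas for an infrequent verb the learner has to fall back on what is known about its class.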
Some of the specific examples discussed here concern the language as a
whole, not the language of an individual. But in each case
there are examples of the same phenomenon that apply to an individual
speaker. For example, variation in liaison occurs in
individuals as well as historically.
Most current probabilistic accounts
emphasize the explanatory importance of
frequency. They are really frequentist models (a simple
subclass of probabilistic models) and often lack
mathematical sophistication.
There is a great opportunity here to improve the state of
the art by applying better models and more rigor.
This process has started in exemplar theory
(Kirchner, 1999; Pierrehumbert, 2001), but is at an early stage.
Optimality theory has some of the same goals as
probabilistic linguistics. One can view optimality theory as a framework
that allows the statement of broad, explanatory
generalizations and arbitration between them when they
conflict. Similarly, probabilistic models capture broad
generalizations in their symbolic core and use the apparatus
of probability theory to mediate between them.
Many phenomena receive an explanation in optimality theory
that is elegant and concise and probabilistic models may not
be able to improve on these accounts.
However, there is also a large class of phenomena that
cannot be explained well in standard optimality theory.
First, there are often several
acceptable linguistic forms, not just one. Optimality theory
relies on ranking and loses much of its appeal when
we replace the simple winner-take-all approach with
something more complex that allows several winners.
Equally important is the phenomenon of ``ganging up''. The
violation of several lesser constraints is often worse than
the violation of one big constraint. Optimality theory
in its current form cannot formalize this. See
Manning (2002) for discussion.
These two limitations of classical optimality theory, its
winner-take-all property and its difficulty in formalizing
``ganging up'', don't apply to stochastic OT. Stochastic OT
(Boersma, 1998)
and other non-standard OT variants (Smolensky and Legendre, 2005; Keller and Asudeh, 2002) are
probabilistic linguistic models in the sense that the term
is used here. Extensions of optimality theory that allow for
more flexible interaction of constraints (more flexible than
ranking) seem particularly promising for linguistic theories that can explain phenomena not yet explained by current theories.
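The difference between strict ranking and a more flexible interaction of constraints can be sketched in a few lines of Python. The violation profiles and weights below are invented, and the weighted-sum scoring is only a stand-in for ``more flexible than ranking''; it is not meant to reproduce stochastic OT or any other specific proposal cited here.

```python
# Violation profiles: number of violations of the constraints (C1, C2, C3),
# where C1 outranks C2, which outranks C3. All numbers are invented.
candidates = {
    "a": (1, 0, 0),  # one violation of the top-ranked constraint
    "b": (0, 2, 2),  # several violations of lower-ranked constraints
}


def strict_ranking_winner(candidates):
    """Classical OT: compare violation profiles lexicographically,
    so any violation of a higher constraint outweighs all lower ones."""
    return min(candidates, key=lambda name: candidates[name])


def weighted_winner(candidates, weights):
    """Weighted alternative: the lowest weighted sum of violations wins,
    so several lesser violations can jointly outweigh one big one."""
    def penalty(name):
        return sum(w * v for w, v in zip(weights, candidates[name]))
    return min(candidates, key=penalty)


WEIGHTS = (3.0, 1.0, 1.0)  # hypothetical constraint weights

print(strict_ranking_winner(candidates))     # b: C1 decides everything
print(weighted_winner(candidates, WEIGHTS))  # a: the lesser violations gang up
```

With these numbers, candidate b wins under strict ranking because it avoids the top-ranked constraint, but loses under weighting because its four lesser violations (penalty 4.0) outweigh the single violation of candidate a (penalty 3.0).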
2.3. Explanatory adequacy of probabilistic models
In this section, we review some common arguments against
the theoretical adequacy of probabilistic models for
language.
The most famous and most infamous argument against probabilistic
models is Chomsky's. Chomsky argued
that there are both grammatical and ungrammatical
sentences that we have never seen. If grammaticality is a
function of how frequently a sentence occurs, then a
probabilistic model cannot distinguish between grammatical
and ungrammatical sentences
(Chomsky, 1957). This is a convincing argument against a
particular type of probabilistic model, one that estimates
the probability of a sentence as its relative frequency. But
it is not an argument against other types of models. For a
more general class of models, Markov models,
Chomsky showed that they also do not model
language correctly. Again, this means that Markov models are
not adequate and is not an argument against probabilistic
linguistics in general. For example, the argument has nothing to say
about PCFGs (although it is hard to argue that PCFGs are
linguistically explanatory).
See Abney (1996) for a discussion of
Chomsky's arguments.
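The scope of the first argument can be illustrated with a toy corpus. In the sketch below, a relative-frequency model assigns probability zero to every unseen sentence, whereas an unsmoothed first-order Markov (bigram) model distinguishes an unseen well-formed sentence from an unseen scrambled one. The corpus and all names are invented for illustration, and the bigram model is used only because it is easy to write down, not because it is claimed to be adequate for language.

```python
from collections import Counter

# Tiny invented corpus; real estimates would need far more data.
corpus = [
    "the dog sees the cat",
    "the cat sees the bird",
    "the bird sleeps",
]


def relative_frequency(sentence):
    """Probability of a sentence as its share of the sentences in the corpus."""
    return corpus.count(sentence) / len(corpus)


# Bigram counts, with markers for sentence start and end.
bigrams = Counter()
contexts = Counter()
for line in corpus:
    tokens = ["<s>"] + line.split() + ["</s>"]
    for prev, word in zip(tokens, tokens[1:]):
        bigrams[(prev, word)] += 1
        contexts[prev] += 1


def bigram_probability(sentence):
    """Unsmoothed bigram model: product of P(word | previous word)."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        if contexts[prev] == 0:
            return 0.0  # the previous word never occurred as a context
        prob *= bigrams[(prev, word)] / contexts[prev]
    return prob


unseen_good = "the dog sees the bird"
unseen_bad = "bird the sees dog the"

print(relative_frequency(unseen_good), relative_frequency(unseen_bad))
print(bigram_probability(unseen_good), bigram_probability(unseen_bad))
```

Both sentences are absent from the corpus, so the relative-frequency model cannot tell them apart; the bigram model assigns a positive probability only to the first.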
Statistical models are sometimes accused of not being
explanatory because any data set can be fitted by either fiddling with
the parameters of the model or by declaring counterexamples to be
exceptions. This is a valid concern. Probabilistic models have more
knobs to tweak than their symbolic equivalents -- each parameter is a
knob. And each parameter can take on an infinite number of values. So empirical
validation is definitely a challenge for probabilistic models.
But empirical validation is a challenge for other linguistic theories too. Witness this discussion between two generative grammarians:
- A: I propose theory T.
- B: Sentence S is a counterexample to T. Therefore T is not valid.
- A: In my dialect, S is not grammatical. I therefore maintain that T is valid.
The corresponding argument between two
probabilistic linguists is:
- A: I propose theory T.
- B: Sentence S is a counterexample to T. Therefore T is not valid.
- A: Sentence S is an exception that has been maintained in the language due to high frequency. It has no bearing on T. I therefore maintain that T is valid.
Probabilistic theories of language make statements about
distributions and overall regularities. Individual
counterexamples cannot be used to falsify them. A debate
about the adequacy of a probabilistic linguistic model must
therefore be a debate about its symbolic core; about the way it models
learning; about the overall
distribution of data in the language under study;
and about similar properties. Just as we don't let the
generative grammarian off the hook when she appeals to her
own dialect, we shouldn't let the probabilistic linguist get
away with dismissing exceptions too easily. Empirical validation is a
hard problem in all sciences and needs to be approached with
great care.
I haven't been able to find this argument in print, but many
linguists are uncomfortable with including numbers in a
linguistic theory. There is less resistance to integers
because of the success of optimality theory.
Ranking is an important explanatory device
and it is equivalent in representational power to
integers. But rationals
and reals are met with great suspicion. I've even heard the
argument that rationals are more acceptable than reals
because rationals can be viewed as ratios of integers
whereas no such reduction to integers is possible
for reals.
Perhaps the perceived problem is that counting and integers
are natural parts of language (e.g., every language has
words for them), but rationals and reals are not. But this
can only be an argument about the subject of our research in
linguistic science (the languages we study like English and Chinese), not about
our scientific metalanguage. Be that as it may, there is no
scientific basis for a priori exclusion of numbers from
linguistic explanations.
There are many probabilistic investigations of language that
are descriptively oriented. For example, Zipf's law states
that the frequency of a word is roughly inversely proportional to
its rank (where words are ranked according to
frequency), so that the relationship is approximately linear on a
logarithmic scale. It is not clear whether this ``law''
explains anything or whether it is in turn in need of
explanation. In fact, Zipf proposed an informal probabilistic
account of his law. But Zipf's law, in its descriptive form, is often regarded as
a typical example of probabilistic linguistics.
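In symbols, the descriptive form of Zipf's law is commonly written as follows, where f(r) is the frequency of the word of rank r. The constants C and \alpha are corpus-dependent, and the notation is chosen here for illustration only; the classical statement corresponds to \alpha close to 1.

```latex
f(r) \approx \frac{C}{r^{\alpha}}
\quad\Longleftrightarrow\quad
\log f(r) \approx \log C - \alpha \log r
```

The second form makes explicit why the relationship appears as a straight line when both rank and frequency are plotted on logarithmic scales.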
Again, this objection to probabilistic explanations in
language is a valid objection to particular instances, like
Zipf's law. But it does not apply to probabilistic
explanations in general, for example, to those in Section 2.1.
Some
scientific subcommunities espouse linguistic
behaviorism. By behaviorism I mean a theory of language
that attempts to explain linguistic phenomena by
stimulus-response learning (Pulvermüller, 1999).
Anybody who understands language at the level of the average
linguist cannot agree with this view. Language cannot be
equated with salivation. In general, probabilistic
linguistic theories involve some kind of learning that may
look like stimulus-response learning to the
uninitiated. Hence the suspicion that theories with
probabilistic or statistical elements are behaviorist and
cannot be accepted.
The fallacy here is that, by definition, learning involves
stimuli and responses. Perhaps memorization is a form of
learning that merely
collects stimuli, but pure memorization is not
learning if there is no potential of acting on
the memories. So the distinction between behaviorist and
non-behaviorist learning is not that one involves stimuli
and responses and the other does not. The difference is
that non-behaviorist theories admit to the possibility that there is
some form of prior (or innate) knowledge about the learning
problem - the ``bias'' as it is technically called in
machine learning. Behaviorism rejects strong ``nativism''. It
denies the
existence of complex innate knowledge or at least deems its study unscientific. The behaviorist
learner is an association machine that associates entities
with other entities without any active intervention. (This
is somewhat of a caricature, but one that is not too far
from the truth.)
The debate about the nature of innate human knowledge is
largely independent of the position one takes on
probabilistic vs. non-probabilistic models. If anything, the
debate is likely to be more informed among practitioners of
probabilistic linguistics since they learned in Introductory
Machine
Learning that learning without bias is impossible. So the
question is not:
Is there a bias, yes or no?
It must be: What is the bias (the innate knowledge)?
In summary, behaviorist models can be viewed as a
subclass of probabilistic models, but in general
probabilistic models are not behaviorist.
Most probabilistic models for language in use today have
little explanatory adequacy. It is true that fundamental assumptions are
often justified linguistically. For example, data-oriented
parsing analyzes the structure of a sentence in analogy to
the structure of known sentences, an exemplar-based approach
that is quite similar in spirit to exemplar theory (Bod et al., 2003). But it
is unclear how much data-oriented parsing has to say about
linguistic theory apart from this very important, but also
very basic insight.
There are also examples of more linguistically oriented
models, but they are few
and far between compared to the mass of work firmly anchored
in the engineering sciences.
Other uses of statistics and probability theory
as an auxiliary science can also be assigned to this
category, hypothesis testing being the most prominent
example: probability serves an important function, but it is
not
part of the theoretical apparatus.
Perhaps surprisingly, I would also question the
role of
corpus linguistics in this context. There can be no doubt
that corpus linguistics is absolutely essential for
theoretical linguistics. I would claim
that many
advances in linguistics in the last decade have been
motivated and supported by corpus-based work. One
example is the theoretical understanding of
subcategorization in English, which has evolved considerably
because of the now widespread use of corpus-based methods
(Manning, 2002). Another one is that
our theoretical understanding of the lexicon has
changed substantially, partly because of corpus-based lexical resources like
WordNet (Miller et al., 1990) and FrameNet
(Fillmore and Baker, 2001). And there are many more.
But corpus linguistics has mostly had the role
of an auxiliary science. It has not directly
contributed to theoretical advances.
My claim that there is a dearth of work in probabilistic
linguistics (work that is strong probabilistically as well
as linguistically) is not an indictment of this approach. It just
means that a lot of excellent research is being done in engineering
and corpus linguistics and that it has goals other than
contributing directly to linguistic theory.
Far from being an argument against this emerging field, it suggests there is a great opportunity for
probabilistic linguists
to do innovative research in an area that is just beginning
to evolve.
If we define linguistics as the study of linguistic
competence and if competence is the core of grammar that has no strong
interactions with other cognitive abilities, then probabilistic
models have little to contribute to linguistics. Linguists
who view the performance-competence distinction as a central
tenet of linguistics (as opposed to a research strategy that
directs attention to a subset of linguistic phenomena that
is of particular importance) are unlikely to find much of interest
in probabilistic explanations.
The prototypical probabilistic device is a coin. We toss it
and it randomly comes up heads or tails. How can this be a
valid model of language? Clearly, the sentences we produce are not
strings of randomly selected words.
The
pedestrian defense against this view is to point out that we
mostly work with conditional probabilities. For example,
if I see a jay-walker crossing the street and a big truck
approaching, I might shout either ``Stop!'' or ``Watch
out!'' It seems plausible that there is some randomness in
which one I choose.
A more philosophical answer might be that heads or tails
depends deterministically on the way I toss the coin. A
probabilistic model is the most explanatory model of the
tossing without making any assertions about lofty concepts like free
will or determinism. I view randomness as a subjective interpretation
of the model that neither adds to nor subtracts from its
explanatory power.
Intuitions vary
considerably, but most would agree that many linguistic
phenomena are best explained non-probabilistically. In
English, the subject noun phrase precedes the verb phrase.
The most
insightful and most explanatory way of stating this
scientific fact in a theory of linguistics will
always be S → NP VP or a
variant thereof. If some explanations are necessarily
symbolic, isn't it a problem to have a mix of probabilistic
and symbolic explanations? Can language be both
probabilistic and symbolic?
An analogy from physics may help.
The gas law
states:

pV = nRT

where p is the pressure, V the volume, n the number of
moles, R the gas constant, and T the temperature.
Ultimately, the gas law can be derived from the kinetics
of individual molecules. But clearly the right
level of explanation, the formulation that allows
us best to understand its main insight and also
the formulation that allows us best to make
predictions is the
level at which it is stated. There can be no more concise
and explanatory statement about the set of phenomena we want
to capture here than the six-character string ``pV=nRT''.
This physics example is an analogy on two levels.
First, it shows that there is no dichotomy
between probabilistic and symbolic explanations.
The symbolic explanation (the "better"
explanation in this case) emerges from the
probabilistic one.
And there is nothing novel about pointing this out: Emergence of symbols and symbolic
relationships is a staple of connectionism
(Rumelhart et al., 1986; McClelland et al., 1986).
Secondly, the
ardent probabilist may be tempted to claim that
her explanation is better because it's more basic
and close to the ultimate truth of the basic laws
(in this case, molecular kinetics). But as we
know molecules consist of atoms, atoms consist of
particles etc. The idea that one level of
explanation is superior to another on
metaphysical grounds is typical of the logical
positivist program that all of science can be
axiomatized like mathematics and then derived
from axioms.
The discussion in this article is in the spirit
of Dupré who
rejects the positivist program and
accepts the diversity and disorder of the world
(Dupré, 1993).
If we let different theories coexist with each other,
then there is no a priori
reason to prefer one level of explanation to
another - or to object to mixing them for that matter. Each has to fend for itself with
arguments like explanatory adequacy and
predictiveness without the
positivist belief that there is a single
truth, a unified theory that explains all phenomena.
There is a tradition of work on probabilistic models at the
Institut für Maschinelle Sprachverarbeitung
reaching back
more than 10 years. Topics have included part of speech tagging
(Schmid, 1994), head-lexicalized PCFGs
(Carroll and Rooth, 1998), learning semantic roles from
corpora (Rooth et al., 1999; Beil et al., 1999), probabilistic
morphology (Schmid, 2005), and statistical
models of collocations (Evert, 2004). This
work was mainly concerned with solving computational
problems in applications like machine-readable dictionaries
and grammar development. Linguistic explanation usually was
a secondary goal. Still, there is rich expertise in
probabilistic models in the research group that will be
invaluable to students working on the project proposed here.
Research results in the area of child language acquisition were
discussed earlier (Schütze, 1997).
A list of topics follows. These are
merely suggestions. Any topic in the area of probabilistic
linguistics would be appropriate as long as it is
well-founded both linguistically and
mathematically. Additional topics are listed in the
following section.
- Uncertainty and incompleteness in linguistic theory.
Part of the appeal of probabilistic models for
language is that non-discreteness is unavoidable for other
parts of cognition. Much of our dealings with the world
are governed by uncertainty and incompleteness. If
language interacts closely
with other cognitive modules, then much is to be gained
from being able to model uncertainty and incompleteness as
part of a linguistic theory. For example, one could model
the interaction between uncertain knowledge about the context and
the interpretation of a sentence uttered in this context.
- Language acquisition, language change, language variation, typology.
The accounts discussed in the State of the Art section are
all promising topics for a dissertation. A possible focus
might be the design of a principled probabilistic model for
the phenomenon in question and its validation in
computational experiments.
- Argument/adjunct model. The argument-adjunct
distinction is one of the perennial topics of theoretical
linguistics. There clearly is a difference between phrases
that are closely bound to the verb and those that are
not. But since decades of linguistic research have not
produced a universally agreed-upon definition of these two
concepts, it may be time to try a probabilistic
account. See (Manning, 2002) for discussion.
- Optimality theory.
Optimality theory offers a rich set of possible topics in
probabilistic linguistics, ranging from adding
probabilistic components to optimality theory to a formal
characterization of linguistic phenomena that are amenable
to optimality-theoretic vs. probabilistic explanations.
The investigation of particular optimality-theoretic
analyses
(e.g.,
Bresnan (2000) and Kuhn (2003))
in a
probabilistic framework would also be promising.
- Probabilistic formalizations of grammaticality.
There are probabilistic models that formalize grammaticality as
probability. Sentences with high probability are
grammatical, sentences with extremely low probability are
ungrammatical. But what is this probability the probability of? It
cannot just be the probability of the syntactic structure
of the sentence - semantics also plays a role since
sentences are sometimes judged ungrammatical because the
grammatical reading is so unusual that it is
inaccessible. The probability cannot be the
probability of the state of affairs described either:
Contradictory sentences can be perfectly grammatical. It
can only be weakly correlated to
comprehensibility: non-native speakers utter perfectly
intelligible yet utterly ungrammatical sentences. A
coherent probabilistic model of grammaticality is a great
challenge and would be a great dissertation topic.
- Formal learnability. Many difficult learning
problems in non-probabilistic formalisms appear in a
different light when recast in probabilistic terms. The
reason is that
``the presence of gradients can direct learning''
(Manning, 2002) and thereby transform an unlearnable
phenomenon (which in a non-probabilistic framework often ends
up as part of universal grammar) into a learnable
one. This could be a topic in itself or one could focus on
the learnability of a particular area such as a parameter
posited for universal grammar.
In principle, the question posed in the beginning of this proposal arises
in each of the other projects in the Graduiertenkolleg: Is
this linguistic phenomenon best explained by a purely
symbolic theory or does a model that partially probabilizes
its symbolic core yield better explanations? In working with
students, preference
would be given to those linguistic
phenomena that are the subject of one of the other
projects.
The student could then contribute a probabilistic
perspective to the research conducted in the other project.
This would further collaboration of students and advisors
within the
Graduiertenkolleg. Some examples follow.
- Historical investigation of verbal semantics and
morphosyntax.
The transition of a verb's morphosyntax and
semantics from one historical state to another is a
prototypical example of the type of phenomenon that a
partially probabilized linguistic model should be able to
explain well. This consideration motivates collaboration
with Stein's project and dissertations that could draw on
expertise and support from both French linguistics and
theoretical computational linguistics.
- Semantics and pragmatics in the lexicon. As
claimed above, many cognitive phenomena involve
uncertainty and incompleteness. Semantics and pragmatics
interact with this uncertainty of the world. A possible
research topic would be to investigate probabilistic
vs. non-probabilistic models for a particular lexical
phenomenon of interest. An example that has been
investigated in DRT (Rossdeutscher and Kamp, 1994) is
semantic coercion, e.g., ``the trip to London'', ``the
plane to London'', ``the ticket to London'',
``the meal to London'', ``the apple
to London'' etc. Some of these coercions are possible
(``ticket''), some are not (``apple''), some are of
intermediate felicity. This is
a plausible area for probabilistic linguistics
to contribute to.
- Prosodic representations in language and speech.
Exemplar theory is perhaps the best example of an existing
probabilistic model that is linguistically
explanatory. Collaboration with Bernd Möbius' project
would therefore be natural.
- Large-coverage unification grammars. One
challenge in developing large-scale unification grammars
is the trade-off between coverage of the grammar and
multiplication of spurious readings.
For example, pronominalization of measure phrases in English
is stylistically questionable, but possible. Example:
``Did a moratorium on executions save innocent lives - or
cost them?''
(Jewish World Review)
In a non-probabilistic grammar, one can either
admit pronominalization and thereby increase the number of
spurious readings for other sentences; or exclude it and
make the sentence in question unparsable. A probabilistic
model has the potential of representing the markedness of
this construction in a more explanatory way. Students
working on this topic could build on log-linear models
(Riezler et al., 2002) and
optimality-theoretic devices that are already being used
at Professor Rohrer's Lehrstuhl.
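As a rough indication of how a probabilistic component could represent markedness in the last topic above, the following sketch scores two competing readings with a log-linear model. The feature names, the weights, and the function parse_distribution are hypothetical; the sketch is not the model of Riezler et al. (2002), only a minimal instance of the general log-linear form.

```python
import math

# Hypothetical features and weights. A negative weight marks a construction
# that the grammar admits but that is stylistically dispreferred.
WEIGHTS = {
    "pronominalized_measure_phrase": -2.0,
    "plain_object_pronoun": 0.5,
}


def parse_distribution(parses):
    """Log-linear model: P(parse) is proportional to
    exp(sum over features of weight * feature count)."""
    scores = {
        name: math.exp(sum(WEIGHTS[f] * count for f, count in features.items()))
        for name, features in parses.items()
    }
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


# Two competing readings of one sentence, each described by feature counts.
parses = {
    "marked_reading": {"pronominalized_measure_phrase": 1},
    "unmarked_reading": {"plain_object_pronoun": 1},
}
distribution = parse_distribution(parses)
print(distribution)
```

The marked reading remains available, so a sentence that requires it stays parsable, but it receives a low probability whenever an unmarked reading competes with it.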
I am grateful to Artemis Alexiadou, Bernd Möbius, Hans Kamp, and Jonas
Kuhn for comments on earlier drafts.
- Steven Abney. Statistical methods and linguistics. In Judith Klavans and Philip Resnik, editors, The Balancing Act: Combining Symbolic and Statistical Approaches to Language, pages 1-26. The MIT Press, 1996.
- Franz Beil, Glenn Carroll, Detlef Prescher, Stefan Riezler, and Mats Rooth. Inside-outside estimation of a lexicalized PCFG for German. In Proc. of ACL, 1999.
- Rens Bod, Remko Scha, and Khalil Sima'an. Data-Oriented Parsing. CSLI Publications, 2003.
- Paul Boersma. Functional phonology: Formalizing the interactions between articulatory and perceptual drives. PhD thesis, University of Amsterdam, 1998.
- Joan Bresnan. The emergence of the unmarked pronoun. In Geraldine Legendre, Sten Vikner, and Jane Grimshaw, editors, Optimality-theoretic Syntax. The MIT Press, 2000.
- Joan Bybee. Frequency effects on French liaison. In Joan Bybee and Paul Hopper, editors, Frequency effects and Emergent Grammar, pages 337-359. John Benjamins, Amsterdam, 2001.
- Glenn Carroll and Mats Rooth. Valence induction with a head-lexicalized PCFG. In Proc. of EMNLP, Granada, Spain, 1998.
- Noam Chomsky. Syntactic Structures. Mouton, The Hague, 1957.
- John Dupré. The Disorder of Things. Harvard University Press, 1993.
- Stefan Evert. The statistical analysis of morphosyntactic distributions. In Proc. of LREC, pages 1539-1542, Lisbon, Portugal, 2004.
- Charles J. Fillmore and Collin F. Baker. Frame semantics for text understanding. In Proc. of WordNet and Other Lexical Resources Workshop, NAACL, 2001.
- Martin Haspelmath. A frequentist explanation of some universals of reflexive marking. Handout, 2004.
- Frank Keller and Ash Asudeh. Probabilistic learning algorithms and optimality theory. Linguistic Inquiry, 33(2):225-244, 2002.
- Robert Kirchner. Preliminary thoughts on phonologization within an exemplar-based speech processing system. Technical report, UCLA Working Papers in Linguistics, Los Angeles CA, 1999.
- Jonas Kuhn. Optimality-Theoretic Syntax - A Declarative Approach. CSLI Publications, 2003.
- William Labov. Principles of linguistic change. Volume 1: Internal factors. Blackwell, 1994.
- Chris Manning. Probabilistic syntax. In Rens Bod, Jennifer Hay, and Stefanie Jannedy, editors, Probabilistic Linguistics. MIT Press, Cambridge MA, 2002.
- James L. McClelland, David E. Rumelhart, and the PDP Research Group, editors. Parallel Distributed Processing. Explorations in the Microstructure of Cognition. Volume 2: Psychological and Biological Models. The MIT Press, Cambridge, MA, 1986.
- George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross, and Katherine J. Miller. Introduction to WordNet: An on-line lexical database. Journal of Lexicography, 3(4):235-244, 1990.
- Janet Pierrehumbert. Exemplar dynamics: Word frequency, lenition, and contrast. In Joan Bybee and Paul Hopper, editors, Frequency effects and Emergent Grammar, pages 137-157. John Benjamins, Amsterdam, 2001.
- Steven Pinker. Learnability and Cognition. The MIT Press, Cambridge MA, 1989.
- Friedemann Pulvermüller. Words in the brain's language. Behavioral and Brain Sciences, 22:253-336, 1999.
- Stefan Riezler, Tracy H. King, Ronald M. Kaplan, Richard S. Crouch, John T. Maxwell III, and Mark Johnson. Parsing the Wall Street Journal using a lexical-functional grammar and discriminative estimation techniques. In ACL, pages 271-278, 2002.
- Mats Rooth, Stefan Riezler, Detlef Prescher, Glenn Carroll, and Franz Beil. Inducing a semantically annotated lexicon via EM-based clustering. In Proc. of ACL, 1999.
- Antje Rossdeutscher and Hans Kamp. Remarks on lexical structure and DRS construction. Theoretical Linguistics, 20(2/3):97-164, 1994.
- D. E. Rumelhart and J. L. McClelland. On learning the past tenses of English verbs. In James L. McClelland, David E. Rumelhart, and the PDP Research Group, editors, Parallel Distributed Processing. Explorations in the Microstructure of Cognition. Volume 2: Psychological and Biological Models, pages 216-271. The MIT Press, Cambridge, MA, 1986.
- David E. Rumelhart, James L. McClelland, and the PDP Research Group, editors. Parallel Distributed Processing. Explorations in the Microstructure of Cognition. Volume 1: Foundations. The MIT Press, Cambridge, MA, 1986.
- Helmut Schmid. Probabilistic part-of-speech tagging using decision trees. In Proc. of the International Conference on New Methods in Language Processing (NeMLaP), pages 44-49, 1994.
- Helmut Schmid. Disambiguation of morphological structure using a PCFG. Submitted, 2005.
- Hinrich Schütze. Ambiguity Resolution in Language Learning. CSLI Publications, Stanford, CA, 1997.
- Paul Smolensky and Geraldine Legendre. The Harmonic Mind: From Neural Computation To Optimality-Theoretic Grammar. MIT Press, 2005.
- Whitney Tabor. Syntactic Innovation: A Connectionist Model. PhD thesis, Stanford University, 1994.