Search DICKENS corpus with CQP query.
Online Help
The display window is split into three frames. In the command frame at the top you can enter a corpus query and set some options. Once the query has been executed, the matching strings will be shown in the result frame, which occupies the largest part of the window. The context frame at the bottom is sometimes used to display additional information.
The command frame: Type a CQP query into the text field at the top left of the command frame. A detailed description of the query language and copious examples can be found in the CQP Tutorial (online version). There are also some examples at the bottom of this help page, as well as an overview of the Penn Treebank tagset on a separate page. The sort menu to the right of the query field controls the order in which the query results will be displayed (unsorted = in corpus order; all other options refer to alphabetical ordering). The display options and keyword controls beneath the query field are described below, as are the buttons for frequency distributions. Press the Run Query button to execute the query. (NB: Query results are limited in size and the corpus search will stop after the first 50000 matches.)
The result frame: When a query has been executed, the matching strings are displayed together with some context (usually a complete sentence). The matching string itself is printed in bold face and highlighted with a yellow background. Each match is preceded by a header line showing the match number at the left, followed by the title of the novel and the chapter number (if available). Clicking on the context link in the left margin will display a larger amount of context in the context frame. When there are more than 20 matches, they are shown in pages of 20 items each. The navigation bars at the top left and bottom left allow you to step through the individual pages. One pair of buttons jumps back and forth by an entire page (20 matches), while a second pair jumps back and forth by half a page (10 matches). Two further buttons go back to the first page and jump to the last page. You can also select a page from the drop-down menu in the middle of the navigation bar and jump directly to this page by clicking the Go button.
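The paging arithmetic described above can be sketched in a few lines of Python. This only illustrates the numbers given in the help text (pages of 20 matches, half-page steps of 10) and is not the demo's actual code. The names `PAGE_SIZE`, `page_count` and `step` are hypothetical, and clamping at the ends of the result set is an assumption.

```python
PAGE_SIZE = 20               # matches per page, as stated in the help text
HALF_PAGE = PAGE_SIZE // 2   # half-page step (10 matches)

def page_count(n_matches: int) -> int:
    """Number of pages needed to show n_matches (always at least one page)."""
    return max(1, -(-n_matches // PAGE_SIZE))  # ceiling division

def step(first: int, delta: int, n_matches: int) -> int:
    """Move the first displayed match (0-based) by delta matches.

    Assumption: the position is clamped so that it never leaves the result set.
    """
    last_start = max(0, n_matches - 1)
    return min(max(first + delta, 0), last_start)

# Full-page and half-page jumps then differ only in the step size:
forward_page = step(0, PAGE_SIZE, 45)
forward_half = step(0, HALF_PAGE, 45)
```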
The display options allow you to customise the information that is shown in the result frame. Note that changes in the display options only take effect when the query is re-run (query results are cached, so they can be re-displayed immediately). Alternatively, you can set display options using the menus in the top right or bottom right corner of the result frame and activate the settings by clicking the Apply button (changes made here will be undone when a new query is executed).
The tokens menu selects a display style for word-level annotations. Part-of-speech codes can be added to word forms or lemmata and are shown in grey. The SARA style option shows word forms coloured according to their part of speech (nouns in blue, adjectives in green, determiners in orange, and verbs in red).
The phrases menu presents a choice of display styles for phrase structure annotations, all using square brackets to indicate the start ([) and end (]) of a noun phrase (NP) or prepositional phrase (PP). In the brackets mode, only maximal phrases are shown, with NPs in bold face. In the labelled mode, NPs and PPs are distinguished by a subscript on the opening bracket. Bold face now indicates maximal regions, and the nesting level of embedded phrases is shown in the subscript. In the colour-coded modes, blue brackets mark a [noun phrase] and green brackets a [prepositional phrase]. Place the mouse cursor over the opening bracket to display the head lemma annotation (if your browser supports tool tips). The colour + head mode adds the phrase heads as subscripts on the opening bracket. Note that due to technical problems with the parser, there are no phrase structure annotations for long sentences. If you have a better English parser and would be willing to annotate the demo corpus for us, please contact Stefan Evert.
The context menu determines the amount of context shown around the matching strings. By default, the context consists of the full sentence containing the match (sentence). It can be extended to include the preceding and following sentence (2 sentences) or to a full paragraph. Alternatively, the 5 words context produces a KWIC-like display of the immediate lexical neighbourhood of each query match.
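To illustrate what a KWIC-like (keyword in context) display with a 5 words context amounts to, here is a minimal Python sketch. It assumes that the context is counted in tokens on each side of the match, which the help text does not state explicitly. The function name `kwic` and its parameters are hypothetical and not part of the demo.

```python
def kwic(tokens, start, end, width=5):
    """Return (left context, match, right context) for the match tokens[start:end].

    Assumption: up to `width` tokens are shown on each side of the match.
    """
    left = tokens[max(0, start - width):start]   # cut off at the start of the text
    match = tokens[start:end]
    right = tokens[end:end + width]              # slicing cuts off at the end of the text
    return " ".join(left), " ".join(match), " ".join(right)

tokens = "It was the best of times , it was the worst of times".split()
left, match, right = kwic(tokens, 3, 4)
print(f"{left} [{match}] {right}")
# prints: It was the [best] of times , it was
```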
The keyword controls: When the box at the beginning of this line is checked, CQP will execute a set keyword command after running the query. The layout was designed to mirror the corresponding syntax in the CQP query language (see the CQP Tutorial for details). As an example, search for a noun in the corpus and make the following keyword settings: (1) check the keyword box; (2) leave the first menu at nearest; (3) type pos="VB[DPZ]?" into the text field; (4) leave the next menu blank; (5) set the following menu to 1 s; (6) leave the last menu at match and the final box unchecked. When you click the Run Query button, the verb form (except for participles and gerunds) closest to each instance of the noun will be underlined in red. Only verbs within the same sentence are considered in the search operation.
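The following Python sketch shows why the pattern VB[DPZ]? in the example above excludes participles and gerunds. It checks the pattern against the six Penn Treebank verb tags, using `re.fullmatch` because a CQP regular expression has to match the whole attribute value. Python's `re` module only stands in for CQP's regular expression matching here, and the helper name `matches_tag` is hypothetical.

```python
import re

PATTERN = re.compile(r"VB[DPZ]?")

def matches_tag(tag: str) -> bool:
    """True if the whole tag matches the pattern (CQP matches entire values)."""
    return PATTERN.fullmatch(tag) is not None

# The six Penn Treebank verb tags: base form, past tense, gerund or present
# participle, past participle, non-3rd-person present, 3rd-person present.
verb_tags = ["VB", "VBD", "VBG", "VBN", "VBP", "VBZ"]
selected = [tag for tag in verb_tags if matches_tag(tag)]
print(selected)
# prints: ['VB', 'VBD', 'VBP', 'VBZ']
```

VBG (gerund or present participle) and VBN (past participle) are not selected, because the optional character class only allows D, P or Z after VB.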
Match frequencies: When you click the Frequencies button instead of Run Query, a list of unique matching strings and their frequencies will be displayed in the result frame. The ordering of this list can be controlled with the sort menu. Click on any of the strings to show the corresponding matches in the context frame (you will find a navigation bar and display options at the bottom of this frame).
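Conceptually, the frequency list is a count of the distinct matching strings. The Python sketch below shows the idea. It is not the demo's implementation, the function name `frequency_list` and the `by` parameter are hypothetical, and ordering by descending frequency is an assumption, since the help text does not list the options of the sort menu for this view.

```python
from collections import Counter

def frequency_list(matches, by="frequency"):
    """Count distinct matching strings and return (string, count) pairs.

    by="alphabetical" orders by string; any other value orders by descending
    frequency, with ties broken alphabetically (an assumed ordering).
    """
    counts = Counter(matches)
    if by == "alphabetical":
        return sorted(counts.items())
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Invented matching strings for a query such as [lemma = "boy"]:
example = ["boy", "boys", "boy", "boy", "boys", "Boy"]
by_frequency = frequency_list(example)
```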
Corpus distribution: Click on the Distribution button to show the distribution of query matches across the novels in the collection. The (horizontal) length of each bar is proportional to the relative frequency of the matching strings in the respective novel (expressed in ppm = parts per million, i.e. number of instances among a million words of text). The (vertical) width of the bar is chosen so that its total area is proportional to the absolute frequency of matches in the novel. Clicking on one of the labels will switch the result frame to display the corresponding matches. Press your browser's Back button to get back to the distribution window.
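The geometry of the distribution bars can be worked through with a short Python sketch. It only restates the two proportionality rules given above (length follows the relative frequency in ppm, area follows the absolute frequency). The function names, the scaling factors and the figures in the example are hypothetical, and nothing here is taken from the demo's own code.

```python
def ppm(matches: int, tokens: int) -> float:
    """Relative frequency in parts per million (instances per million words)."""
    return matches * 1_000_000 / tokens

def bar(matches: int, tokens: int, length_per_ppm: float = 1.0,
        area_per_match: float = 1.0):
    """Return (length, width) of the bar for one novel.

    length is proportional to the relative frequency (ppm);
    width is chosen so that length * width is proportional to `matches`.
    """
    length = ppm(matches, tokens) * length_per_ppm
    if length == 0:
        return 0.0, 0.0        # no matches: nothing to draw
    width = matches * area_per_match / length
    return length, width

# Invented figures: 50 matches in a novel of 200,000 words.
length, width = bar(50, 200_000)
```

With unit scaling factors this gives a length of 250 (ppm) and a width of 0.2. Dividing the absolute frequency by the relative frequency leaves only the size of the novel, so under these rules the width of a bar depends on the length of the novel (here 0.2 million words) and not on the query.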
Example queries: You can copy & paste these queries into the text field in the top left corner of the command frame.
"gentleman";
[lemma = "boy"]
"Oliver"%c "Twist"%c
"as" [pos="JJ.*"] "as" []+
<pp_h "from"> []* </pp_h> <pp_h "to"> []* </pp_h>
Most of the English query examples from the CQP Tutorial will also work with the online demo.