Book: The TIGERSearch query tool

IV. The TIGERSearch query tool

1. An introduction to TIGERSearch

1.1 Starting the TIGERSearch tool

How you start the TIGERSearch tool depends on your operating system. On Windows machines, a program group called TIGERSearch is created during installation, so you just have to select the TIGERSearch program in the start menu.

On Unix machines, symbolic links have been created. If your general path is set properly, you may just need to type in TIGERSearch. However, the TIGERSearch start program can always be found in the TIGERSearch installation path:

INSTALLATIONPATH/bin/TIGERSearch

Please note: Any relative path specified in a dialog window is evaluated with regard to the so-called working directory. On Unix machines this directory is defined as the TIGERSearch starting directory (i.e. the directory TIGERSearch has been started from). On Mac and Windows machines the working directory is defined as the user's home directory.
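The working directory rule above can be sketched as follows. This is only an illustration of the rule, not TIGERSearch code: the function name resolve_dialog_path, its parameters, and the platform labels are hypothetical, and forward-slash paths are used on all platforms for simplicity.

```python
from pathlib import PurePosixPath


def resolve_dialog_path(path: str, platform: str, start_dir: str, home_dir: str) -> str:
    """Resolve a path typed into a dialog against the working directory.

    Hypothetical helper; 'platform' is one of "unix", "mac", "windows".
    """
    candidate = PurePosixPath(path)
    if candidate.is_absolute():
        # Absolute paths are used as given.
        return str(candidate)
    # Unix: the directory TIGERSearch was started from.
    # Mac and Windows: the user's home directory.
    working_dir = start_dir if platform == "unix" else home_dir
    return str(PurePosixPath(working_dir) / candidate)
```

For example, the relative path corpora/tiger would be looked up below the starting directory on Unix, but below the home directory on Windows.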

When you start the TIGERSearch tool, the TIGERSearch main window pops up (cf. screenshot). Position and size of the window are saved when leaving the tool, so the arrangement of your windows will be restored in the next TIGERSearch session.

Figure: TIGERSearch main window

The main window of TIGERSearch is divided into two parts. On the left you find the so-called information panel. It comprises a list of all corpora (cf. subsection 2.1), some special information about the currently loaded corpus (cf. subsection 2.2, subsection 2.3 and subsection 2.4), and the user's bookmarks (cf. subsection 3.4). On the right there is the query input area which consists of a textual and a graphical query editor (cf. section 3 and section 4, respectively).

1.2 The TIGERSearch help window

The TIGERSearch User's Manual can be accessed directly within the TIGERSearch user interface. The TIGERSearch help window can be activated by pressing the Help button in the upper toolbar or selecting one of the items in the Help menu.

Figure: The TIGERSearch help window

The help window is divided into two parts: On the left there is the manual navigation area. Here you find a table of contents, an index, and a search engine. Just browse through the table of contents or through the index and click on the topic you are interested in:

Figure: Navigating through the manual

If you are looking for a special topic which you cannot find in the table of contents or in the index, you might use the search engine. Just type in your search item(s) and the search engine finds all topics in which one of the items is used:

Figure: The help window search engine

The help window toolbar comprises the following buttons: The Navigation button lets you hide/show the navigation bar. The Back and Forward buttons let you navigate through the topics you have viewed before. The Refresh button reloads a topic, and the Close button closes the help window.

Corpus queries are displayed as green-colored hyperlinks. If you click on a query hyperlink, the query text is automatically copied into the query text editor of the TIGERSearch main application.

2. Loading a corpus

2.1 Selecting a corpus

TIGERSearch corpora are organized in a hierarchical file system, i.e. related corpora are grouped in folders. To see the corpora available, select the Open tab in the information panel. Now you can browse through the corpus tree (upper part of the tab), have a look at the corpus properties (lower part), and finally load a corpus. The corpus properties are displayed if you mark a corpus symbol:

Figure: The TIGERSearch corpus tree

Corpus loading is activated by double-clicking the respective corpus symbol, or selecting the Open selected corpus item in the context menu (activated by right mouse click on the corpus symbol). The corpus loading process can be aborted any time by pressing the Cancel button in the corpus loading progress window:

Figure: Corpus loading progress window

Corpus hotkey

A convenient alternative way to load a corpus is the so-called corpus hotkey. This is a brief list of the corpora last opened by the user (up to 15 corpora). You can view the list by pressing the hotkey button in the upper-left corner of the main window (cf. the following screenshot). If you select a corpus in this list, the corpus loading process will be started immediately.

Figure: Corpus hotkey
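The behaviour of such a list of recently opened corpora can be sketched as follows. Only the limit of 15 entries is taken from the description above; the function name remember_corpus, the most-recent-first ordering, and the handling of corpora that are opened again are assumptions made for illustration.

```python
from typing import List

MAX_RECENT = 15  # the manual states "up to 15 corpora"


def remember_corpus(recent: List[str], corpus_id: str) -> List[str]:
    """Return a new most-recent-first list with corpus_id at the front.

    Assumption: a corpus that is opened again moves to the front
    instead of appearing twice.
    """
    updated = [corpus_id] + [c for c in recent if c != corpus_id]
    # Drop the oldest entries once the limit is exceeded.
    return updated[:MAX_RECENT]
```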

(Un)successful corpus loading

If the corpus has been successfully loaded, there will be up to two additional corpus information tabs in the information panel which are described in the following subsections (corpus bookmarks and corpus templates). The currently loaded corpus is also indicated by the corpus hotkey and by the title bar of the main window.

If there are any problems during the corpus loading process, all warning messages are stored for inspection. These messages can be displayed by clicking on the warning symbol in the corpus information tab. In the case of corpus loading problems, please check the corpus configuration in the TIGERRegistry tool.

Corpus autoload

A useful feature is the so-called corpus autoload. If this feature has been activated (cf. Preferences item in the Options menu), the corpus that was open when leaving the tool will be automatically loaded the next time the tool is started. By default, the autoload feature is activated.

2.2 Corpus documentation

After corpus loading, the corpus information tab is activated. It includes the following pieces of information which are presented as groups within an information tree (cf. screenshots):

General documentation

The first group comprises general corpus documentation. Pressing one of the group icons displays the corresponding document in the lower part of the tab. The so-called Summary view contains some meta information about the corpus which has been specified in the corpus process (cf. subsection 4.2, chapter VI). The Detailed view also lists all corpus features and their corresponding feature values used in the corpus. Both information pages can be printed by selecting the Print Current Documentation item in the context menu which is activated by a right-button mouse click (in the lower left information panel):

If the loaded corpus comprises corpus bookmarks or corpus templates, corresponding indication icons are also placed in the documentation group (cf. screenshot above). Corpus bookmarks and corpus templates are described in subsection 2.3 and subsection 2.4, respectively.

Edge labels

The second group contains documentation about optional edge labels and secondary edge labels. All (secondary) edge labels and an optional short description are listed. If you type in a character, the cursor jumps to the first item which begins with this character. If you double-click a (secondary) edge label, it is copied into the corpus query editor:

Figure: Edge labels / Secondary edge labels

Features

The third and fourth groups comprise the nonterminal and terminal features of the corpus. All feature values and an optional description are listed. If you type in a character, the cursor jumps to the first item which begins with this character. If you double-click a feature value, the corresponding feature-value pair is copied into the corpus query editor:

Figure: Corpus features

If a type system has been defined for a corpus feature (cf. section 8, chapter III), it is also documented. Just click on the type icon which is placed under the corpus feature icon (cf. screenshot below). The type system is presented as a type tree. If you click on a type symbol or on a feature value, the corresponding feature-value / feature-type pair is copied into the corpus query editor:

Figure: Feature types

2.3 Corpus bookmarks

The Bookmarks tab comprises the user's favourite queries. Queries can be saved as bookmarks in the query editor (cf. subsection 3.4). In the TIGERRegistry tool, such bookmarks can be linked to a corpus as the so-called corpus bookmarks (cf. subsection 4.4, chapter VI). So if the user opens a corpus, the predefined bookmarks will be available in the Bookmarks tab (cf. screenshot below).

In order to differentiate between corpus bookmarks and the user's bookmarks, corpus bookmarks are displayed in green. To see the bookmark's name, the corpus used by the bookmark query, and the bookmark query itself, press the bookmark icon. To copy a bookmark query into the query editor, just double-click the bookmark icon:

Figure: Corpus bookmarks

2.4 Corpus templates

If templates have been declared for the corpus, they are presented in the Templates tab. In the upper part of the tab, all templates are presented. If you press a template icon, the corresponding template name, template path, and template definition are displayed. To copy the template call into the query editor, just double-click the template icon:

Figure: Corpus templates

2.5 Exploring the corpus

After corpus loading, the corpus exploration button in the toolbar (leftmost button, right neighbour of the corpus hotkey) is activated. Pressing this button will open the TIGERGraphViewer which visualizes the corpus graphs (cf. section 7). So you can browse through the corpus without processing corpus queries. This feature is very helpful if you are not familiar with a corpus and its annotations.

Please note: You can easily switch between the TIGERSearch main window and the GraphViewer window using the shortcut buttons in the lower left corner of both windows. If you press a shortcut button, the corresponding window will be moved in front of all other windows on your desktop.

Figure: Exploration of the corpus

3. Textual query editor

3.1 Query editor basics

The corpus query editor is the central part of the TIGERSearch tool. For a description of the query language please see chapter III. The query text editor is quite sophisticated, i.e. its features include query syntax highlighting and copy and paste functionality to exchange query texts with other applications (cf. Edit menu in the context menu). Using the Undo and Redo items you can restore corpus queries that have already been submitted.

Figure: The corpus query editor

A helpful input help feature is the so-called feature popup window. If you type in a feature name followed by the equality character (e.g. pos=), the feature popup window appears. It comprises all feature values and types declared for this feature. You can browse through this list by using the cursor arrows or the page up and page down keys. If you press a character key, the cursor moves to the first item which begins with the corresponding characters:

Figure: The feature pop-up window

If you select an item (by double-clicking or pressing the return key), it is copied into the query text editor. The feature popup window can be enabled / disabled in the Input Help menu of the context menu.

3.2 Advanced query editor features

To change the editor's font size, increase or decrease the relative font size from -5 to +5 in the editor's context menu (right mouse click in the editor, item relative font size in the context menu):

Figure: Changing the editor's font size

To comment or uncomment lines in the query text, just mark the corresponding text area. Afterwards select the Insert within selected area or Remove within selected area option in the Comments menu of the context menu, respectively:

Figure: (Un)commenting query text

If you want to include a corpus query into an electronic document, you can use the Copy item of the Edit menu. However, the syntax highlighting of the query will be ignored. For this purpose we have implemented the Copy colored query feature in the Edit menu. You can copy the colored query text as an HTML or LaTeX fragment. Afterwards you can paste the clipboard content into your HTML or LaTeX editor.

3.3 Internationalization of the query editor

A common problem for applications such as TIGERSearch is the keyboard input of characters which are not included in the ISO-Latin-1 character set. If you are working with a corpus that makes use of such characters, you should consider the following three alternatives:

Please note: Typing in Unicode characters implies that Unicode characters can be displayed (rendered) by the software. Thus, one of the Unicode fonts supported by TIGERSearch must have been installed on your system. Please consult section 3, chapter II for instructions.

Unicode encoding

The first alternative to encode a Unicode character is to type in its hexadecimal Unicode encoding. For example, the Greek capital letter Omega is represented by \u03a9. If you have typed in the Unicode encoding, just select the Expand Unicode Encodings option in the Input Help menu of the context menu to expand the character:

Figure: Expanding Unicode encodings

The Unicode encoding will be replaced by its corresponding character (cf. screenshot below). Please keep in mind that a Unicode font must be installed to render the character properly.

Figure: Expansion of Unicode encodings
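The expansion step can be pictured with the following sketch. It is not the TIGERSearch implementation: the function name expand_unicode_encodings is hypothetical, and the sketch assumes that an encoding always consists of \u followed by exactly four hexadecimal digits, as in the \u03a9 example.

```python
import re

# Assumed form of an encoding: backslash, 'u', four hexadecimal digits.
_UNICODE_ENCODING = re.compile(r"\\u([0-9a-fA-F]{4})")


def expand_unicode_encodings(query: str) -> str:
    """Replace every \\uXXXX encoding by the character it denotes."""
    return _UNICODE_ENCODING.sub(lambda m: chr(int(m.group(1), 16)), query)


# The Greek capital letter Omega from the example above:
print(expand_unicode_encodings(r'[word="\u03a9"]'))  # prints [word="Ω"]
```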

If you are frequently working with corpora using characters outside the ISO-Latin-1 character set, you should activate the Expand automatically option in the Input Help menu of the context menu.

Input help (operating system)

On many platforms, specialized tools have been developed to type in characters outside the ISO-Latin-1 character set. These tools are usually called input methods. Since Greek characters, for example, do not exist on a German keyboard, these characters are typed in as abbreviations. For example, the string Omega might be used as an abbreviation for the Greek character, which is automatically expanded once the abbreviation has been typed in. Please consult the manual of your operating system to find out which tools are available for your platform.

Input help (TIGERSearch)

In the TIGERSearch project we have implemented specialized input methods for 16 European languages which can be used in the TIGERSearch query editor (cf. subsection 3.2, chapter II). To activate the TIGERSearch input methods, press the upper left corner of the TIGERSearch window (Windows: press the tiger icon) and select the last option in the corresponding menu (usually called Choose input method).

The following screenshot shows how the input method is activated on a German Windows platform. The display will look similar on different platforms.

Figure: Activating the TIGERSearch input methods (1)

Now you are asked to choose one of the supported European languages. In the following screenshot, the Greek language (modern) is chosen:

Figure: Activating the TIGERSearch input methods (2)

The input method mode has been activated. A small status window is placed in the lower right corner of the screen. This window shows which language has been chosen and whether the input method is activated or deactivated:

Figure: Input method status window

To select a different language, you can either repeat the input method activation procedure described above or switch between the languages using the F7 key. To activate or deactivate the current input method, please use the F8 key.

Please note: To deactivate the TIGERSearch input methods (especially to remove the input method status window), start the input method selection procedure again, but choose system input methods in the input method menu.

How is the input method used in the query editor? All characters that are not included in the ISO-Latin-1 character set are represented by special abbreviations. To allow the input of Latin characters and special characters side by side in one mode, we have chosen the encoding conventions used in the LaTeX system. For example, the German character ä is represented as \"a which is its LaTeX encoding. So if you have chosen the German keyboard mapper and you type in the character sequence \"a, it will be automatically expanded to ä by the TIGERSearch input help system.

Please note: Of course, all German characters are included in the ISO-Latin-1 character set. However, German special characters (ä,ö,ü,ß) can only be typed in on keyboards manufactured for the German market. Otherwise, an input method for the German language is necessary in order to work with German treebanks such as the TIGER treebank.
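The abbreviation mechanism can be illustrated with a small sketch. Only the pair \"a and ä is taken from the text above; the other two entries follow the usual LaTeX conventions and are assumptions, as are the names GERMAN_MAPPING and expand_abbreviations. The actual mapping tables are documented in europe.pdf. Note also that the real input method expands abbreviations while you type, whereas the sketch works on a complete string.

```python
from typing import Dict

# Hypothetical excerpt of a German mapping table.
# Only '\"a' -> 'ä' is taken from the manual; the rest is assumed.
GERMAN_MAPPING: Dict[str, str] = {
    '\\"a': "ä",
    '\\"o': "ö",
    '\\"u': "ü",
}


def expand_abbreviations(text: str, mapping: Dict[str, str]) -> str:
    """Replace every abbreviation in text by its special character."""
    # Longer abbreviations first, so that a short one cannot shadow a long one.
    for abbreviation in sorted(mapping, key=len, reverse=True):
        text = text.replace(abbreviation, mapping[abbreviation])
    return text
```

With this table, the input M\"adchen would be expanded to Mädchen, while plain Latin characters are left untouched.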

For languages such as Greek which comprise many special characters, a side by side usage of Latin and Greek characters is not possible. In this case, most Greek characters are represented by Latin characters. For example, the capital letter Omega is represented by the Latin character V. So if you type in V in the query editor, this input string is automatically expanded to the capital letter Omega. The following screenshot illustrates how Greek characters are typed in:

Figure: Typing in Greek characters

The mapping tables for the 16 supported European languages can be found in the file europe.pdf which is placed in the doc/pdf/ subdirectory of your TIGERSearch installation. It can also be downloaded from the TIGERSearch homepage (cf. http://www.tigersearch.de).

3.4 Bookmarks

TIGERSearch provides a bookmark concept to store your favourite corpus queries. As an option, you can also store the results of your queries. This makes sense especially for queries which took a long time to evaluate. The present subsection describes how to add and open bookmarks and the concept of bookmark maintenance.

Adding a bookmark

If you would like to file a bookmark, first mark the preferred bookmark parent folder in the Bookmarks tab (cf. screenshot). Afterwards, select the Add Bookmark to Main Group item in the Bookmarks menu in the context menu of the query editor. If you do not want to specify the parent folder, just select the Add Bookmark to Main Group option.

Figure: Adding a bookmark

Next, the bookmark properties window is presented. Here you can specify the name of the bookmark and a comment that describes the bookmark. If the query has already been processed, the query results can also be stored. To confirm the bookmark properties and insert the bookmark in the specified parent folder, press the OK button:

Figure: Bookmark properties

Please note: Storing query results is a helpful feature, but it consumes disk space in the user's home account. If you do not plan to reuse the results, it is better not to store them.

Opening a bookmark

The user's bookmarks are presented in the Bookmarks tab in the corpus information panel. If you press a bookmark icon, the bookmark name and definition are shown in the lower part of the tab. If you double-click on a bookmark or select the Open item in the bookmark's context menu, the query text of the bookmark is copied into the query editor. If query results are also stored, they are restored and can be inspected in the GraphViewer.

Figure: Opening a bookmark

Bookmark maintenance

Bookmark maintenance is carried out via the context menus for bookmarks and bookmark folders (right mouse click on a bookmark folder or item, respectively). You can edit bookmark properties, delete a bookmark, cut a bookmark, and paste a bookmark which has been copied or cut before. You can also edit the properties of bookmark folders, cut/copy and paste a folder, and add a new subfolder or bookmark. In order to free disk space used by query results, users can mark a bookmark folder and select the Delete Results Information item in the context menu to delete the query results of all bookmark queries which are placed under this folder or any of its subfolders.
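The effect of Delete Results Information on a folder and all of its subfolders can be sketched as a recursive walk over the bookmark tree. The classes Bookmark and BookmarkFolder and the function delete_results are hypothetical and only model the behaviour described above; they say nothing about how TIGERSearch actually stores bookmarks.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Bookmark:
    name: str
    query: str
    results: Optional[bytes] = None  # stored query results, if any


@dataclass
class BookmarkFolder:
    name: str
    bookmarks: List[Bookmark] = field(default_factory=list)
    subfolders: List["BookmarkFolder"] = field(default_factory=list)


def delete_results(folder: BookmarkFolder) -> int:
    """Drop stored results below folder; return the number of bookmarks affected."""
    cleared = 0
    for bookmark in folder.bookmarks:
        if bookmark.results is not None:
            bookmark.results = None  # the query text itself is kept
            cleared += 1
    for subfolder in folder.subfolders:
        cleared += delete_results(subfolder)
    return cleared
```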

Please note: Corpus bookmarks cannot be deleted. If you would like to use corpus bookmarks as a basis for defining derived queries, you can duplicate them by using the copy and paste feature.

In order to exchange bookmarks with other users, you can import and export bookmarks. Just mark the parent folder of the preferred bookmarks (e.g. the root folder to export all bookmarks) and select the Export as Bookmark File item in the context menu.

Figure: Exporting bookmarks

Now the marked bookmark folder is saved. This file can be imported by other users using the Import bookmarks file item in the context menu of the preferred parent folder. A bookmark file can also be used as a corpus bookmarks file (cf. subsection 4.4, chapter VI).

4. Graphical query editor

4.1 Introduction

The graphical query editor

The graphical query editor can be used to create queries without knowledge of the TIGERSearch query language. For example, you can create nodes with a few mouse clicks, and change values by selecting an item from the list of a pull down menu. You can also create edges between nodes, which represent dominance and precedence relations.

The graphical query editor is the result of Holger Voormann's diploma thesis ([Voormann2002]; in German).

An alternative to the textual mode

The graphical editor has been designed for beginners as well as for casual users. Power users will likely create queries faster textually than graphically. It is also possible to start constructing a query graphically, automatically generate the textual equivalent, and make some further textual edits.

Graphical vs. textual queries

Because the graphical mode has also been designed for people who want to learn the TIGERSearch query language (cf. chapter III), the graphical construction of a query follows the concept of the textual query language. For example, nodes are specified with an arbitrary number of feature-value pairs.

The graphical query editor as a preprocessor

To process a graphical query, it is necessary to convert it to a textual representation. You can think of the graphical mode as a plugin which converts your graphical query to the TIGERSearch query language (see figure below).

Figure: Before query processing, the graphical construction is converted to a textual query.
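The conversion step can be pictured with the following sketch. It is not the actual converter: the classes Node and Edge and the function to_textual_query are hypothetical, and the textual notation used here (node variables such as #n1, feature constraints in square brackets, & as conjunction, > for direct dominance) is assumed to match the query language described in chapter III and should be checked there.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Node:
    name: str                 # variable name without the leading '#'
    features: Dict[str, str]  # feature -> required value


@dataclass
class Edge:
    source: str
    target: str
    operator: str  # e.g. ">" for direct dominance (assumed notation)


def to_textual_query(nodes: List[Node], edges: List[Edge]) -> str:
    """Build one textual query from graphical nodes and edges."""
    parts = []
    for node in nodes:
        constraints = " & ".join(
            '%s="%s"' % (feature, value) for feature, value in node.features.items()
        )
        parts.append("#%s:[%s]" % (node.name, constraints))
    for edge in edges:
        parts.append("#%s %s #%s" % (edge.source, edge.operator, edge.target))
    return " & ".join(parts)
```

A phrase node with cat="NP" that directly dominates a token node with pos="NN" would come out as #n1:[cat="NP"] & #n2:[pos="NN"] & #n1 > #n2 under these assumptions.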

Limitations

Because of the architecture (see paragraph above), the graphical mode may, in principle, have the power of the TIGERSearch query language. However, in the current version there are some limitations:

The graphical editor supports disjunctions between feature-value pairs and feature values, but not between graph relations or any grouped parts of a query (cf. section 7, chapter III).

It is not possible to create an unspecified node (cf. subsection 4.3).

Variables, used for equality of node specifications, are not yet supported (cf. subsection 7.2, chapter III).

Templates are not yet supported (cf. section 9, chapter III).

4.2 Starting

Switch to Graphical Mode

To start creating a new graphical query you first have to select the Graphical mode tab instead of the Textual mode at the top of the query input area (see screenshot):

Figure: Switch to the Graphical Mode by clicking on the tab.

Before a graphical query can be created, a corpus must be loaded (cf. section 2).

Please note: If the corpus is closed or reloaded or another corpus is opened, the current graphical query will be lost.

Creating a graphical query

To create a graphical query, you first have to create nodes (cf. subsection 4.3). In the second step you might specify these created nodes (cf. subsection 4.4) or create edges between existing nodes (cf. subsection 4.5).

4.3 Nodes

Input field

After switching to the Graphical mode you can see the horizontally divided input field. Be sure that either the Move/Create tool or the Create tool is activated (cf. subsection 4.6). By clicking into the upper part, phrase nodes (so-called inner nodes or nonterminal nodes) will be created. A left button mouse click into the bottom area creates a token node (so-called leaf node or terminal node).

Please note: Because of the divided input field either phrase nodes or token nodes can be created. An unspecified node cannot be created.

The different parts of a node

The different parts of a node are separated by small lines. You can disable these lines (so-called plug borders) via the context menu (cf. subsection 4.7).

Figure: A phrase node with its four plugs, the node menu, and the feature constraints area.

The four (token nodes: three) plugs are used to create edges (cf. subsection 4.5). The darker inner area contains the node specification (cf. subsection 4.4).

Node menu

Every node has a pull down menu in its upper left corner. Use this menu to delete the node and to enable or disable graph predicates (cf. subsection 7.3, chapter III).

Figure: The menu of a phrase node (left) and of a token node (right).

The (token)arity and (dis)continuous predicates are only useful for phrase nodes.

Delete Node
Deletes the node, including its feature constraints and all edges starting or ending at this node.

Root
Enables/disables the root predicate. Only one node may use the root predicate. If you specify this predicate on one node, the root predicate will be disabled on all other nodes. The root predicate is visualized by disabling the dominance top plug (all edges ending at this node will be deleted) and displaying a red triangle instead.

Arity is ...
Shows/hides an input field for the arity value below the inner node specification area.

Arity range from ... to ...
Shows/hides two input fields for the start and end arity values below the inner node specification area.

Tokenarity is ...
Shows/hides an input field for the tokenarity value below the inner node specification area.

Tokenarity range from ... to ...
Shows/hides two input fields for the start and end tokenarity values below the inner node specification area.

Continuous
Enables/disables the node predicate continuous. By enabling this predicate the predicate discontinuous will be disabled, because both predicates cannot be enabled at the same time.

Discontinuous
Enables/disables the node predicate discontinuous. By enabling this predicate the predicate continuous will be disabled, because both predicates cannot be enabled at the same time.
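The exclusiveness rules of the node menu (at most one root node per query, and never continuous and discontinuous on the same node) can be summarized in a small state model. The class QueryNode and the toggle functions are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueryNode:
    """Hypothetical model of the predicate state of one graphical node."""
    name: str
    root: bool = False
    continuous: bool = False
    discontinuous: bool = False


def toggle_root(nodes: List[QueryNode], target: QueryNode) -> None:
    """Enable/disable root on target; at most one node keeps the predicate."""
    enable = not target.root
    for node in nodes:
        node.root = False
    target.root = enable


def toggle_continuous(node: QueryNode) -> None:
    """Enable/disable continuous; enabling it disables discontinuous."""
    node.continuous = not node.continuous
    if node.continuous:
        node.discontinuous = False


def toggle_discontinuous(node: QueryNode) -> None:
    """Enable/disable discontinuous; enabling it disables continuous."""
    node.discontinuous = not node.discontinuous
    if node.discontinuous:
        node.continuous = False
```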

Arity and tokenarity

After enabling an arity or tokenarity predicate via the node menu, input fields are shown below the inner node specification area at the bottom of the node. One input field for entering the exact (token)arity value is shown if the predicate (token)arity is ... is selected. If (token)arity range from ... to ... is selected, two input fields are shown: the left for the start value and the right for the end value of the range.

Figure: The arity and tokenarity input fields are only shown if the predicate is enabled.

The values can be changed by either typing in a new value or by pressing the increase or decrease button, which are shown at the right of the input field when the mouse cursor is over the input field or the input field is activated (see screenshot above).

Continuous and discontinuous

Only one of the continuous or the discontinuous node predicates can be set. These predicates are visualized by the non-crossing edges and the crossing edges symbol at the bottom of the node.

Figure: The (dis)continuous predicate visualized by the (non-)crossing edges symbol.

4.4 Feature Constraints

Creating a feature-value pair

A node can be specified by an arbitrary number of feature-value pairs (cf. section 3, chapter III). To create a feature-value pair click into the inner node specification area. A box comprising three menus appears (see image below). If the current corpus supports more than one phrase or token feature, the upper menu (the feature selection menu) opens automatically.

Figure: Creating feature-value pairs and toggling between conjunction and disjunction.

To create another feature-value pair you have to click into the inner node specification area, left or right of the existing feature-value pair. By default, these feature-value pairs are connected by conjunction. To toggle between conjunction and disjunction, click on the corresponding symbol.

1. Feature selection menu

The feature selection menu is primarily used to choose the feature, but can also be used to delete or negate the feature-value pair (visualized by a red line from lower left to upper right).

Figure: Feature selection menu (left) and negated feature-value pair (right).

2. Operator selection menu

Figure: The operator selection menu.

The is operator is selected by default. The following operators are available:

is
The feature must agree with the specified value.

isn't
The feature must not agree with the specified value.

contains
The feature value must contain the given string.

doesn't contain
The feature value must not contain the given string.

begins with
The feature value must begin with the given string.

ends with
The feature value must end with the given string.

is regular expression
The given pattern must match the feature value.

isn't regular expression
The given pattern must not match the feature value.

is equal to
The feature value is equal to the feature value of another node (cf. paragraph below).

The operators contains, doesn't contain, begins with, and ends with do not have a textual equivalent. They are converted to regular expressions (cf. subsection 3.5, chapter III). For example begins with 'abc' will be converted to /abc.*/.
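The conversion of these operators can be sketched as follows. Only the begins with case (/abc.*/) is taken from the example above; the patterns for the other operators are formed by analogy and are assumptions, as is the function name operator_to_regex. The sketch inserts the string unchanged, since the treatment of characters that have a special meaning in regular expressions is not described here.

```python
from typing import Tuple


def operator_to_regex(operator: str, text: str) -> Tuple[str, bool]:
    """Return (pattern, negated) for a graphical string operator.

    Assumption: "doesn't contain" uses the same pattern as "contains",
    combined with a negated comparison.
    """
    patterns = {
        "begins with": "/%s.*/" % text,       # documented: 'abc' -> /abc.*/
        "ends with": "/.*%s/" % text,         # assumed by analogy
        "contains": "/.*%s.*/" % text,        # assumed by analogy
        "doesn't contain": "/.*%s.*/" % text, # assumed by analogy
    }
    if operator not in patterns:
        raise ValueError("operator has a direct textual equivalent: %s" % operator)
    return patterns[operator], operator == "doesn't contain"
```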

3. Value specification

When the is or isn't operator is selected, the feature value can be specified by one or more values or types. In this case the third item of the feature-value pair is a field similar to the inner node specification area. By clicking to the left or right of the existing primary value/type, further values/types can be created. You can switch between disjunction and conjunction by a mouse click on the connector.

Figure: One or more values/types can be used to specify a feature value.

The first item of a value/type menu is used for deleting.

Please note: If there is only one value/type, the delete function does not work.

The second menu item is for negation, also visualized by a red line from lower left to upper right.
Types are marked by a type symbol and (by convention) type names are written in small letters. All possible feature values are listed, each marked by a value symbol. If there is no feature value list available (e.g. for the word feature), there is an input field in which you can type in a value (see figure below).

Figure: Input field (left) and menu for selecting a value or type (right).

Equality

To specify a feature value to be equal to a feature value of another node, variables are used (cf. subsection 7.2, chapter III). To create such a variable you have to select the operator is equal to. The mouse cursor will change into an arrow. Now click either into a free space of an inner node specification area of another node or on the same feature-value pair of another node (see figure below).

Figure: A click into an inner node specification area will create a new feature-value pair.

Figure: A click on an existing feature-value pair will set its value equal to the source feature value.

If you want to delete a feature-value pair which is used as a reference for an is equal to feature value, the following dialog will appear:

Figure: Deleting a variable specification, which is referenced by another feature-value pair, is not allowed.

4.5 Edges

Creating

Edges represent graph relations: dominance and precedence (cf. section 6, chapter III). An edge can be created between two existing nodes. To create a dominance edge you have to select a lower dominance plug of a node (cf. subsection 4.3) and a second partner node. It is also possible to do this in inverse order by first selecting an upper dominance plug and a second node as the start node of the relation.

Figure: Dominance edges are created by clicking on dominance plugs.

To create a precedence edge you have to first select a right precedence plug or, to do it in inverse order, a left precedence plug and afterwards another node.

Figure: Precedence edges are created by clicking on the precedence plugs.

The type of the created edge (i.e. direct or non-direct, negated or not negated) depends on the preselection, which can be set in the toolbar (cf. subsection 4.6). After creating an edge, the type can be changed in the edge menu.

Multiple selection and creation

Two or more nodes or plugs can be selected by holding down the SHIFT or CTRL key while clicking. This can be used to create more than one edge simultaneously (see figure below).

Figure: By pressing the SHIFT or CTRL key two or more nodes or plugs can be selected.

Dominance

Figure: The menu of a dominance (left) and direct dominance (right) edge.

The first item of the edge menu is for deleting. The following three menu items provide functions to change the type of an edge after creating it.
If it is a non-direct dominance edge, the distance can be specified, either by the exact value or by a range. If the Distance is ... or Distance range from ... to ... item is enabled in the edge menu, one or two input fields, respectively, are shown left of the edge menu. These values can be changed by either typing in a new value or by pressing the increase or decrease buttons, which are shown at the right of the input field when the mouse cursor is over the input field or the input field is activated. If the edge type is direct dominance, an optional edge label can be chosen from a list.

Figure: Distance specification (left) and edge label (right).

Left/right corner

If the end node of the (direct) dominance relation is a token, three additional edge menu items are available for the optional corner specification (cf. subsection 6.2, chapter III). Left or right corners are visualized with black filled triangles (see figure below).

Figure: The menu items for corner specification.

Horizontal edges

Besides the dominance relation there are two more relations: precedence and secondary edges. With the precedence relation you can specify the horizontal order and distance between nodes. Secondary edges are often used as a kind of additional dominance relation. Secondary edges are not supported by all corpora.

Figure: The menu of a precedence, direct precedence, and secondary edge.

The first menu item can be used to delete the edge. Below there are four (three if the current corpus does not support secondary edges) items to choose the type of the edge and to negate it (see figure above). If the edge type is precedence, input fields for either the exact distance value or a distance range can be enabled by choosing either the Distance is ... or Distance range from ... to ... menu item. The values can be changed by either typing in a new value or by pressing the increase or decrease buttons, which are shown at the right of the input field when the mouse cursor is over the input field or the input field is activated.

Please note: Not every corpus supports secondary edges and secondary edge labels.

4.6 Toolbar

The buttons of the toolbar are divided into four parts: tool selection, dominance edge type preselection, horizontal edge type preselection, and switch to textual mode.

Figure: The toolbar.

Tool selection: Moving/Creating, Creating, Moving

With the leftmost three buttons you can specify how to create objects and how to move nodes:

Move/Create
Objects like nodes, feature-value pairs, and feature values will only be created by a mouse click if the selection is empty. If the selection is not empty, a mouse click in a free area deselects every object.

Create
Regardless of whether the selection is empty or not, nodes, feature-value pairs, and feature values can be created by a mouse click at the appropriate location. For deselection click on a selected object or use the context menu (cf. subsection 4.7).

Move
Nodes, feature-value pairs, feature values, and edges cannot be created if this tool is activated. In this mode you can only move nodes.

Dominance edge type preselection

Before creating a dominance edge the type of the new edge can be preselected (cf. also subsection 4.5). Of course the type of an edge can be changed after creating, but preselection is very useful for creating two or more edges simultaneously. There are four different dominance edge types:

Dominance

Negated Dominance

Direct Dominance

Negated Direct Dominance

Horizontal edge type preselection

The type of a new horizontal edge can also be preselected (cf. subsection 4.5). There are up to six different horizontal edge types:

Precedence

Negated Precedence

Direct Precedence

Negated Direct Precedence

Secondary Edge

Negated Secondary Edge

Please note: Secondary edge and negated secondary edge are only shown if the current corpus supports secondary edges.

Switch to Textual Mode

Clicking the rightmost button of the toolbar copies the textual representation of the current graphical query into the text query editor of TIGERSearch (cf. section 3). The query text can now be edited and submitted.

Please note: In the current version of TIGERSearch a textual query cannot be converted to a graphical query (cf. subsection 4.1).

4.7 Context Menu

Popping up the context menu depends on the platform. Either the second or third mouse button has to be pressed or released. The context menu and its submenus are shown below:

Figure: The context menu and its submenus.

Delete

Deletes all selected objects. Edges cannot be selected but will be deleted if the start or end node is deleted. Nodes are also deleted if only a node plug is selected.

Select All

If the context menu is called from the inner node specification area, all feature constraints of the current node are selected.

Deselect

Removes the current selection.

View - Stepped Lines

The dominance relation can be visualized either by stepped lines (cf. TIGERGraphViewer) or by a straight line edge:

Figure: Stepped lines (left), straight lines (right).

View - Node Plug Borders

To point out the different areas of a node (cf. subsection 4.3) small lines show the borderlines. These lines can be disabled.

Figure: Nodes with (left) or without (right) node plug borders.

Bookmarks

The textual representation of the graphical query can be saved for later use as a bookmark (cf. subsection 3.4). Selecting Add Bookmark To Main Group adds the textual representation of the current graphical query to the main group of the bookmark folder, whereas Add Bookmark To Current Group adds a bookmark to the currently selected bookmark folder.

Please note: In the current version, the graphical query representation cannot be saved for later usage.

Export - As Image

The current graphical query can be exported as an image. There are five export formats available:

SVG (Scalable Vector Graphics)
An XML-based vector graphics format. This means that images can be scaled without loss of quality.

TIF (Tagged Image File Format)

PNG (Portable Network Graphics)

JPG (JPEG File Interchange Format)

PDF (Portable Document Format)

Figure: Export As Image dialog.

Export - Textual Query to Clipboard

The textual representation of the current graphical query is copied to the clipboard to be used in other applications.

Switch to Textual Mode

The textual representation of the current graphical query will be copied to the text query editor of TIGERSearch (cf. section 3). The query text can now be modified and submitted.

Please note: In the current version of TIGERSearch a textual query cannot be converted to a graphical query (cf. subsection 4.1).

5. Processing a query

5.1 Basic query processing

Query processing is started by pressing the Search button in the lower right corner of the TIGERSearch window, or pressing the Search button in the button toolbar, or selecting the Search item in the Query menu of the TIGERSearch window.

Figure: Starting query processing

The query processing progress window will keep you informed about the current status of query processing (cf. screenshot). Pressing the Cancel button will stop query processing. The matching corpus graphs found up to this point will be displayed by the GraphViewer (cf. section 7).

Figure: The query processing progress window

A common error that might occur during query processing is a syntax error detected by the query parser. In this case, a message window will be displayed that informs you about the error type (cf. screenshot). After pressing the OK button, the cursor will automatically jump to the error position.

Figure: Handling of syntax errors

When query processing has finished, the matching corpus graphs will be displayed by the GraphViewer (cf. screenshot and section 7 for a detailed description). You can also view the matches in the statistics mode (cf. section 8) or export your favourite matches using the export module (cf. section 9).

Figure: Query results

5.2 Advanced query processing

If you do not want to search the whole corpus, but only a part of it, you can specify a corpus area. This feature is very useful for experimenting with queries. To specify a corpus area, open the Query Options window by clicking the corresponding toolbar icon or selecting the Search Options item in the Query menu of the TIGERSearch tool:

Figure: Query processing options

Now select one of the preselected search spaces or type in the search space using the syntax start-end. If you press the OK button, the defined search space will be saved for later processing. If you would like to save the search space and immediately start query processing using the defined option, just click the Search button in the Query Options window.

Please note: After leaving the Query Options window, query processing using the search area option is only activated when clicking the Search with options toolbar icon or selecting the Search with options item in the Query menu. If you start query processing using the Search button, the defined search option will not be used.

If you do not want to view all matching corpus graphs, you can set an upper bound for the number of matching corpus graphs determined by the query processor. If you specify to search for n matching corpus graphs, query processing will be stopped automatically when n matching corpus graphs have been found. This option is also specified in the Query Options window.
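The interplay of the two options can be pictured with a small sketch. The Python code below is not part of TIGERSearch; the function names are hypothetical, and it assumes that graph positions start at 1 and that both bounds of a start-end search space are inclusive, which the manual does not state explicitly.

```python
from typing import Callable, List, Optional, Tuple


def parse_search_space(spec: str) -> Tuple[int, int]:
    """Parse a search space written as 'start-end' (e.g. '1-500')."""
    start_text, sep, end_text = spec.strip().partition("-")
    if not sep:
        raise ValueError(f"expected 'start-end', got {spec!r}")
    start, end = int(start_text), int(end_text)
    if start < 1 or end < start:
        raise ValueError(f"invalid search space {spec!r}")
    return start, end


def search(graphs: List[str],
           matches: Callable[[str], bool],
           space: Optional[str] = None,
           max_matches: Optional[int] = None) -> List[int]:
    """Return the positions of matching graphs, honouring both options."""
    start, end = parse_search_space(space) if space else (1, len(graphs))
    found: List[int] = []
    # Only graphs inside the search space are inspected (assumed inclusive).
    for position in range(start, min(end, len(graphs)) + 1):
        if matches(graphs[position - 1]):
            found.append(position)
            # Stop as soon as the requested number of matches is reached.
            if max_matches is not None and len(found) >= max_matches:
                break
    return found
```

In this sketch, restricting the search space and limiting the number of matches are independent of each other, so both can be combined while you are still experimenting with a query.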

6. Use of templates

The syntax of template definitions and of queries which involve template calls is described in section 9, chapter III. In the current section, you learn how templates are loaded into the TIGERSearch tool, and how to handle error messages concerning templates.

6.1 Template path and template files

Currently, template definitions have to be 'hard-wired' individually for each corpus. This means that only 'power users' can change the template definitions for a corpus. Or, to put it the other way round: If you want to develop your own template definitions, you need your personal copy of a corpus. Please refer to subsection 4.5, chapter VI to see how a corpus template path, i.e. the directory which holds the template definitions, is associated with a corpus.

Template definitions have to be stored in template files, i.e. files which have the extension .tig (tiger template). The corresponding, automatically created binary code will be stored in files with extension .tgc (tiger template, compiled).

Please note: File access for templates must be set in the following manner:

Users of a corpus which comes with template definitions need read access not only to the corpus itself but also to the whole directory hierarchy below the template path.

Template developers need read and write access to the whole template directory hierarchy.

Please note: In certain situations, corpus users may get write access error messages with respect to the template directories. In this case, a person with write access to the template directory hierarchy should reload the corpus which produced the error messages, so that the binary template files are created anew.

6.2 Template loading

Template loading is carried out automatically when a corpus is (re-)loaded (cf. subsection 2.1). For template developers, this means that you have to reload the corpus in order to make changes of the template definitions known to the TIGERSearch tool.

The template definitions which are associated with a corpus are loaded from the given template path in the following order:

depth-first

Files from subdirectories will be loaded before the .tig files in the given directory.

alphabetical order

Subdirectories and files, respectively, are loaded in alphabetical order.

Please note: Based on the above order, only the first template definition for a template head (template name/arity) will be loaded. All other definitions for the same template head will be ignored.
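The loading order described above can be sketched as follows. This Python code only illustrates the stated rules and is not the TIGERSearch implementation; the function names are hypothetical, and plain case-sensitive sorting by file name is an assumption about what "alphabetical order" means.

```python
from pathlib import Path
from typing import Dict, Iterable, Iterator, Tuple


def template_files(template_path: Path) -> Iterator[Path]:
    """Yield .tig files depth-first: subdirectories first, then local files."""
    entries = sorted(template_path.iterdir(), key=lambda p: p.name)
    # Files from subdirectories are loaded before the files of this directory.
    for entry in entries:
        if entry.is_dir():
            yield from template_files(entry)
    for entry in entries:
        if entry.is_file() and entry.suffix == ".tig":
            yield entry


def retain_first_heads(
    definitions: Iterable[Tuple[str, int, str]]
) -> Dict[Tuple[str, int], str]:
    """Keep only the first definition per template head (name, arity).

    `definitions` holds (name, arity, source file) triples in load order.
    """
    store: Dict[Tuple[str, int], str] = {}
    for name, arity, source in definitions:
        store.setdefault((name, arity), source)  # later duplicates are ignored
    return store
```

For a template path that contains a.tig, b.tig and the subdirectories aaa/ (with c.tig) and sub/ (with z.tig), the sketch yields aaa/c.tig, sub/z.tig, a.tig, b.tig. A definition of a given template head in aaa/c.tig would therefore shadow a definition of the same head in a.tig.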

For faster processing, templates are translated ('compiled') into Java objects. In order to avoid unnecessary recompilation steps, compiled template code is stored on the hard disk in binary form (files with extension .tgc). The compilation of a template source file filename.tig is triggered automatically during the template loading process, in the following situations:

The compiled file filename.tgc does not exist yet.

The compiled file filename.tgc is older than source file filename.tig.

The TIGERSearch system has been updated in the meantime, i.e. the internal data structures of the compiled files are outdated.

Since compiled code is written to the hard disk, file access properties have to be set in an appropriate manner (cf. subsection 6.1).
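The three recompilation conditions listed above amount to a simple check, sketched here in Python. The function name and the compiled_format_current flag are hypothetical; in particular, how TIGERSearch detects outdated internal data structures is not documented here, so the flag merely stands in for that test.

```python
from pathlib import Path


def needs_recompile(source: Path, compiled_format_current: bool = True) -> bool:
    """Decide whether filename.tig has to be compiled to filename.tgc again."""
    compiled = source.with_suffix(".tgc")
    # Condition 1: the compiled file does not exist yet.
    if not compiled.exists():
        return True
    # Condition 2: the compiled file is older than the source file.
    if compiled.stat().st_mtime < source.stat().st_mtime:
        return True
    # Condition 3: the compiled data structures are outdated (assumed flag).
    return not compiled_format_current
```

Whenever the check succeeds, a .tgc file has to be written next to the source file, which is why a user without write access can trigger the error messages mentioned in subsection 6.1.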

6.3 Error messages during template loading

If there are any template loading problems during the corpus loading process, all error messages are stored for inspection. These messages can be inspected by clicking on the warning symbol in the corpus information tab (cf. also subsection 2.2):

Figure: Inspection of template loading errors

The handling of file access errors has already been explained in subsection 6.1.

If a template call does not match the head of any template definition with respect to template name and number of parameters, an undefined template error message is issued when the template is called. In this case, you should

verify the template name and the number of argument parameters, both in the template call and in the corresponding template definition. Correct the template call according to your template definitions, or vice versa.

check whether the corresponding template definition has been compiled properly. If the compilation process has failed for a template definition, e.g. due to syntax errors or read/write access errors, obviously, the definition is not available in the internal template store.

Variable-related error messages, e.g. 'undefined type of variable' or 'variable type clash' will be produced in the following situations:

The type of an argument parameter in a template call cannot be determined from the context of the template call.

The type of an argument parameter of a template head cannot be determined from the template body.

An argument parameter of a template call and the corresponding parameter in a template head have disjoint types.

For example, the type of variable #n5 cannot be derived from the other constraints in the body of the following template definition:

VerbPhrase(#n0) <-
   #n0:[cat="VP"] > #n1:[pos="VVFIN"]
   & #n0 > #n2
   & #n1.#n2
   & arity(#n0,2)
   & PrepPhrase(#n5) ;

This problem is solved by either correcting the name of the variable, or by inserting the same variable name somewhere else in the graph description, so that the type of the variable becomes clear (node variable vs. feature constraint variable vs. feature value variable).

A cyclic template definition error will be issued if a template definition embeds a call to itself, either directly or indirectly. The error message lists the 'ancestors' of the cyclic call in order to give you a hint how the cycle could have been created. Currently, the ancestors are listed in arbitrary order - which may be somewhat confusing.
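To see how such a cycle arises, it can help to write down which template calls which. The following Python sketch searches such a call table for a cycle and reports the chain in call order. It is only an illustration: the manual does not describe how TIGERSearch detects cycles, and the function name as well as the name/arity notation for template heads are assumptions.

```python
from typing import Dict, List, Optional, Set


def find_cycle(calls: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one call chain that leads back to its first element, or None.

    `calls` maps a template head to the heads called in its body.
    """
    finished: Set[str] = set()  # heads already known to be cycle-free

    def visit(head: str, chain: List[str]) -> Optional[List[str]]:
        if head in chain:
            # The head is its own ancestor: report the chain from there on.
            return chain[chain.index(head):] + [head]
        if head in finished:
            return None
        for callee in calls.get(head, []):
            cycle = visit(callee, chain + [head])
            if cycle:
                return cycle
        finished.add(head)
        return None

    for head in calls:
        cycle = visit(head, [])
        if cycle:
            return cycle
    return None
```

For a table in which VerbPhrase/1 calls PrepPhrase/1, PrepPhrase/1 calls NounPhrase/1, and NounPhrase/1 calls PrepPhrase/1 again, the sketch reports the chain PrepPhrase/1, NounPhrase/1, PrepPhrase/1.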

6.4 Restrictions of the current implementation

Although a file may contain several defining clauses for the same or for different templates, only the first clause for each template will be retained.

Workaround: Use the disjunction operator | in order to pack alternative template bodies into a single defining clause.

7. Viewing the matches - The GraphViewer

7.1 Navigation through the corpus graphs

The GraphViewer is used to browse through the whole corpus (corpus exploration) or to browse through the matching corpus graphs (match visualization). The corpus exploration mode is activated by pressing the Explore corpus button or selecting the Explore item in the Corpus menu. The match visualization is automatically activated after query processing. When opened, the GraphViewer will display the first corpus graph or the first matching corpus graph, respectively.

The tokens of the currently displayed corpus graph are presented at the bottom of the window. You can navigate through the matching corpus graphs by using the navigation panel at the bottom of the GraphViewer. You can either view them one by one in the order they have been found using the Previous and Next buttons, or browse through the corpus graphs using the slider above these buttons.

Figure: Browsing through the (matching) corpus graphs

You can move directly to the last graph by pressing the Last button and back to the first one by pressing the First button. You may also move to a graph at a certain position within the results. Just type the position number into the input field between the First and Last buttons and press Return.

If there are two or more matching subgraphs in the same corpus graph, the green navigation arrows in the Subgraph box on the right hand side of the navigation bar will be activated. Now you can navigate from one matching subgraph to the next using the green buttons. In the example screenshot, there are four NP subgraphs within the current corpus graph matching the query [cat="NP"].

The information box on the left shows the total number of matching corpus graphs and the total number of matching subgraphs within all corpus graphs. Thus, we have 176 corpus graphs which comprise at least one NP and 448 NPs in the whole corpus.

7.2 Advanced navigation

Focus on match

If you do not want to see the whole graph, but only the matching subgraph, you can turn on the Focus on match button in the toolbar (represented as a flash light turned on/off). The GraphViewer will display just the matching subgraph. Turning off the Focus on match button will display the initial view of the graph.

Please note: The Focus on match feature is not accessible in the corpus exploration mode.

Figure: Focussing the matching structure

Sentence text field

If you do not want to see the tokens of the graph you can turn off the Token button in the toolbar (represented by a T character turned on/off). Turning on the Token button will display the tokens again. To change the context size of the sentence text field (current sentence +- 0/1/2 context sentence(s)) you can use the display options menu.

Imploding subgraphs

If you want to hide parts of the graph you can implode subgraphs by pressing the middle mouse button on the node which is the root of the subgraph you want to hide (cf. screenshot). If you are using a two-button mouse, clicking the left and right button simultaneously should emulate the middle button click. Clicking the imploded node again will expand it to its original size. Of course, node imploding does not make sense for the root node and terminal nodes.

Figure: Imploding a subgraph

Node tooltip

If the mouse pointer is placed over a node (token or inner node), a node tooltip window pops up. This window comprises the annotation of all features:

Figure: Node tooltip

Refresh

If the graph is not displayed properly, use the Refresh button in the toolbar to repaint the graph.

7.3 Communicating with the Statistical Viewer

If you are simultaneously working with the GraphViewer and the Statistical Viewer, you can easily switch between these two views using the shortcut icons in the lower left corner of both windows.

However, the two windows have also been designed to communicate with each other. The corpus graph currently displayed in the GraphViewer will be displayed and marked in the Statistical Viewer if you select the Display in Statistics item of the GraphViewer's context menu (activated by pressing the right mouse button in the GraphViewer panel).

7.4 Image export

Printing the current graph / Postscript export

You can print the current graph by pressing the Print button. The print options can be selected in a platform-dependent print dialog. Some general options such as paper format can be specified in the page setup window which is activated by selecting the Page Setup item in the Graph menu of the GraphViewer.

Please note: If you choose a Postscript printer in the print dialog (e.g. Apple Laserwriter), you can use the Print into file option to create a Postscript version of the displayed corpus graph. The created Postscript file may be used by applications such as LaTeX. Please note that the quality of the Postscript output depends on the selected printer and its parameters.

Exporting the current graph (image export)

You can save the current graph to several image formats (SVG, JPG, PNG, TIF, and PDF) using either the Export as image item of the context menu (activated by pressing the right mouse button), or the Export image icon (cf. picture icon) of the toolbar, or the Export image item in the Graph menu. A dialog window pops up, and you can select the options you prefer for saving the current graph (cf. screenshot).

Figure: Exporting the current corpus graph as an image

The export is based on the SVG format, which is an XML-based vector graphics format. In contrast to raster formats such as JPG, images encoded using SVG can be manipulated (e.g. scaled) without loss of quality. Since SVG is plain XML, you can also change a token or a syntactic category within the exported SVG file. If you select a non-SVG image format, an SVG representation is generated first and afterwards converted into the preferred format.

The SVG output has been designed in such a way that the different node types (inner/outer node, matching node, imploded node etc.) can be identified by additional XML attributes. Thus, this information can be used to modify the generated SVG image, e.g. by Cascading Stylesheets (CSS).

The image filters you can choose (cf. screenshot above) are realized by inserting predefined CSS stylesheets into the SVG document. In subsection 10.1 we explain how to add your own image filter to the SVG export.

Please note: Relative paths used to specify the export file are evaluated with regard to the working directory.

Exporting the match forest (animated SVG image)

If you would like other users to have a look at your favourite matches, you may want to export the matches in a format that does not depend on the TIGERSearch software suite. Of course, you might export all the matching corpus graphs as single images, but this solution would not be practical.

As an alternative, you can use the Export match forest option to export all matching corpus graphs to one single SVG file. The individual graphs are combined by SVG animations. Thus, you can use any SVG viewer to navigate through the match forest. Please note that the SVG viewer has to support SVG animations.

You can open the Match forest export window by pressing the Export match forest button (cf. forest icon) in the button toolbar or selecting the Export forest as SVG item in the Graph menu. Now you can specify the following parameters:

Figure: Exporting a match forest (animated SVG image)

Parameter: SVG file name

You can either generate an uncompressed SVG file (*.svg) or a compressed SVG file (*.svgz). Relative paths are evaluated with regard to the working directory. Compression reduces the size of the file to about 10% of its original size.
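An .svgz file is simply a gzip-compressed SVG file, so the two variants can be converted into each other outside of TIGERSearch as well. The following Python sketch shows the idea; the function names are hypothetical, and the compression ratio you obtain depends on the content of the file.

```python
import gzip


def compress_svg(svg_text: str) -> bytes:
    """Return the gzip-compressed (.svgz) form of an SVG document."""
    return gzip.compress(svg_text.encode("utf-8"))


def decompress_svgz(svgz_bytes: bytes) -> str:
    """Return the plain SVG text stored in .svgz data."""
    return gzip.decompress(svgz_bytes).decode("utf-8")
```

Match forests consist of many structurally similar graph descriptions, which is the kind of repetitive text that gzip compresses well.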

Parameter: match selection

You can restrict the export of all matching corpus graphs to a range of matching graphs, or specify them in a text field (e.g. 1-3;6-7). Note that you have to specify the number (or position) within the match forest, not the corpus graph ID.

Since a corpus graph can match a query more than once, you might prefer to export a graph as often as it matches the query. In this case please check the box Include all matches within a corpus graph.
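The text field syntax for the match selection (e.g. 1-3;6-7) can be read as a list of position ranges separated by semicolons. The Python sketch below expands such a specification into individual positions. The function name is hypothetical, and accepting a single number such as 5 as a range of length one is an assumption that the manual does not confirm.

```python
from typing import List


def parse_match_selection(spec: str) -> List[int]:
    """Expand a selection such as '1-3;6-7' into match forest positions."""
    positions: List[int] = []
    for part in spec.split(";"):
        part = part.strip()
        if not part:
            continue
        first, sep, last = part.partition("-")
        start = int(first)
        end = int(last) if sep else start  # single number: assumed shorthand
        if start < 1 or end < start:
            raise ValueError(f"invalid range {part!r}")
        positions.extend(range(start, end + 1))
    return positions
```

Remember that the resulting numbers are positions within the match forest, not corpus graph IDs.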

Parameter: image includes

All corpus graphs are exported in a canonical form, i.e. as fully expanded graphs including match highlighting and match focussing. You can turn on/off the match highlighting, include/exclude the exact description of the match in the SVG navigation bar, and specify the background color. We recommend using transparent; in this case the SVG viewer is responsible for the selection of the background color.

To start the export process just press the Submit button. Please note that the export process cannot be stopped.

A popular SVG viewer is the SVG browser plugin which has been developed by Adobe (cf. http://www.adobe.com/svg/viewer/install/). This plugin is currently available for the Microsoft Windows and Apple Macintosh platforms. Using this plugin, you can view an SVG image in the Netscape and Internet Explorer browsers.

If you are interested in integrating SVG images in Microsoft PowerPoint presentations, you should visit the following web page: http://www.indezine.com/products/powerpoint/. This page comprises a detailed description of the integration process. It also contains interesting general hints to improve your PowerPoint presentations.

7.5 Viewer options

The default configuration of the TIGERGraphViewer has been defined in such a way that the visualization should work fine on all supported platforms. If you like to adapt the configuration according to your individual preferences, you can modify the display colors and the most important display parameters. Your modifications are saved when leaving the tool and restored at the beginning of your next TIGERSearch session.

Color options

To change the color settings of the GraphViewer, please select the Color options item in the Options menu. The display color window appears:

Figure: Changing the color settings of the GraphViewer

In this window all colors used by the TIGERGraphViewer are listed and illustrated by the color of the respective choose button. To change a color, press the corresponding Choose button. Another window pops up that lets you select the new color. Select the preferred color and leave this window by pressing the OK button.

If you would like to restore your configuration (that has been valid before opening the color options window), just click the Reset button. To restore the default values that have been pre-installed for using the TIGERGraphViewer activate the Default button. To leave the window without changing the configuration press the Cancel button. To submit your changes press the OK button.

Display options

Some important display parameters used by the TIGERGraphViewer can also be changed. To open the display options window, please select the Display options item in the Options menu. The display options window appears:

Figure: Changing display parameters of the GraphViewer

The following general parameters can be modified:

Maximum width of terminal nodes

The maximum width of the terminal node visualization is limited. Thus, some feature values (especially word forms) may be truncated. If this occurs too often, just specify a higher width.

Display secondary edges

In order to turn on/off the display of secondary edges check/uncheck this box.

Display virtual root node

Some corpus formats such as the Negra format do not integrate special types of tokens (e.g. punctuation marks) into the graph annotation. Unfortunately, the TIGERSearch query processing algorithm only works on connected graphs. Hence, during the indexing process a virtual root node is inserted into any graph that comprises terminal nodes (e.g. punctuation marks) which are not integrated into the corresponding syntax graph. In order to turn on/off the display of virtual root nodes check/uncheck this box.

Hide...

For some tokens, annotating a particular feature (e.g. a morphological feature or case) does not make sense. In this case, a placeholder symbol is used instead of a feature value. We recommend using --, which is also used in our implemented import filters.

Since these symbols do not represent annotation information, it makes sense to hide them in the visualization. The two boxes must be checked if you want to hide a special feature value or edge label. Please specify the symbol which should be suppressed in the respective text fields.

The following corpus-dependent parameters can be modified:

Displayed non-terminal feature

For non-terminal nodes, only one feature can be displayed in the GraphViewer. If there are two or more features defined for non-terminal nodes, the displayed feature can be changed in this dialog. By default, the non-terminal feature defined first in the TIGER-XML encoding is selected.

Displayed terminal features

By default, all features of the terminal nodes are displayed. If you would like to reduce the number of displayed features, just deselect the features you are not interested in.

Please note: As these parameters are corpus-dependent, they will be restored when the corpus is used again (even within the next TIGERSearch session).

If you like to restore your configuration (that has been valid before opening the display options window), just click the Reset button. To restore the original system defaults that have been pre-installed for using the TIGERGraphViewer activate the Default button. To leave the window without changing the configuration press the Cancel button. To submit your changes press the OK button.

8. Viewing the matches - Statistics

8.1 Introduction

The statistical viewer has been developed as a specialized view on the match results. Users can export their favourite matches as tables. We support a text-based output format, a proprietary XML-based format, and the Excel format.

In the present section, the description of the statistical viewer is illustrated by the results of the following TIGERSearch query:

#np:[cat="NP"] &
#art:[pos="ART"] &
#adj:[pos="ADJA"] &
#nn:[pos="NN"] &
#np > #art &
#np > #adj &
#np > #nn &
arity(#np,3)

Please note: To identify nodes within the statistical viewer in a unique way, every node must be labelled, i.e. it must be identified by a node variable (e.g. #art).

To show the statistical viewer, press the Statistics button in the button toolbar of the TIGERSearch main window, or select the Statistics item in the Query menu.

Please note: You can easily switch between the TIGERSearch main window, the GraphViewer window, and the statistical viewer using the shortcut buttons in the lower left corner of the three windows. If you press a shortcut button, the corresponding window will be moved in front of all other windows on your desktop.

8.2 Specifying nodes and features

First of all, you have to specify the nodes and node features you are interested in. These are selected in the first two rows of the statistics table. Select the node in the first row and the node feature in the second row (cf. screenshot). If you would like to display more than just one node feature, you can add more node feature columns by clicking the Add button or selecting the Add column item in the context menu of the feature columns. The Remove and Clear buttons (and their corresponding items in the context menus) can be used to delete a single column or to delete all currently used columns, respectively.

Figure: Specifying nodes and features

You might also choose the default arrangement of the nodes and features by pressing the Default button in the button toolbar. All terminal nodes specified in the query will be presented as columns and the default feature (usually the word feature) will be used.

Next, you have to build the table. Press one of the two Build buttons in the upper left or lower right corner, respectively. The statistics table will be filled with feature value information.

Figure: Building the table

The left two columns of the table show the corpus graph ID and the number of the current submatch (remember that a query can be matched by a corpus graph more than once). By default, the rows are ordered corresponding to the ordering of the corpus graphs. If you would like to change the sort sequence of the rows, consult subsection 8.4.

To adapt the table layout, you can also change the width of a column or change the column ordering with the help of your mouse device.

8.3 Corpus view and Frequency view

The default view of the statistical viewer is the Corpus view, i.e. the rows are ordered with respect to the corpus graph ordering. However, to analyze the data it is sometimes helpful to group identical rows and display their frequency. To switch to the Frequency view, click on the Frequency button.

Figure: The frequency view

Now the left column shows the number of occurrences of the rows displayed in the table. The rows are ordered by frequency.
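The grouping performed by the Frequency view can be sketched in a few lines of Python. This is only an illustration of the idea, not the TIGERSearch implementation; the row layout (graph ID, submatch number, feature values) and the sample values are assumptions made for the example.

```python
from collections import Counter

# Hypothetical Corpus view rows: (graph ID, submatch, feature values...).
# The values are invented for illustration.
corpus_view = [
    ("s1", 1, "der", "ART"),
    ("s2", 1, "die", "ART"),
    ("s2", 2, "der", "ART"),
    ("s5", 1, "der", "ART"),
]

def frequency_view(rows):
    """Group rows with identical feature values and order them by frequency."""
    # Ignore the graph ID and submatch columns; only feature values are compared.
    counts = Counter(row[2:] for row in rows)
    # most_common() returns the groups in descending order of their count.
    return [(n, *features) for features, n in counts.most_common()]

for line in frequency_view(corpus_view):
    print(*line, sep="\t")
```

The first value of each resulting row plays the role of the frequency column described above.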

8.4 Changing the sort sequence

The ordering of the table rows in both Corpus View and Frequency View can be changed easily. Just double-click on the upper border of the column you would like to sort by, or select the Sort by column item in the context menu of the column headline (activated by a right mouse button click on the column headline). In the following example, the second column has been selected. If you double-click the column again, the rows are sorted in reverse order.

Figure: Changing the sort sequence

If you select the Sort by column item in the context menu of the GraphID or Submatch column within the Frequency View, the default ordering of the table will be restored.

8.5 Communicating with the GraphViewer

If you are simultaneously working with the GraphViewer and the Statistical Viewer, you can easily switch between these two views using the shortcut icons in the lower left corner of both windows.

However, the two windows have also been designed to communicate with each other. To display and highlight a match from the statistical viewer in the GraphViewer, double-click the corresponding row (the mouse pointer must be on the graph ID or submatch column) or select the Open match in GraphViewer item in the context menu of the row (activated by a right mouse button click on the graph ID or submatch column).

8.6 Exporting the results

To export the statistics, first mark the rows you want to export. Use the mouse to mark the rows or select the Mark all item in the context menu. To unselect the marked rows, just select the Clear selection item in the context menu. Click the export button to display the export dialogue window.

Figure: Exporting the results

We have implemented three export formats:

Text format

Columns are separated by tabs, rows are separated by carriage return.

XML format

The data is exported in an XML-based format that can be used for further processing.

Excel format

The data is exported as a Microsoft Excel table.

As an alternative, you can also copy the text format output into the clipboard. Just select the Copy button in the button toolbar.
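For further processing outside TIGERSearch, the text format can be read with any tool that understands tab-separated values. The following Python sketch shows one possible way; the sample data is invented, and the function name read_text_export is hypothetical. Quote handling is switched off because corpus tokens may themselves be quotation marks.

```python
import csv

def read_text_export(text):
    """Parse tab-separated export text into a list of rows (lists of strings)."""
    # splitlines() accepts \r, \n and \r\n, so the exact row separator
    # written by the export does not matter here.
    # QUOTE_NONE treats quotation marks as ordinary characters.
    return list(csv.reader(text.splitlines(), delimiter="\t",
                           quoting=csv.QUOTE_NONE))

# Invented sample; a real export contains the columns of your table.
sample = '3\tder\tART\r\n1\t"\t$(\r\n'
rows = read_text_export(sample)
print(rows)
```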

9. Exporting the matches

9.1 Introduction

For the export of your favourite matches we have implemented two different approaches. First, you can export single graphs as image data or a set of matching graphs as an animated SVG image (cf. subsection 7.4). Second, TIGERSearch enables you to export matches to XML using its own TIGER-XML encoding format (cf. chapter V), or to pipe the TIGER-XML output through an XSLT stylesheet. This section describes how to export a TIGER-XML file (cf. subsection 9.3) and how to pipe TIGER-XML output through an XSLT stylesheet (cf. subsection 9.4).

Please keep in mind that an exported TIGER-XML corpus can be indexed by the TIGERRegistry tool, i.e. exported matches can be reused as a new TIGERSearch corpus!

9.2 Setting up the export mode

After processing a query and viewing its results with the GraphViewer you may want to save your favourite matches for later review or processing. Just choose the Export Matches icon in the TIGERSearch main window toolbar or the Export matches item in the Query menu of the main window. The export feature can also be used if you have not yet submitted a query. In this case, you can export the whole corpus.

Next, you have to select the output format: TIGER-XML format (cf. subsection 9.3) or XML piped through an XSLT stylesheet (cf. subsection 9.4).

Figure: Setting up the export mode

Next, you have to specify an output file name. You can either type it in by hand (relative paths are evaluated with regard to the working directory) or use a file dialog by clicking on the Search button.

To restrict the export there are several options (cf. screenshot above):

All matching corpus graphs

no restriction, export all matching corpus graphs

Current matching corpus graph

export the corpus graph currently displayed in the GraphViewer

From matching corpus graph

restrict export to a range of corpus graphs

Select matching corpus graphs

restrict export to a list of matching corpus graphs or ranges separated by comma or semicolon (e.g. 1;3 or 1-2;3-7,19)

All non-matching corpus graphs

export all corpus graphs which do not match the corpus query

Whole corpus

export the whole corpus

Please note: Matching corpus graph in this context means the number (or position) of the graph in the forest of matching corpus graphs. Example: 1;5-9 will export the 1st, 5th, 6th, 7th, 8th, and 9th matching corpus graph, but not the corpus graphs with the IDs 1, 5 etc.
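To make the meaning of such a selection explicit, the following Python sketch expands a selection string into the list of positions it denotes. It only illustrates the semantics described above and is not the parser used by TIGERSearch; the function name parse_selection is hypothetical.

```python
def parse_selection(spec):
    """Expand a selection such as "1;5-9" into a sorted list of positions."""
    positions = set()
    # Accept both separators that appear in the examples above.
    for part in spec.replace(",", ";").split(";"):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range like 5-9 includes both end points.
            start, end = (int(x) for x in part.split("-", 1))
            positions.update(range(start, end + 1))
        else:
            positions.add(int(part))
    return sorted(positions)

print(parse_selection("1;5-9"))       # [1, 5, 6, 7, 8, 9]
print(parse_selection("1-2;3-7,19"))  # [1, 2, 3, 4, 5, 6, 7, 19]
```

The resulting numbers are positions in the list of matching corpus graphs, not corpus graph IDs.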

Pressing the Submit button will start the export process. It can be stopped at any time.

9.3 Exporting to TIGER-XML

If you choose TIGER-XML as the export format, the following options can be specified in the export window.

Schema reference

The structure of a TIGER-XML export file follows the TIGER-XML schema declaration (cf. section 4, chapter V). In the export file you can refer to the schema file:

on your local computer or network created during the TIGERSearch installation (refer to local schema),

on the TIGERSearch web site (refer to WWW schema),

or make no reference (don't refer to schema).

Figure: Schema reference options

Include/exclude options

You can also exclude certain parts of the export file by unchecking some of the boxes:

Figure: Include/exclude options

Export header: contains meta information and feature declaration; essential if the exported file should represent a new TIGERSearch corpus; unchecked by default

Export graph structure: tokens, inner nodes, edges etc.

Export match info: indicates which part of a corpus graph actually matches the query

Please note: In subsection 2.5, chapter V we describe the encoding of corpus query matches in the TIGER-XML format.

9.4 Exporting with XSLT

Of course, users can first export a TIGER-XML file and afterwards process the XML file with an external stylesheet. However, TIGERSearch offers the feature to do it all in one step. Just choose XML piped through XSLT as your export format and choose one of the predefined stylesheets:

Figure: Stylesheet selection

TIGERSearch is delivered with several predefined stylesheets. If you have created additional stylesheets you can link your stylesheets into TIGERSearch. The linking mechanism is explained in subsection 10.2.

Some interesting predefined stylesheets are now illustrated by a match of the corpus query [cat="NP"]:

sentence format (all tokens): tokens separated by blanks, sentences separated by line breaks

Minister heizt Debatte über Sterbehilfe an

sentence format (tokens+pos): same as above, but each token is annotated with a part-of-speech tag in the token/pos format

Minister/NN heizt/VVFIN Debatte/NN über/APPR Sterbehilfe/NN an/PTKVZ

bracketing format: UPenn-style bracketing format

( (S (NN-SB Minister)
     (VVFIN-HD heizt)
     (NP-OA (NN-NK Debatte)
            (PP-MNR (APPR-AC über)
                    (NN-NK Sterbehilfe)))
     (PTKVZ-SVP an)) )

Please note: If a corpus graph comprises crossing edges, the tokens of the graph may get disordered. An adequate linguistic representation using traces has not been implemented in this stylesheet.

context-free rules: lists all rules used in the corpus annotation (duplicates are not removed)

PP -> APPR NN
NP -> NN PP
S -> NN VVFIN NP PTKVZ

corpus graph numbers: list of corpus graph numbers separated by blanks which can be imported by the Annotate tool (cf. http://www.coli.uni-sb.de/sfb378/negra-corpus/annotate.html)

26743 26745 26746 26747 26748 26750 ...
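Since the context-free rules stylesheet does not remove duplicates, its output lends itself to simple rule counting. The following Python sketch shows one way to do this, assuming one rule per line in the LHS -> RHS form shown above; the function name count_rules is hypothetical, and the last line of the sample is an invented duplicate.

```python
from collections import Counter

def count_rules(text):
    """Count how often each 'LHS -> RHS' line occurs in the export."""
    counts = Counter()
    for line in text.splitlines():
        if "->" not in line:
            continue  # skip blank or unrelated lines
        lhs, rhs = line.split("->", 1)
        # Store the right-hand side as a tuple of category symbols.
        counts[(lhs.strip(), tuple(rhs.split()))] += 1
    return counts

sample = """PP -> APPR NN
NP -> NN PP
S -> NN VVFIN NP PTKVZ
PP -> APPR NN
"""
for (lhs, rhs), n in count_rules(sample).most_common():
    print(n, lhs, "->", " ".join(rhs))
```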

10. Adapting the TIGERSearch tool

10.1 Image filters

The Scalable Vector Graphics format SVG is an XML-based format. Thus, Cascading Stylesheets (CSS) or Extensible Stylesheets (XSLT) can be used to modify an image. The embedding of stylesheets in SVG documents is explained in the SVG specifications located at the W3C web site (http://www.w3c.org/Graphics/SVG). The following example comprises an SVG file exported by the TIGERGraphViewer. It illustrates how a Cascading Stylesheet is used to set the background color of an image to white:

<svg width="662" height="333">

  <style type="text/css">
    g[type="bgcolor"] > rect { fill:white }
  </style>

  <g type="sentence" id="s10">

    <!-- create background color -->
    <g type="bgcolor">
      <rect x="0" y="0" width="662" height="333" fill="grey"/>
    </g>

    <!-- terminal node "s10_1":[word="There" & pos="EX"] -->
    <g type="t" id="s10_1" match="subgraph" font-family="dialog" font-style="normal" font-weight="normal" font-size="14" fill="rgb(150,0,0)">
      <text x="55" y="276" text-anchor="middle">There</text>
      <text x="55" y="297" text-anchor="middle">EX</text>
    </g>
    ...

  </g>

</svg>

To develop your own Cascading Stylesheets, we recommend exporting an example corpus graph as SVG with an image filter applied. The generated SVG file can be used to experiment with your stylesheet. The SVG specification supports Cascading Stylesheets Level 2, which are described in a W3C specification (cf. http://www.w3c.org/Style/CSS).

The integration of your own CSS stylesheets is quite simple. All stylesheets that are represented by image filters in the graphical user interface are listed in a file named tigersearch_svg.xml which is located in the config/ subdirectory of your TIGERSearch installation directory. The file looks like the following:

<svgfilter version="1.0">

  <filter name="black/white">
     text { fill:black }
     line,rect,ellipse { stroke:black }
     rect { fill:white }
     path { stroke:black }
  </filter>

  <filter name="white background">
    g[type="bgcolor"] > rect { fill:white }
  </filter>

</svgfilter>

To add your stylesheets, just insert a new <filter> element. Specify the stylesheet name by setting the value of the attribute name. Insert the stylesheet text as the element content of the new <filter> element.

Be aware of the XML element content conventions, e.g. use &lt; instead of < in the content of a <filter> element. Please also note that TIGERSearch has to be restarted in order to activate the new stylesheet configuration.
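If you prefer to edit the configuration with a script instead of by hand, an XML library takes care of the escaping. The following Python sketch appends a filter to a configuration given as a string. The filter name "red terminals" and its stylesheet text are invented for illustration, and add_filter is a hypothetical helper, not part of TIGERSearch.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the configuration shown above.
config = """<svgfilter version="1.0">
  <filter name="white background">
    g[type="bgcolor"] > rect { fill:white }
  </filter>
</svgfilter>"""

def add_filter(xml_text, name, css):
    """Return the configuration with an additional <filter> element."""
    root = ET.fromstring(xml_text)
    new_filter = ET.SubElement(root, "filter", {"name": name})
    # ElementTree escapes &, < and > in element content when serializing.
    new_filter.text = css
    return ET.tostring(root, encoding="unicode")

# Hypothetical filter: colour the text of terminal nodes red.
updated = add_filter(config, "red terminals", 'g[type="t"] > text { fill:red }')
print(updated)
```

In the printed result the > of the new filter appears as &gt;, which an XML parser reads back as the original character.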

10.2 XSLT stylesheets

All XSLT stylesheets used to transform the export of query matches are placed in the export/ subdirectory of your TIGERSearch installation directory. If you want to develop a new stylesheet, we recommend using the file testsentence.xml for testing it. As the matching corpus graphs are processed by your stylesheet separately, this test file illustrates the XML representation of one single graph. It is located in the export/ directory, too.

The XSLT stylesheets offered by the user interface are registered in a file named TIGERExportRegistry.xml which is also located in the export/ subdirectory. This file looks like the following:

<registry version="1.0">

  <stylesheet name="bracketing format" sentence_xslt="bracket.xsl"/>

  <stylesheet name="sentence format (tokens)" sentence_xslt="sentence.xsl"/>

  <stylesheet name="sentence format (tokens + pos)" sentence_xslt="pos.xsl"/>

</registry>

To add your stylesheet, just insert a new <stylesheet> element. Specify the stylesheet name by setting the value of the attribute name. Specify the filename by setting the value of the sentence_xslt attribute.

Please note: TIGERSearch has to be restarted in order to activate the new stylesheet configuration.
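The registration step can also be scripted. The following Python sketch works on a shortened copy of the registry as a string; the stylesheet name "my format" and the file name my_format.xsl are invented, and register_stylesheet is a hypothetical helper. It rejects a name that is already registered, which is a design choice of this sketch and not a documented TIGERSearch requirement.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the registry file shown above.
registry = """<registry version="1.0">
  <stylesheet name="bracketing format" sentence_xslt="bracket.xsl"/>
</registry>"""

def register_stylesheet(xml_text, name, filename):
    """Return the registry with an additional <stylesheet> entry."""
    root = ET.fromstring(xml_text)
    # Avoid two entries with the same display name (assumption of this sketch).
    if any(s.get("name") == name for s in root.findall("stylesheet")):
        raise ValueError("stylesheet name already registered: " + name)
    ET.SubElement(root, "stylesheet", {"name": name, "sentence_xslt": filename})
    return ET.tostring(root, encoding="unicode")

updated_registry = register_stylesheet(registry, "my format", "my_format.xsl")
print(updated_registry)
```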