9. Exporting the matches

9.1 Introduction

For the export of your favourite matches we have implemented two different approaches. First, you can export single graphs as image data or a set of matching graphs as an animated SVG image (cf. subsection 7.4). Second, TIGERSearch enables you to export matches to XML using its own TIGER-XML encoding format (cf. chapter V), or to pipe the TIGER-XML output through an XSLT stylesheet. This section describes how to export a TIGER-XML file (cf. subsection 9.3) and how to pipe the TIGER-XML output through an XSLT stylesheet (cf. subsection 9.4).

Please keep in mind that an exported TIGER-XML corpus can be indexed by the TIGERRegistry tool, i.e. exported matches can be reused as a new TIGERSearch corpus!

9.2 Setting up the export mode

After processing a query and viewing its results with the GraphViewer you may want to save your favourite matches for later review or processing. Just choose the Export Matches icon in the TIGERSearch main window toolbar or the Export matches item in the Query menu of the main window. The export feature can also be used if you have not yet submitted a query; in this case, you can export the whole corpus.

Next, you have to select the output format: TIGER-XML format (cf. subsection 9.3) or XML piped through an XSLT stylesheet (cf. subsection 9.4).

Figure: Setting up the export mode

Next, you have to specify an output file name. You can either type it in by hand (relative paths are resolved against the current working directory) or use a file dialog by clicking on the Search button.

The following options restrict which corpus graphs are exported (cf. the screenshot above):

All matching corpus graphs

no restriction, export all matching corpus graphs

Current matching corpus graph

export the corpus graph currently displayed in the GraphViewer

From matching corpus graph

restrict export to a range of matching corpus graphs

Select matching corpus graphs

restrict export to a list of matching corpus graphs; entries are separated by commas or semicolons and may be ranges (e.g. 1;3 or 1-2;3-7,19)

All non-matching corpus graphs

export all corpus graphs which do not match the corpus query

Whole corpus

export the whole corpus

Please note: Matching corpus graph in this context means the number (or position) of the graph in the forest of matching corpus graphs. Example: 1;5-9 will export the 1st, 5th, 6th, 7th, 8th, and 9th matching corpus graph, but not the corpus graphs with the IDs 1, 5 etc.
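The selection syntax can be illustrated with a small Python sketch. It assumes, as the examples above suggest, that entries are separated by commas or semicolons, that a hyphen denotes an inclusive range, and that numbers are 1-based positions among the matching corpus graphs. The functions parse_selection and select_graph_ids and the graph IDs used below are hypothetical; they are not part of TIGERSearch.

```python
import re


def parse_selection(spec: str) -> list[int]:
    """Expand a selection such as "1;5-9" into 1-based positions.

    Entries are separated by commas or semicolons; "a-b" is an inclusive range.
    Order is preserved and duplicates are kept as given.
    """
    positions: list[int] = []
    for entry in re.split(r"[;,]", spec):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty entries such as a trailing separator
        if "-" in entry:
            start, end = (int(x) for x in entry.split("-", 1))
            if start < 1 or end < start:
                raise ValueError(f"invalid range: {entry!r}")
            positions.extend(range(start, end + 1))
        else:
            number = int(entry)
            if number < 1:
                raise ValueError(f"invalid position: {entry!r}")
            positions.append(number)
    return positions


def select_graph_ids(spec: str, matching_ids: list[str]) -> list[str]:
    """Map positions to graph IDs; matching_ids lists the matches in result order."""
    return [matching_ids[p - 1] for p in parse_selection(spec)]


# Usage with hypothetical graph IDs: positions 1 and 3, not graph IDs 1 and 3.
print(parse_selection("1;5-9"))                        # [1, 5, 6, 7, 8, 9]
print(select_graph_ids("1;3", ["s12", "s40", "s77"]))  # ['s12', 's77']
```

The mapping step makes the distinction in the note explicit: the numbers index the list of matches, and only that list determines which graph IDs are exported.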

Pressing the Submit button will start the export process. It can be stopped at any time.

9.3 Exporting to TIGER-XML

If you choose TIGER-XML as the export format, the following options can be specified in the export window.

Schema reference

The structure of a TIGER-XML export file follows the TIGER-XML schema declaration (cf. section 4, chapter V). In the export file you can refer to the schema file:

in a local copy on your computer or network, created during the TIGERSearch installation (refer to local schema),

on the TIGERSearch web site (refer to WWW schema),

or make no reference (don't refer to schema).

Figure: Schema reference options

Include/exclude options

You can also exclude certain parts of the export file by unchecking some of the boxes:

Figure: Include/exclude options

Export header: contains meta information and feature declarations; essential if the exported file should represent a new TIGERSearch corpus; unchecked by default

Export graph structure: tokens, inner nodes, edges etc.

Export match info: indicates which part of a corpus graph actually matches the query

Please note: In subsection 2.5, chapter V we describe the encoding of corpus query matches in the TIGER-XML format.
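An export with match info can be post-processed with any XML tool. The Python sketch below assumes a match encoding in which each s element contains a matches element holding match elements (with a subgraph attribute) and variable elements (with name and idref attributes); please check subsection 2.5, chapter V for the authoritative format. The sample data, the query variable #np and the function read_matches are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical export of one match of the query #np:[cat="NP"]; the graph
# structure is abbreviated and the <matches> layout is an assumption.
SAMPLE = """<corpus><body>
<s id="s1">
  <graph root="s1_502"><!-- terminals and nonterminals omitted --></graph>
  <matches>
    <match subgraph="s1_501">
      <variable name="#np" idref="s1_501"/>
    </match>
  </matches>
</s>
</body></corpus>"""


def read_matches(corpus: ET.Element):
    """Yield (sentence id, matched subgraph root, {variable name: node id})."""
    for s in corpus.iter("s"):
        for match in s.findall("matches/match"):
            bindings = {v.get("name"): v.get("idref") for v in match.findall("variable")}
            yield s.get("id"), match.get("subgraph"), bindings


for sentence_id, subgraph, bindings in read_matches(ET.fromstring(SAMPLE)):
    print(sentence_id, subgraph, bindings)  # s1 s1_501 {'#np': 's1_501'}
```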

9.4 Exporting with XSLT

Of course, you can first export a TIGER-XML file and afterwards process it with an external XSLT stylesheet. However, TIGERSearch lets you do both in one step: just choose XML piped through XSLT as your export format and select one of the predefined stylesheets:

Figure: Stylesheet selection

TIGERSearch is delivered with several predefined stylesheets. If you have created additional stylesheets you can link your stylesheets into TIGERSearch. The linking mechanism is explained in subsection 10.2.

The following examples illustrate some of the predefined stylesheets, using a match of the corpus query [cat="NP"]:

sentence format (all tokens): tokens separated by blanks, sentences separated by line breaks

Minister heizt Debatte über Sterbehilfe an

sentence format (tokens+pos): same as above, but each token is annotated with a part-of-speech tag in the token/pos format

Minister/NN heizt/VVFIN Debatte/NN über/APPR Sterbehilfe/NN an/PTKVZ

bracketing format: UPenn-style bracketing format

( (S (NN-SB Minister)
     (VVFIN-HD heizt)
     (NP-OA (NN-NK Debatte)
            (PP-MNR (APPR-AC über)
                    (NN-NK Sterbehilfe)))
     (PTKVZ-SVP an)) )

Please note: If a corpus graph contains crossing edges, the tokens of the graph may appear out of order. An adequate linguistic representation using traces has not been implemented in this stylesheet.

context-free rules: lists all rules used in the corpus annotation (duplicates are not removed)

PP -> APPR NN
NP -> NN PP
S -> NN VVFIN NP PTKVZ

corpus graph numbers: list of corpus graph numbers separated by blanks which can be imported by the Annotate tool (cf. http://www.coli.uni-sb.de/sfb378/negra-corpus/annotate.html)

26743 26745 26746 26747 26748 26750 ...
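To make these output formats concrete, here is a rough Python approximation that works on an exported TIGER-XML file, i.e. the two-step approach mentioned at the beginning of this subsection. It is not the XSLT shipped with TIGERSearch: the sample graph and its node IDs are reconstructed from the examples above for illustration, the assumption that sentence IDs have the form s<number> is hypothetical, and so are all function names. Like the bracketing stylesheet, it orders children by edge order, so crossing edges will put the tokens out of order.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample graph reconstructed from the examples above.
SAMPLE = """<corpus><body>
<s id="s1"><graph root="s1_502">
 <terminals>
  <t id="s1_1" word="Minister" pos="NN"/><t id="s1_2" word="heizt" pos="VVFIN"/>
  <t id="s1_3" word="Debatte" pos="NN"/><t id="s1_4" word="über" pos="APPR"/>
  <t id="s1_5" word="Sterbehilfe" pos="NN"/><t id="s1_6" word="an" pos="PTKVZ"/>
 </terminals>
 <nonterminals>
  <nt id="s1_500" cat="PP"><edge label="AC" idref="s1_4"/><edge label="NK" idref="s1_5"/></nt>
  <nt id="s1_501" cat="NP"><edge label="NK" idref="s1_3"/><edge label="MNR" idref="s1_500"/></nt>
  <nt id="s1_502" cat="S"><edge label="SB" idref="s1_1"/><edge label="HD" idref="s1_2"/>
   <edge label="OA" idref="s1_501"/><edge label="SVP" idref="s1_6"/></nt>
 </nonterminals>
</graph></s>
</body></corpus>"""


def sentence_format(s, with_pos=False):
    """Tokens in document order, separated by blanks; optionally as token/pos."""
    tokens = s.find("graph/terminals").findall("t")
    if with_pos:
        return " ".join(f"{t.get('word')}/{t.get('pos')}" for t in tokens)
    return " ".join(t.get("word") for t in tokens)


def _nodes(s):
    """Return the graph element plus id -> element maps for terminals and nonterminals."""
    graph = s.find("graph")
    terms = {t.get("id"): t for t in graph.iter("t")}
    nts = {nt.get("id"): nt for nt in graph.iter("nt")}  # keeps document order
    return graph, terms, nts


def bracketing(s):
    """UPenn-style bracketing on one line; labels are POS/CAT plus -EDGELABEL."""
    graph, terms, nts = _nodes(s)

    def node(node_id, label=None):
        suffix = f"-{label}" if label else ""
        if node_id in terms:
            t = terms[node_id]
            return f"({t.get('pos')}{suffix} {t.get('word')})"
        nt = nts[node_id]
        kids = " ".join(node(e.get("idref"), e.get("label")) for e in nt.findall("edge"))
        return f"({nt.get('cat')}{suffix} {kids})"

    return f"( {node(graph.get('root'))} )"


def cf_rules(s):
    """One rule per nonterminal, in document order; duplicates are kept."""
    _, terms, nts = _nodes(s)

    def category(node_id):
        return terms[node_id].get("pos") if node_id in terms else nts[node_id].get("cat")

    return [
        f"{nt.get('cat')} -> " + " ".join(category(e.get("idref")) for e in nt.findall("edge"))
        for nt in nts.values()
    ]


def graph_numbers(sentences):
    """Blank-separated graph numbers, assuming sentence IDs of the form s<number>."""
    return " ".join(s.get("id").removeprefix("s") for s in sentences)


sentences = ET.fromstring(SAMPLE).findall("body/s")
for s in sentences:
    print(sentence_format(s))
    print(sentence_format(s, with_pos=True))
    print(bracketing(s))
    print("\n".join(cf_rules(s)))
print(graph_numbers(sentences))
```

For the sample graph, the first four functions reproduce the sentence, token/pos, bracketing (without the line breaks and indentation) and rule examples shown above.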