2.4 Subcorpora

As a corpus grows, it sometimes needs to be divided into several files. For this purpose the TIGER-XML format provides subcorpora: the main corpus contains a link to a subcorpus, which is stored in a separate file. A subcorpus consists of corpus graphs or further embedded subcorpora. It can be validated using the subcorpus subschema of the TIGER-XML format (cf. section 4).

The embedding syntax is as follows: within the corpus body, a <subcorpus> element is placed. Its attributes name and external specify the name of the subcorpus and its URL, respectively.

Please note: Because the link is represented as a URL, a protocol has to be specified. If the subcorpus resides in the local file system, use the file: protocol. A relative path is evaluated relative to the location of the embedding XML file.
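
The resolution rule in the note above can be sketched in Python. This is only an illustration, not part of the TIGER-XML tools: the helper name resolve_external is hypothetical, only the file: protocol is handled, and POSIX-style paths are assumed.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def resolve_external(embedding_file, external):
    """Return the path a subcorpus link points to (hypothetical helper).

    embedding_file: path of the XML file that contains the <subcorpus> link.
    external:       value of the link's external attribute, e.g. "file:subcorpus.xml".
    """
    parts = urlsplit(external)
    # A protocol is mandatory; this sketch only understands file:.
    if parts.scheme != "file":
        raise ValueError(f"missing or unsupported protocol in {external!r}")
    target = PurePosixPath(unquote(parts.path))
    # A relative target is taken relative to the embedding file's directory.
    # Joining with an absolute target returns that target unchanged.
    return PurePosixPath(embedding_file).parent / target
```

With the embedding file /data/corpus/main.xml (an assumed location), the link file:subcorpus.xml resolves to /data/corpus/subcorpus.xml, while a link without a protocol is rejected.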

The following example illustrates the embedding:

Main corpus (main.xml)

<corpus>

  <head>
    ...
  </head>

  <body>
    <subcorpus name="embedded corpus" external="file:subcorpus.xml"/>
  </body>

</corpus>

Subcorpus (subcorpus.xml)

<subcorpus name="embedded corpus">

  <s id="s1">
  ...
  </s>

  ...
  
</subcorpus>
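
A reader of such a corpus has to follow the links when collecting the corpus graphs. The following Python sketch shows one way this might be done with the standard library. The function names iter_sentences, _walk and demo are hypothetical, only file: links are followed, and the handling of a <subcorpus> element without an external attribute (descending into its children) is an assumption of this sketch.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.parse import unquote, urlsplit


def iter_sentences(xml_file):
    """Yield every <s> element reachable from xml_file, following subcorpus links."""
    xml_file = Path(xml_file)
    root = ET.parse(xml_file).getroot()
    # The main corpus keeps its content in <body>; a subcorpus file has
    # <subcorpus> as its root element and holds the content directly.
    container = root.find("body") if root.tag == "corpus" else root
    if container is not None:
        yield from _walk(container, xml_file)


def _walk(container, xml_file):
    for child in container:
        if child.tag == "s":
            yield child
        elif child.tag == "subcorpus":
            external = child.get("external")
            if external is None:
                # Assumption: no link given, so the content is nested inline.
                yield from _walk(child, xml_file)
                continue
            parts = urlsplit(external)
            if parts.scheme != "file":
                raise ValueError(f"unsupported protocol in {external!r}")
            # Relative paths are evaluated against the embedding file.
            # Note: no protection against files that link to each other.
            yield from iter_sentences(xml_file.parent / unquote(parts.path))


def demo():
    """Write the two example files to a temporary directory and read them back."""
    main_xml = ('<corpus><head/><body><subcorpus name="embedded corpus" '
                'external="file:subcorpus.xml"/></body></corpus>')
    sub_xml = '<subcorpus name="embedded corpus"><s id="s1"/><s id="s2"/></subcorpus>'
    with tempfile.TemporaryDirectory() as tmp:
        main = Path(tmp) / "main.xml"
        main.write_text(main_xml, encoding="utf-8")
        (Path(tmp) / "subcorpus.xml").write_text(sub_xml, encoding="utf-8")
        return [s.get("id") for s in iter_sentences(main)]
```

Calling demo() collects the sentence ids from the linked file, so the caller does not need to know how the corpus is distributed over files.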