As a corpus grows, it sometimes needs to be divided into several files. For this reason, the TIGER-XML format supports subcorpora. The main corpus contains a link to each subcorpus. A subcorpus consists of corpus graphs or further embedded subcorpora, and it can be validated against the subcorpus subschema of the TIGER-XML format (cf. section 4).
The embedding syntax is as follows: a <subcorpus> element is placed within the corpus body. Its attributes name and external specify the name of the subcorpus and its URL, respectively.
Please note: Because the link is represented as a URL, a protocol has to be specified. If the subcorpus is located in the local file system, use the file: protocol. A relative path is evaluated relative to the location of the embedding XML file.
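The resolution rule for local files can be sketched in Python as follows. This is only an illustration, not part of the TIGER-XML tools: the helper name resolve_external and the directory /data/corpora are hypothetical, POSIX-style paths are assumed, and only the file: protocol is handled.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def resolve_external(embedding_file, external):
    """Return the path that a file: link points to (hypothetical helper)."""
    parts = urlsplit(external)
    # A protocol is mandatory; this sketch only covers local files.
    if parts.scheme != "file":
        raise ValueError("only the file: protocol is handled in this sketch")
    target = PurePosixPath(unquote(parts.path))
    if target.is_absolute():
        return target
    # Relative paths are taken relative to the directory of the embedding file.
    return PurePosixPath(embedding_file).parent / target


print(resolve_external("/data/corpora/main.xml", "file:subcorpus.xml"))
# prints /data/corpora/subcorpus.xml
```

A value without a protocol, such as "subcorpus.xml", is rejected by this sketch, which mirrors the requirement that a protocol has to be specified.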
The following example illustrates the embedding:
Main corpus (main.xml)
<corpus>
  <head> ... </head>
  <body>
    <subcorpus name="embedded corpus" external="file:subcorpus.xml"/>
  </body>
</corpus>
Subcorpus (subcorpus.xml)
<subcorpus name="embedded corpus">
  <s id="s1"> ... </s>
  ...
</subcorpus>
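A program that reads the main corpus first has to find the subcorpus links. The following Python sketch shows one way to collect them with the standard library; the function name list_subcorpus_links is hypothetical, and the header is left empty here because its content is elided in the example above.

```python
import xml.etree.ElementTree as ET

# Main corpus from the example, with an empty header as a placeholder.
MAIN_XML = """<corpus>
  <head/>
  <body>
    <subcorpus name="embedded corpus" external="file:subcorpus.xml"/>
  </body>
</corpus>"""


def list_subcorpus_links(xml_text):
    """Collect (name, external) pairs of all linked subcorpora."""
    root = ET.fromstring(xml_text)
    links = []
    for sub in root.iter("subcorpus"):
        external = sub.get("external")
        # Only <subcorpus> elements with an external attribute are links.
        if external is not None:
            links.append((sub.get("name"), external))
    return links


print(list_subcorpus_links(MAIN_XML))
# prints [('embedded corpus', 'file:subcorpus.xml')]
```

Each returned URL would then be resolved relative to the embedding file and the referenced subcorpus file parsed in the same way, since a subcorpus may itself embed further subcorpora.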