ABOUT
This package contains input data converters for Poliqarp2. They convert
data from the following formats into the Poliqarp2 .pqz format:

* xle_reader.py: output of XLE (an LFG parser), in Prolog format (.pl)
* skladnica_reader.py: constituency forests from the Skladnica treebank, in XML format
* tei_reader.py: TEI variants used at IPI PAN (ICS PAS), in particular:
    - NKJP (National Corpus of Polish)
    - PSC (Polish Sejm Corpus)
    - PCC (Polish Coreference Corpus)


Poliqarp2 homepage:
SourceForge (downloads and bug tracker): http://sourceforge.net/projects/poliqarp2/

Authors:
Bartosz Zaborowski [bartosz.zaborowski@ipipan.waw.pl]
Aleksander Zabłocki [olekz@mimuw.edu.pl]


BASIC USAGE
Every converter prints a short usage message when run without arguments.
The basic usage for all of them is:

./<converter_name>.py <input/path> <output/path>


NOTES
* When using tei_reader with the -s parameter, pay attention to the amount of
  data processed. When converting a corpus, this tool creates as many .pqz files
  as there are paragraphs in the input. For large corpora this may exceed the
  maximum number of files allowed in a single directory on some file systems
  (e.g. 11-12 million for ext4). It is better to split the conversion into
  several invocations with different output directories.
* Due to the nature of the Python interpreter, it is faster to run the converter
  in separate invocations, each handling a few input files, than to process many
  files in a single invocation. This approach also makes it possible to convert
  many documents in parallel (the converters are not multithreaded).
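
The advice above (many small invocations run in parallel, outputs spread over
several directories) can be scripted. Below is a minimal Python sketch that is not
part of this package. It assumes each converter takes one input path and one
output path, as in BASIC USAGE. The helper names (plan_jobs, run_job, run_all) and
the part_NNN/<stem>.pqz output layout are hypothetical. Adjust the output path to
whatever the converter's usage message asks for (for example when using -s).

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def plan_jobs(converter, inputs, output_root, inputs_per_dir=100):
    """Build one command line per input file.

    Inputs are sorted and split into groups of `inputs_per_dir`; each group
    writes into its own numbered subdirectory of `output_root`, so no single
    directory collects every output. The part_NNN/<stem>.pqz layout is a
    hypothetical choice, not something the converters require.
    """
    jobs = []
    for i, src in enumerate(sorted(inputs)):
        out_dir = Path(output_root) / f"part_{i // inputs_per_dir:03d}"
        out_path = out_dir / (Path(src).stem + ".pqz")
        # Same shape as the BASIC USAGE line: <converter> <input> <output>
        jobs.append([str(converter), str(src), str(out_path)])
    return jobs


def run_job(cmd):
    """Run one converter invocation and return its exit code."""
    # Assumption: the converter may not create missing output directories,
    # so create the parent directory of the output path first.
    Path(cmd[-1]).parent.mkdir(parents=True, exist_ok=True)
    return subprocess.run(cmd).returncode


def run_all(jobs, workers=4):
    """Run the jobs concurrently, at most `workers` at a time.

    Threads suffice here: each thread only waits on a separate converter
    process, so the conversions themselves run in parallel as independent
    processes. Exit codes are returned in the same order as `jobs`.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_job, jobs))


# Example (hypothetical paths):
#   jobs = plan_jobs("./tei_reader.py", ["corpus/a.xml", "corpus/b.xml"], "out")
#   codes = run_all(jobs, workers=4)
#   failed = [job for job, code in zip(jobs, codes) if code != 0]
```

Running each file as its own process sidesteps the single-process slowdown
described above. Grouping inputs into numbered subdirectories keeps any one
output directory from growing without bound.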

DOCUMENTATION
The conversion is described in more detail in operator_manual.pdf, part of the
documentation package available from the Poliqarp2 homepage. This document also
describes the .pqz format and gives hints on writing a new format converter.


