Poliqarp 2.0.1

Copyright (C) IPI PAN, 2013-2016. All rights reserved.
Available under the terms of the GNU General Public License;
see the COPYING file for details.

ABOUT

Poliqarp2 is a linguistic search engine that simultaneously supports large corpora
with multilevel annotations, various treebanks, and banks of structures (e.g. LFG).

Poliqarp2 homepage:
sourceforge (download and bugtracker): http://sourceforge.net/projects/poliqarp2/

Authors:
Bartosz Zaborowski [bartosz.zaborowski@ipipan.waw.pl]
Aleksander Zabłocki [olekz@mimuw.edu.pl]

REQUIREMENTS

Run-time:
* POSIX operating system (tested on Linux Ubuntu 15.10 and Fedora 22/23)
* dependencies listed below

To build Poliqarp2 from source you also need:
* GNU make or a compatible build system
* binutils
* C and C++ (C++11) compilers (tested with gcc)
* development packages for the dependencies

Dependencies (tested versions, where given, in parentheses):

zlib - compression library
xz-libs (liblzma) - compression library
sqlite3 - lightweight database system
boost library (at least regex and serialization)
python3 - libpython library
libhttpserver (available from https://github.com/etr/libhttpserver;
               version 0.9.1 works, while Poliqarp2 does not compile against
               newer versions due to API changes)

Optional dependencies:
google perftools (tcmalloc library) - highly recommended, but some versions cause
        random segfaults in the boost library (e.g. the version from the Ubuntu 15.10
        repository); in such cases disable it by passing --without-tcmalloc to the
        configure script.
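
Before building, it can help to check which development packages are already
present. The sketch below assumes that the installed devel packages ship
pkg-config files under the module names zlib, liblzma, sqlite3 and python3;
boost and libhttpserver are left out and need to be checked by hand. The
PC_MODULES variable is only a helper for this example.

```shell
# pkg-config module names assumed for some of the dependencies listed above.
PC_MODULES="zlib liblzma sqlite3 python3"

if command -v pkg-config >/dev/null 2>&1; then
    for module in $PC_MODULES; do
        if pkg-config --exists "$module"; then
            echo "$module: found, version $(pkg-config --modversion "$module")"
        else
            echo "$module: development files not found"
        fi
    done
else
    echo "pkg-config is not installed; check the packages manually"
fi
```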


INSTALLATION

Installation from source follows the usual procedure. In the simplest
case the shell commands `./configure; make; make install' should
configure, build, and install Poliqarp2 (the last command may require
superuser privileges). For standard installation options see the INSTALL file.

The Poliqarp2 configure script accepts the following nonstandard options:

--with[out]-tcmalloc
  Enables or disables the custom memory allocator from google perftools.
  tcmalloc is optional, but recommended: it can speed up Poliqarp2 by a
  few percent and may save a significant amount of memory when Poliqarp2
  is configured to use a large number of threads.

--with-boost-prefix=DIR
  Path to the boost libraries when they are installed in a nonstandard location.
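
As an illustration, a build that disables tcmalloc and points configure at a
separately installed boost might look like the sketch below. The /opt/boost
path and the BOOST_PREFIX variable are assumptions made for this example only;
substitute the prefix used on your system, or drop the option if boost is in a
standard location.

```shell
# Hypothetical boost location; replace with the actual prefix on your system.
BOOST_PREFIX="/opt/boost"

# Run from the top of the unpacked source tree, where the configure script lives.
if [ -x ./configure ]; then
    ./configure --without-tcmalloc --with-boost-prefix="$BOOST_PREFIX" &&
    make &&
    make install    # may require superuser privileges
else
    echo "configure script not found; run this from the source directory" >&2
fi
```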


BASIC USAGE

Poliqarp2 is a command line/daemon tool. The GUI client is distributed in a separate
package (see the homepage for download).

The tool consists of the following modules:

* poliqarp_storage - a dedicated, simple database system. It should be running in the
                     background whenever any other tool is used.
* poliqarp_reader - corpus reading tool. The input files should be in the .pqz format.
* poliqarp_indexer - corpus indexing tool; processes data loaded into the database by the reader.
* poliqarp_server - search engine server; operates on corpora previously loaded and optionally indexed.

All the above modules accept the --help and -h options, which print invocation documentation.
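
A quick way to see which modules are installed and how each is invoked is to
loop over them and ask for help. This is only a sketch: it assumes the
binaries are on PATH under the names listed above, and the MODULES variable is
a helper introduced for this example.

```shell
# Module names as listed above.
MODULES="poliqarp_storage poliqarp_reader poliqarp_indexer poliqarp_server"

for tool in $MODULES; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "== $tool =="
        "$tool" --help
    else
        echo "$tool: not found in PATH"
    fi
done
```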

All the above modules share a common configuration file. By default it is looked for at:
$HOME/.poliqarp/poliqarp2.conf

The configuration file is in JSON format. An example configuration, documented
in comments, can be found in the examples/ subdirectory of the Poliqarp2 package.
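
To check whether a configuration is already in place at the default location,
something along these lines can be used. The POLIQARP_CONF variable is a
helper for this example only, not something the tools read.

```shell
# Default location of the shared configuration file, as stated above.
POLIQARP_CONF="$HOME/.poliqarp/poliqarp2.conf"

if [ -f "$POLIQARP_CONF" ]; then
    echo "using configuration: $POLIQARP_CONF"
else
    echo "no configuration at $POLIQARP_CONF"
    echo "copy the example from the examples/ subdirectory and edit it"
fi
```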

For more detailed documentation see below.


DOCUMENTATION

All documentation is distributed in a separate package (see the homepage for download).

The end-user documentation is included in the doc package as user_manual.pdf.

The operator documentation, describing administration, corpus importing, and
other technical issues, is included in the doc package as operator_manual.pdf.


EXAMPLES

See the examples directory in the package for an example global configuration and
an example per-corpus configuration (both explained in comments).


FOR DEVELOPERS

The Poliqarp2 search engine exposes a REST API, so it can be easily integrated with other tools.
The API is described in the api.pdf file in the doc package (in Polish only, sorry).

The Poliqarp2 storage engine (file organization and communication protocol)
is briefly documented in poliqarp_storage.txt in the doc package (in Polish only, sorry).

Feel free to play around with the sources, modify them and post patches on 
Poliqarp2's bugtracker at sourceforge.


