HMM tagger

Description

The HMM-based tagger is an implementation of the Czech tagger developed at UFAL. The tagger requires its input to be preprocessed by a Czech morphological module with very high coverage; this module covers a superset of the Czech "HM" morphology. The morphological module and the tagger are supplied in two independent packages as binary executables, together with all necessary precompiled Czech data. Input must be encoded in ISO Latin 2 (ISO-8859-2) and conform to the csts.dtd definition; output is produced in the same encoding and format. (Like many of the tools provided with PDT 1.0, both executables also accept, and then produce, a "simplified SGML". This is not real, valid SGML; it simply contains at least the tags for words, punctuation, and sentence breaks, one item per line.)
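Because the executables expect ISO Latin 2 input, text stored as UTF-8 would need to be re-encoded before preprocessing and decoded again afterwards. The following is a minimal Python sketch of that conversion only; the helper names and the sample sentence are illustrative assumptions, not part of the distributed packages, and the sketch does not invoke the morphological module or the tagger.

```python
def utf8_to_latin2(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as ISO-8859-2 (ISO Latin 2).

    Raises UnicodeEncodeError if the text contains a character
    that has no representation in ISO-8859-2.
    """
    return data.decode("utf-8").encode("iso-8859-2")


def latin2_to_utf8(data: bytes) -> bytes:
    """Convert ISO-8859-2 bytes (e.g. tagger output) back to UTF-8."""
    return data.decode("iso-8859-2").encode("utf-8")


# Hypothetical sample text; every Czech letter fits in one Latin 2 byte.
sample_utf8 = "Příliš žluťoučký kůň".encode("utf-8")
sample_latin2 = utf8_to_latin2(sample_utf8)
restored_utf8 = latin2_to_utf8(sample_latin2)
```

Characters outside ISO Latin 2 (for example typographic quotation marks) make the encoding step fail, so such characters would have to be normalised or replaced beforehand.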

Tool type

Tool for creating own tools and resources

Tool task

tagging

Key words

mono-lingual, service, text processing

Research domain

Computational Linguistics, Linguistics, Morphology, Syntax

Language

Czech

Country

Czech Republic

CLARIN centre

Charles University in Prague

Contact person

Pavel Krbec

URL

http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Tagging/MM_tagger/index.html

Similar to