On 06/27/2012 03:34 PM, Nicolas Hernandez wrote:
Since I would like to use MaltParser [1], I now need to adapt the FTB to
a tag set called ftb+, as described in [2].
(Only multi-word expressions that can be recognized by regular
expressions are considered; some POS tags result from the concatenation
of the cat and subcat attributes...)

I plan to do it by processing the MarkupAnnotations provided by the
Tika MarkupAnnotator [3].

Do you plan to use the UIMA POS Tagger Trainer to produce a model for you?

We could add support for ftb+ tag sets to the code that produces training data
out of the FTB.

With the separator support in the training format, that should work out fine in the end.
The CLI tools also give you easy access to the built-in evaluation.
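
To make the separator point concrete, here is a rough sketch of how FTB tokens
could be written out as one line of word/tag training data. It is only an
illustration under stated assumptions: the function names, the example
cat/subcat values, and the plain "concatenate cat and subcat" rule are
hypothetical, and the real ftb+ mapping in [2] has more cases than this.

```python
# Sketch only: names and the tag rule below are assumptions, not the
# actual FTB converter code.

def make_tag(cat, subcat=None):
    """Hypothetical ftb+ rule: concatenate cat and subcat when subcat is set."""
    return f"{cat}{subcat}" if subcat else cat


def to_training_line(tokens, separator="_", mwe_joiner="_"):
    """Render (form, cat, subcat) triples as one whitespace-separated line.

    Multi-word expressions are joined with `mwe_joiner` so that each token
    stays a single whitespace-free unit. If the joined word contains the
    word/tag separator, the line would be ambiguous, so we refuse it.
    """
    parts = []
    for form, cat, subcat in tokens:
        word = form.replace(" ", mwe_joiner)
        if separator in word:
            raise ValueError(
                f"token {word!r} contains the separator {separator!r}"
            )
        parts.append(f"{word}{separator}{make_tag(cat, subcat)}")
    return " ".join(parts)


# Illustrative input; the cat/subcat values are made up for the example.
tokens = [("a priori", "ADV", None), ("il", "CL", "S"), ("mange", "V", None)]
print(to_training_line(tokens, separator="|"))
# prints: a_priori|ADV il|CLS mange|V
```

With the default "_" separator the multi-word token "a_priori" would clash
with the word/tag boundary, which is why a configurable separator matters here.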

Jörn
