2011/7/6 Tommaso Teofili:
> As far as I know, there have been some discussions about Portuguese models on
> the OpenNLP mailing list [1], so Alex could find help on this topic there.
> My 2 cents,
> Tommaso
>
> [1] : http://markmail.org/thread/tjypzqrxe4r2cdnw


The current Stanbol enhancer needs:

- Sentence segmentation (available for OpenNLP 1.5)
- Tokenizer (available for OpenNLP 1.5; alternatively, the SimpleTokenizer is
probably fine for any European language)
- NameFinder model for generic entities (People, Place, Organization):
missing, and not trivial to train
- POS tagger for domain-specific taxonomy annotations (already available)
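
To make the chain above concrete, here is a minimal Python sketch of segmentation, then tokenization, then name finding. It is not the OpenNLP or Stanbol API: split_sentences, simple_tokenize, annotate and toy_finder are hypothetical names, the sentence splitter is a naive regex rather than a trained model, and the tokenizer only approximates, in the spirit of a SimpleTokenizer, splitting on character-class changes. toy_finder is a placeholder for the missing Portuguese NameFinder model.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical types: a name finder maps a token list to (start, end, type)
# spans over that list. In the real enhancer an OpenNLP NameFinder model
# would fill this role.
Span = Tuple[int, int, str]
NameFinder = Callable[[List[str]], List[Span]]


def split_sentences(text: str) -> List[str]:
    """Naive stand-in for a trained sentence detector:
    break after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def simple_tokenize(sentence: str) -> List[str]:
    """Character-class tokenizer: runs of letters (accented letters
    included), runs of digits, and every other non-space char alone."""
    return re.findall(r"[^\W\d_]+|\d+|\S", sentence)


def annotate(text: str, find_names: NameFinder) -> List[Tuple[str, str]]:
    """Segment, tokenize, then run the name finder on each sentence;
    return (mention, entity type) pairs."""
    results = []
    for sentence in split_sentences(text):
        tokens = simple_tokenize(sentence)
        for start, end, etype in find_names(tokens):
            results.append((" ".join(tokens[start:end]), etype))
    return results


def toy_finder(tokens: List[str]) -> List[Span]:
    """Hypothetical placeholder for a trained model: only tags a
    hard-coded set of single-token place names."""
    places = {"Lisboa", "Porto"}
    return [(i, i + 1, "Place") for i, t in enumerate(tokens) if t in places]


if __name__ == "__main__":
    print(annotate("Vivo em Lisboa. Gosto do Porto!", toy_finder))
```

Swapping toy_finder for a real statistical model is exactly the missing piece; the segmentation and tokenization steps around it stay the same.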

We neither need nor want to use a parser model for entity / concept
extraction. It might be possible, but it would be too slow and not
scalable. It might be useful later for relation extraction, though.

So the main issues here are the missing NameFinder model and the lack of
Portuguese DBpedia labels / abstracts for the EntityHub.
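
To illustrate why the missing Portuguese labels matter, here is a minimal Python sketch of label-based lookup. It assumes, for illustration only, that detected mentions are matched against entity labels in the language of the text; the in-memory index, build_index and lookup are hypothetical and are not the EntityHub API.

```python
from typing import Dict, List, Tuple

# Hypothetical index: (language, lower-cased label) -> entity URIs.
# Only an illustration of label matching, not how the EntityHub stores data.
LabelIndex = Dict[Tuple[str, str], List[str]]


def build_index(labels: List[Tuple[str, str, str]]) -> LabelIndex:
    """labels: (entity_uri, language, label) triples, e.g. taken from a
    DBpedia labels dump."""
    index: LabelIndex = {}
    for uri, lang, label in labels:
        index.setdefault((lang, label.lower()), []).append(uri)
    return index


def lookup(index: LabelIndex, mention: str, lang: str) -> List[str]:
    """Return candidate entity URIs for a mention in the given language."""
    return index.get((lang, mention.lower()), [])


if __name__ == "__main__":
    lisbon = "http://dbpedia.org/resource/Lisbon"
    english_only = build_index([(lisbon, "en", "Lisbon")])
    print(lookup(english_only, "Lisboa", "pt"))  # no Portuguese label: no match
    with_pt = build_index([(lisbon, "en", "Lisbon"), (lisbon, "pt", "Lisboa")])
    print(lookup(with_pt, "Lisboa", "pt"))
```

Even with a perfect Portuguese NameFinder, a mention like "Lisboa" finds no candidate in this sketch until Portuguese labels are loaded.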

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
