2011/7/6 Tommaso Teofili <[email protected]>:
> As far as I know there have been some discussions about Portuguese models on
> OpenNLP mailing list [1] so Alex could find help about this topic there.
> My 2 cents,
> Tommaso
>
> [1] : http://markmail.org/thread/tjypzqrxe4r2cdnw

The current Stanbol enhancer needs:

- Sentence segmentation (available for OpenNLP version 1.5)
- Tokenizer (available for OpenNLP version 1.5, or SimpleTokenizer is
  probably OK for any European language)
- NameFinder model for generic entities (People, Place, Organization):
  missing and not trivial to train
- POS taggers for domain-specific taxonomy annotations (already available)

We neither need nor want to use a parser model for entity / concept
extraction. It might be possible, but it would be too slow and would not
scale. It might be useful later for relation extraction though.

So the main issue here is the missing NameFinder and the lack of DBpedia
Portuguese labels / abstracts for the EntityHub.

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
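
To illustrate why a simple tokenizer is probably good enough for European
languages such as Portuguese, here is a minimal sketch of a character-class
tokenizer. It is written in the spirit of a SimpleTokenizer-style approach but
is not the OpenNLP implementation; the function names `char_class` and
`simple_tokenize` are hypothetical, and the exact splitting rules (for example,
keeping runs of punctuation together) are assumptions made for this example.

```python
def char_class(ch):
    """Return a coarse character class: 'space', 'alpha', 'digit' or 'other'."""
    if ch.isspace():
        return "space"
    if ch.isalpha():  # Unicode-aware, so accented letters count as letters
        return "alpha"
    if ch.isdigit():
        return "digit"
    return "other"


def simple_tokenize(text):
    """Split text wherever the character class changes; drop whitespace runs."""
    tokens = []
    current = ""
    current_class = None
    for ch in text:
        cls = char_class(ch)
        if cls != current_class:
            # Class boundary: flush the pending run unless it was whitespace.
            if current and current_class != "space":
                tokens.append(current)
            current = ch
            current_class = cls
        else:
            current += ch
    # Flush the final run.
    if current and current_class != "space":
        tokens.append(current)
    return tokens


if __name__ == "__main__":
    print(simple_tokenize("O Sr. João mora em São Paulo desde 2011."))
```

Because the split depends only on Unicode character classes, no
language-specific model is needed, which is the reason such a tokenizer can
serve as a fallback when no trained tokenizer model exists for a language. It
does not handle abbreviations or clitics, so "Sr." is split into two tokens.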
