2011/7/6 Olivier Grisel <[email protected]>

> 2011/7/6 Tommaso Teofili <[email protected]>:
> > As far as I know there have been some discussions about Portuguese models
> > on the OpenNLP mailing list [1], so Alex could find help about this topic there.
> > My 2 cents,
> > Tommaso
> >
> > [1] : http://markmail.org/thread/tjypzqrxe4r2cdnw
>
> Current Stanbol enhancer needs:
>
> - Sentence segmentation (available for opennlp version 1.5)
> - Tokenizer (available for opennlp version 1.5, or SimpleTokenizer is
>   probably ok for any European language)
> - NameFinder model for generic entities (People, Place, Organization):
>   missing and not trivial to train
> - POS taggers for domain specific taxonomy annotations (already available)
>
> We neither need nor want to use a parser model for entity / concept
> extraction. It might be possible, but it is too slow and not scalable. It might
> be useful later for relation extraction though.
>
I agree; it's more a pointer that may eventually help Alex find people interested in training Portuguese models.
Tommaso

> So the main issue here is the missing NameFinder and the lack of
> DBpedia Portuguese labels / abstracts for the EntityHub.
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
>
