2011/7/6 Olivier Grisel <[email protected]>

> 2011/7/6 Tommaso Teofili <[email protected]>:
> > As far as I know there have been some discussions about Portuguese models
> > on the OpenNLP mailing list [1], so Alex could find help on this topic there.
> > My 2 cents,
> > Tommaso
> >
> > [1] : http://markmail.org/thread/tjypzqrxe4r2cdnw
>
>
> Current Stanbol enhancer need:
>
> - Sentence segmentation (available for OpenNLP version 1.5)
> - Tokenizer (available for OpenNLP version 1.5; SimpleTokenizer is
> probably OK for any European language)
> - NameFinder model for generic entities (Person, Place, Organization):
> missing and not trivial to train
> - POS taggers for domain-specific taxonomy annotations (already available)
>
> We neither need nor want to use a parser model for entity / concept
> extraction. It might be possible, but it would be too slow and not
> scalable. It might be useful later for relation extraction, though.
>

I agree; it was more a pointer that might help Alex find people interested
in training Portuguese models.
Tommaso


> So the main issues here are the missing NameFinder model and the lack of
> Portuguese DBpedia labels / abstracts for the EntityHub.
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
>
