Tommaso, Olivier,

thanks for the pointers!

At this stage of the project we are not yet 100% focused on this; however, these trails will provide valuable help when we start digging into all these topics.

On 06-07-2011 09:44, Tommaso Teofili wrote:
2011/7/6 Olivier Grisel <[email protected]>

2011/7/6 Tommaso Teofili <[email protected]>:
As far as I know, there have been some discussions about Portuguese models
on the OpenNLP mailing list [1], so Alex could find help on this topic there.
My 2 cents,
Tommaso

[1] : http://markmail.org/thread/tjypzqrxe4r2cdnw


The current Stanbol enhancer needs:

- Sentence segmentation (available for OpenNLP version 1.5)
- Tokenizer (available for OpenNLP version 1.5, or SimpleTokenizer is
probably OK for any European language)
- NameFinder model for generic entities (People, Place, Organization):
missing and not trivial to train
- POS taggers for domain-specific taxonomy annotations (already available)
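Why a simple tokenizer may be enough for any European language can be illustrated with a minimal sketch, assuming one rule: split wherever the character class (whitespace, letter, digit, other) changes, drop whitespace, and emit each punctuation character as its own token. This is loosely modelled on the idea behind OpenNLP's SimpleTokenizer, not its actual code; the class name `CharClassTokenizer` and the method `tokenize` are hypothetical. The key point is that `Character.isLetter` treats accented letters such as c-cedilla or a-tilde as letters, so Portuguese words stay intact without any language-specific model.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of character-class tokenization; not OpenNLP code.
class CharClassTokenizer {

    // Character classes used to decide token boundaries.
    private enum CharClass { WHITESPACE, LETTER, DIGIT, OTHER }

    private static CharClass classify(char c) {
        if (Character.isWhitespace(c)) return CharClass.WHITESPACE;
        // Character.isLetter accepts accented letters (e.g. U+00E7, U+00E3),
        // so Portuguese words are not split at diacritics.
        if (Character.isLetter(c)) return CharClass.LETTER;
        if (Character.isDigit(c)) return CharClass.DIGIT;
        return CharClass.OTHER;
    }

    // Splits text wherever the character class changes; whitespace is
    // dropped and every "other" character (punctuation) is its own token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        CharClass prev = CharClass.WHITESPACE;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            CharClass cls = classify(c);
            boolean boundary = cls != prev || cls == CharClass.OTHER;
            if (boundary && current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            if (cls != CharClass.WHITESPACE) {
                current.append(c);
            }
            prev = cls;
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

For example, `tokenize("A ação custou 1500 euros, em Lisboa.")` yields `[A, ação, custou, 1500, euros, ",", em, Lisboa, "."]`. A rule this simple has no notion of abbreviations or clitics, which is where a trained OpenNLP tokenizer model would help.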

We neither need nor want to use a parser model for entity/concept
extraction. It might be possible, but it would be too slow and not scalable.
It might be useful later for relation extraction, though.


I agree; it's more a pointer that might help Alex find people interested
in training Portuguese models.
Tommaso


So the main issues here are the missing NameFinder model and the lack of
Portuguese DBpedia labels/abstracts for the EntityHub.

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

