Tommaso, Olivier,
thanks for the pointers!
At this stage of the project we are not yet 100% focused on this;
however, these leads will be a valuable help when we do start to dig
into all these topics.
On 06-07-2011 09:44, Tommaso Teofili wrote:
2011/7/6 Olivier Grisel
2011/7/6 Tommaso Teofili:
As far as I know there have been some discussions about Portuguese models on the
OpenNLP mailing list [1], so Alex could find help on this topic there.
My 2 cents,
Tommaso
[1] : http://markmail.org/thread/tjypzqrxe4r2cdnw
The current Stanbol enhancer needs:
- Sentence segmentation (available for OpenNLP version 1.5)
- Tokenizer (available for OpenNLP version 1.5, or SimpleTokenizer is
probably OK for any European language)
- NameFinder model for generic entities (People, Place, Organization):
missing and not trivial to train
- POS taggers for domain-specific taxonomy annotations (already available)
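For readers less familiar with the first two stages in that list, here is a minimal Python sketch of what they do. It is not OpenNLP's API: `split_sentences` and `simple_tokenize` are hypothetical helpers, the sentence splitter is a naive regex heuristic rather than a trained model, and the tokenizer only approximates the character-class idea behind SimpleTokenizer (split wherever a character changes class between letter, digit, whitespace and punctuation).

```python
import re
from typing import List

# Naive heuristic (assumption, not a trained model): a sentence ends at
# ., ! or ? followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences using the naive end-of-sentence regex."""
    return [s for s in SENTENCE_END.split(text.strip()) if s]


def char_class(c: str) -> str:
    """Classify a character as whitespace, letter, digit or other."""
    if c.isspace():
        return "space"
    if c.isalpha():
        return "alpha"  # also true for accented letters such as 'ã' or 'ç'
    if c.isdigit():
        return "digit"
    return "other"


def simple_tokenize(text: str) -> List[str]:
    """Emit a token whenever the character class changes; drop whitespace.

    Runs of the same punctuation character (e.g. '...') stay together,
    while different punctuation characters are split apart.
    """
    tokens: List[str] = []
    start = 0
    prev_cls, prev_char = "space", ""
    for i, c in enumerate(text):
        cls = char_class(c)
        if prev_cls == "space":
            start = i  # a new token can only begin after whitespace
        elif cls != prev_cls or (cls == "other" and c != prev_char):
            tokens.append(text[start:i])  # close the current token
            start = i
        prev_cls, prev_char = cls, c
    if prev_cls != "space":
        tokens.append(text[start:])  # flush the trailing token
    return tokens


if __name__ == "__main__":
    for sentence in split_sentences("O Stanbol é um projeto. Ele usa o OpenNLP!"):
        print(simple_tokenize(sentence))
```

The regex splitter breaks on abbreviations such as "Sr. Silva", which is one reason a trained sentence model is preferable. Splitting by character class, on the other hand, makes no language-specific assumptions, which fits the remark that SimpleTokenizer is probably good enough for most European languages.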
We neither need nor want to use a parser model for entity / concept
extraction. It might be possible, but it is too slow and not scalable. It might
be useful later for relation extraction, though.
I agree; it's more a pointer that may help Alex find people interested
in training Portuguese models.
Tommaso
So the main issues here are the missing NameFinder model and the lack of
DBpedia Portuguese labels / abstracts for the EntityHub.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel