Re: Training NEs?

Jörn Kottmann Tue, 15 Oct 2013 01:07:52 -0700

OpenNLP is designed to support many formats for training, but we had to
decide on one default format, and that is the one which was always supported.


We can support the proposed TCF format. Would you be interested in
contributing parsing code for it?

Jörn

On 10/14/2013 09:59 PM, Thomas Zastrow wrote:

Hello,

In any case, I think it's a little bit old-school to identify tokens and
additional annotations just with spaces between them ... what about a
nice XML format (no, not that ISO crap .. what about TCF [1])? Or maybe
NEGRA?

Best,

Tom

[1]
http://weblicht.sfs.uni-tuebingen.de/weblichtwiki/index.php/The_TCF_Format


On 14.10.2013 21:53, Charles Martin wrote:

What happens if all the entity tokens are at the beginning of every line?
I find that OpenNLP then thinks that any string near the beginning of a line
is an entity, regardless of the content or word context.
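
One way to check a corpus for the skew described above is to count where entities start within each line. This is only a rough sketch: it assumes the whitespace-tokenized `<START:type> ... <END>` format discussed in this thread, and `entity_start_positions` is a hypothetical helper name, not part of OpenNLP.

```python
def entity_start_positions(lines):
    """For each <START...> tag, record the index of the token that follows it.

    Annotation tags themselves are not counted as tokens.
    """
    positions = []
    for line in lines:
        index = 0
        for token in line.split():
            if token.startswith("<START"):
                positions.append(index)
            elif token != "<END>":
                index += 1
    return positions


# Made-up sample lines, for illustration only.
sample = [
    "<START:loc> Hamburg <END> ist eine Stadt .",
    "Er wohnt in <START:loc> Berlin <END> .",
]
positions = entity_start_positions(sample)
share_at_start = positions.count(0) / len(positions)
print(positions, share_at_start)
```

If nearly every entity starts at position 0, the training data may be too uniform in this respect, and adding sentences with entities in other positions is probably worth trying.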



On Mon, Oct 14, 2013 at 12:48 PM, Thomas Zastrow <[email protected]> wrote:

Thanks. That explains a lot ... :-)

Does it matter whether it is one or two blanks?



On 14.10.2013 21:44, William Colen wrote:

Yes, it does. Include a blank between all elements, including punctuation
and annotations. The corpus must be tokenized.
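
A minimal sketch of that rule for the annotation tags only, assuming the `<START:type>` / `<END>` markup from the question below. `pad_tags` is a hypothetical helper, not an OpenNLP function, and it does not tokenize punctuation; that still has to be done separately.

```python
import re

# Matches <START:type> and <END>; treating the ":type" part as optional
# is an assumption made for this sketch.
TAG = re.compile(r"<START(?::[^>\s]+)?>|<END>")


def pad_tags(line: str) -> str:
    """Put exactly one blank on each side of every annotation tag."""
    padded = TAG.sub(lambda m: f" {m.group(0)} ", line)
    # Collapse the whitespace runs that the padding may have introduced.
    return " ".join(padded.split())


print(pad_tags("Er wohnt in <START:loc>Hamburg<END> ."))
# prints: Er wohnt in <START:loc> Hamburg <END> .
```

Lines that already have blanks around the tags pass through unchanged, so the function can be run over a whole training file without harm.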


2013/10/14 Thomas Zastrow <[email protected]>

Hello,

I have a question: when creating training material, does it make a
difference if there are " " (blanks) around the NE? In other words, is
it the same to have:

<START:loc>Hamburg<END>

or:

<START:loc> Hamburg <END>

The example in the documentation is shown with the " " ... ?

Best,

Tom

P.S.: ca. 1300 sentences for a free German NE model are done :-)
