Hello Jörn,

Yes, I have a lot of code around TCF; I will see how it can be
integrated. At least, I'll need importers/exporters between OpenNLP and
TCF anyway :-)

Best,

Tom



On 15.10.2013 10:06, Jörn Kottmann wrote:
> OpenNLP is designed to support many formats for training, but we had to
> decide on one default format, and that is the one which has always been
> supported.
> 
> We can support the proposed TCF format. Are you interested in contributing
> parsing code for it?
> 
> Jörn
> 
> On 10/14/2013 09:59 PM, Thomas Zastrow wrote:
>> Hello,
>>
>> In any case, I think it's a little bit old-school to identify tokens and
>> additional annotations just with spaces between them... what about a
>> nice XML format (no, not that ISO crap... what about TCF [1])? Or maybe
>> NEGRA?
>>
>> Best,
>>
>> Tom
>>
>> [1] http://weblicht.sfs.uni-tuebingen.de/weblichtwiki/index.php/The_TCF_Format
>>
>>
>>
>> On 14.10.2013 21:53, Charles Martin wrote:
>>> What happens if all the entity tokens are at the beginning of every line?
>>> I find that OpenNLP then treats any string near the beginning of a line
>>> as an entity, regardless of its content or word context.
>>>
>>>
>>>
>>> On Mon, Oct 14, 2013 at 12:48 PM, Thomas Zastrow
>>> <[email protected]> wrote:
>>>
>>>> Thanks. That explains a lot ... :-)
>>>>
>>>> Does it matter whether it is one or two blanks?
>>>>
>>>>
>>>>
>>>> On 14.10.2013 21:44, William Colen wrote:
>>>>> Yes, it does. Include a blank between every element, including
>>>>> punctuation and annotations. The corpus must be tokenized.
>>>>>
>>>>>
>>>>> 2013/10/14 Thomas Zastrow <[email protected]>
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I have a question: when creating training material, does it make a
>>>>>> difference whether there are " " (blanks) around the NE? In other
>>>>>> words, is it the same to have:
>>>>>>
>>>>>> <START:loc>Hamburg<END>
>>>>>>
>>>>>> or:
>>>>>>
>>>>>> <START:loc> Hamburg <END>
>>>>>>
>>>>>> The example in the documentation shows it with the " " ... ?
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Tom
>>>>>>
>>>>>> P.S.: About 1,300 sentences for a free German NE model are done :-)
>>>>>>
>>>>
>>>
> 
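
William Colen's answer in the thread says the corpus must be tokenized and that every element, including punctuation and the annotation tags, needs a blank around it. Below is a minimal Python sketch of that rule. It assumes entities are given as hypothetical (start, end, type) token-index spans with an exclusive end. The function name `to_name_sample_line` is illustrative only and is not part of OpenNLP. The sketch always writes exactly one blank between elements, so it does not depend on the unanswered question of whether two blanks would also be accepted.

```python
from typing import List, Tuple

# Hypothetical span representation: (start, end, type) over token indices,
# with `end` exclusive. This is not an OpenNLP API.
Span = Tuple[int, int, str]


def to_name_sample_line(tokens: List[str], spans: List[Span]) -> str:
    """Render one already-tokenized sentence in the whitespace-separated
    <START:type> ... <END> notation discussed in the thread."""
    for tok in tokens:
        # A token containing whitespace would be split apart when the line is read back.
        if not tok or any(ch.isspace() for ch in tok):
            raise ValueError(f"token must be non-empty and blank-free: {tok!r}")

    starts = {}
    ends = set()
    prev_end = 0
    for start, end, label in sorted(spans):
        # Reject empty, out-of-range, or overlapping spans: the flat
        # START/END notation cannot express nesting.
        if not (prev_end <= start < end <= len(tokens)):
            raise ValueError(f"invalid or overlapping span: {(start, end, label)}")
        starts[start] = label
        ends.add(end)
        prev_end = end

    out: List[str] = []
    for i, tok in enumerate(tokens):
        if i in ends:
            out.append("<END>")  # close the entity that ended just before token i
        if i in starts:
            out.append(f"<START:{starts[i]}>")
        out.append(tok)
    if len(tokens) in ends:
        out.append("<END>")  # entity that runs to the end of the sentence
    # Exactly one blank between every element: tokens, punctuation, and tags.
    return " ".join(out)


if __name__ == "__main__":
    print(to_name_sample_line(["Ich", "wohne", "in", "Hamburg", "."], [(3, 4, "loc")]))
```

Each returned string is one training line. The sketch does not tokenize raw text, so the caller must supply tokens that are already split, as the thread requires.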
