Re: TokenizerTrainer

Andreas Niekler Thu, 14 Mar 2013 10:16:57 -0700

Maybe this is just a stupid idea, but would it not be possible to use my
whitespace training data as it is and add a single <SPLIT> tag somewhere
where it makes sense? The tokenizer just needs the feature, and all the
separations are already made. Abbreviations are not separated in that
file, so it should learn those examples without any further annotation.


But I'm not sure.
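
A minimal sketch of that idea in Python, under some assumptions: the training
file has one whitespace-tokenized sentence per line, and punctuation already
stands as its own token. The function name add_single_split is hypothetical
and not part of OpenNLP. It replaces only the first gap between a word and a
following punctuation token with <SPLIT> and leaves every other line alone.
Whether one tag is enough for the trainer is the open question, so this only
prepares the data.

```python
import re

SPLIT_TAG = "<SPLIT>"

# Whitespace that sits between a word character and a single punctuation
# token (the punctuation must be followed by whitespace or end of line).
_GAP_BEFORE_PUNCT = re.compile(r"(?<=\w)[ \t]+(?=[.,;:!?](?:\s|$))")


def add_single_split(lines):
    """Return a copy of `lines` with exactly one word/punctuation gap
    replaced by <SPLIT>, or an unchanged copy if no such gap exists."""
    result = []
    done = False
    for line in lines:
        if not done:
            new_line, count = _GAP_BEFORE_PUNCT.subn(SPLIT_TAG, line, count=1)
            if count:
                line = new_line
                done = True
        result.append(line)
    return result


if __name__ == "__main__":
    sample = ["Hello world .", "See e.g. this one ."]
    for out in add_single_split(sample):
        print(out)
    # prints:
    # Hello world<SPLIT>.
    # See e.g. this one .
```

Abbreviations such as "e.g." are untouched because the pattern only matches
whitespace in front of a stand-alone punctuation token.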



On 14.03.2013 14:50, Jörn Kottmann wrote:
> On 03/14/2013 02:15 PM, Andreas Niekler wrote:
>> Hello,
>>
>> it seems that you have already opened this issue:
>> https://issues.apache.org/jira/browse/OPENNLP-501
>>
>> Should I include that in 1.6.0 or just the trunk?
> 
> Leave the version open. It would probably be nice to pull that
> fix into 1.5.3, but it depends on how quickly we get it and what
> the other committers think about it, so I can't promise anything here.
> If it does not go into 1.5.3, it will definitely go into the version after.
> 
> Jörn
> 

-- 
Andreas Niekler, Dipl. Ing. (FH)
NLP Group | Department of Computer Science
University of Leipzig
Johannisgasse 26 | 04103 Leipzig

mail: [email protected]
