Maybe just a stupid idea, but is it not possible to just use my whitespace training data and add one <SPLIT> tag somewhere where it makes sense? The tokenizer just needs the feature, and all the separations are already made. Abbreviations are not separated in that file, so it should learn those examples without any further annotation.
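
A rough sketch of the idea, assuming the training file has one sentence per line with tokens separated by single spaces and punctuation as its own token. The function name, the punctuation set, and the sample sentences are made up for illustration and are not part of OpenNLP; whether a single tag is enough for training is exactly the open question.

```python
SPLIT_TAG = "<SPLIT>"
# Hypothetical set of tokens that normally attach to the preceding word.
PUNCTUATION = {".", ",", "!", "?", ";", ":"}


def add_one_split(lines):
    """Return a copy of `lines` in which the first punctuation token that
    follows another token is re-attached to it with one <SPLIT> tag.
    All other lines are returned unchanged."""
    result = []
    done = False
    for line in lines:
        if not done:
            tokens = line.split()
            for i in range(1, len(tokens)):
                if tokens[i] in PUNCTUATION:
                    # Merge "word" and "." into "word<SPLIT>."
                    tokens[i - 1:i + 1] = [tokens[i - 1] + SPLIT_TAG + tokens[i]]
                    line = " ".join(tokens)
                    done = True
                    break
        result.append(line)
    return result


# Made-up sample data: "Dr." is an abbreviation and stays one token.
sample = ["Dr. Smith arrived .", "He was late ."]
converted = add_one_split(sample)
print("\n".join(converted))
```

Only the first matching line is changed, so abbreviations such as "Dr." stay untouched and every other separation remains plain whitespace.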
But I'm not sure.

On 14.03.2013 14:50, Jörn Kottmann wrote:
> On 03/14/2013 02:15 PM, Andreas Niekler wrote:
>> Hello,
>>
>> seems that this issue is already opened by you:
>> https://issues.apache.org/jira/browse/OPENNLP-501
>>
>> Should I include that into 1.6.0 or just the trunk?
>
> Leave the version open, it would probably be nice to pull that
> fix into 1.5.3, but it depends on how quick we get it and what
> the other committers think about it, so can't promise anything here.
> If it will not go into 1.5.3 it will definitely go into the version after.
>
> Jörn
>

--
Andreas Niekler, Dipl. Ing. (FH)
NLP Group | Department of Computer Science
University of Leipzig
Johannisgasse 26 | 04103 Leipzig

mail: [email protected]
