I made a mistake; it should be:

The quick brown
fox jumps over the lazy dog

I will encode my training sentence in one line as:

The quick brown <LF> fox jumps over the lazy dog <LF>

However, I am not sure whether I can avoid the space after "dog", so I am switching to:

The quick brown <LF> fox jumps over the lazy dog<LF>

Regards,
Markus

2017-09-28 9:21 GMT+02:00 Markus Kreuzthaler <[email protected]>:

> Hi William!
>
> I found this issue, which has apparently been fixed:
> https://issues.apache.org/jira/browse/OPENNLP-602
>
> So when I have a sentence like:
>
> The quick brown
> fox jumps over the lazy dog
>
> I will encode my training sentence in one line as:
>
> The quick brown fox <LF> jumps over the lazy dog <LF>
>
> However, I am not sure whether I can avoid the space after "dog", so I am
> switching to:
>
> The quick brown fox <LF> jumps over the lazy dog<LF>
>
> I will give it a try, or maybe someone can give me a hint about which
> version is correct...
>
> Thank you!
>
> Regards,
> Markus
>
>
> 2017-09-27 17:44 GMT+02:00 William Colen <[email protected]>:
>
>> The sentence detector will have a hard time learning from samples without
>> an EOS (end of sentence) mark. This is common in article headlines, for
>> example.
>> I usually remove sentences with no clear EOS from the training/evaluation
>> corpus.
>> At runtime, I apply some code to split sentences at new lines if I can
>> clearly identify a line as a complete headline.
>>
>>
>> Regards
>> William
>>
>> 2017-09-27 11:10 GMT-03:00 Gary Underwood <[email protected]>:
>>
>> > The training sentences are in a one-per-line format, so it should be
>> > fine as it is (unless you have sentences that also span lines).
>> >
>> > Gary Underwood
>> > [email protected]
>> >
>> >
>> >
>> > > On Sep 27, 2017, at 6:49 AM, Markus Kreuzthaler <[email protected]> wrote:
>> > >
>> > > Hello!
>> > >
>> > > How do I have to prepare the training data for sentence detection when
>> > > I have cases where sentences end just with a newline character, without
>> > > e.g. a period character / full stop at the end of the training sentence?
>> > >
>> > > Is there some special encoding for this case?
>> > >
>> > > Thank you for your help!
>> > >
>> > > Regards,
>> > > Markus
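
For reference, the two ideas in this thread (writing a line break inside a training sample as the `<LF>` placeholder, and dropping samples with no clear EOS) can be sketched in a few lines of Python. This is only an illustration and does not use the OpenNLP API. The function names and the set of EOS characters are assumptions made for this example. The sketch adds no spaces around `<LF>`, because whether such spaces are needed is the question left open above.

```python
# Minimal sketch for preparing one-sample-per-line training data.
# All names below are hypothetical helpers, not part of OpenNLP.

LF_TOKEN = "<LF>"   # placeholder for an in-sample newline, as used in this thread
EOS_CHARS = ".!?"   # assumption: characters that count as a clear end-of-sentence mark


def encode_sample(text: str) -> str:
    """Put a sample that spans several physical lines on a single line.

    Every newline (after normalising CRLF and CR to LF) is replaced
    literally by LF_TOKEN; no spaces are added or removed.
    """
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.replace("\n", LF_TOKEN)


def has_clear_eos(sentence: str) -> bool:
    """Return True if the sentence ends with one of EOS_CHARS,
    ignoring trailing whitespace."""
    stripped = sentence.rstrip()
    return bool(stripped) and stripped[-1] in EOS_CHARS


def filter_samples(samples):
    """Keep only samples with a clear EOS, following the suggestion to
    remove the others from the training/evaluation corpus."""
    return [s for s in samples if has_clear_eos(s)]


if __name__ == "__main__":
    print(encode_sample("The quick brown\nfox jumps over the lazy dog\n"))
    # prints: The quick brown<LF>fox jumps over the lazy dog<LF>
    print(filter_samples(["The fox jumps.", "Breaking headline"]))
    # prints: ['The fox jumps.']
```

Filtering and encoding are kept as separate steps so that either one can be used on its own, depending on whether newline-terminated sentences should be kept in the corpus or removed from it.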
