Hi Jörn,

I am using Deeplearning4j, which I think uses the org.apache.uima library, and UIMA in turn uses OpenNLP. That's probably what is happening here.
So it isn't a problem in OpenNLP itself? Thank you.

Ling

On Thu, Jun 29, 2017 at 12:30 AM, Joern Kottmann wrote:
> Hello,
>
> which model are you using? Did you train it yourself?
>
> Jörn
>
> On Thu, Jun 29, 2017 at 4:04 AM, Ling wrote:
> > Hi, all:
> >
> > I am testing OpenNLP and found some significant tokenization issues
> > involving punctuation.
> >
> > Thank you Costco!
> > i love costco!
> > I love Costco!!
> > FUCK IKEA.
> >
> > In all these cases, the last punctuation is not split off, so "Costco!"
> > and "IKEA." are treated as one token. This looks like a systematic
> > problem. Before I file an issue on the OpenNLP project, I want to make
> > sure this issue really comes from the library.
> >
> > Has anyone else encountered a similar problem? Thanks.
>
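One way to tell whether trailing punctuation is being split off is to scan the tokenizer output for words that still have punctuation attached. Below is a minimal sketch in Java. It relies on a few assumptions. `TrailingPunctuationCheck` is a hypothetical helper and is not part of OpenNLP, UIMA or Deeplearning4j. The set of punctuation characters is hard-coded. The token arrays in `main` are illustrative and not real tokenizer output. In practice you would pass in the `String[]` returned by your tokenizer's `tokenize(String)` call.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical diagnostic helper (not part of OpenNLP): flags tokens in which
 * a word is still fused to trailing punctuation, e.g. "Costco!" or "IKEA.".
 */
public class TrailingPunctuationCheck {

    // Assumed set of sentence-final / clause punctuation to look for.
    private static final String PUNCT = "!?.,;:";

    private static boolean isPunct(char c) {
        return PUNCT.indexOf(c) >= 0;
    }

    /** True if the token ends in punctuation that directly follows a letter or digit. */
    public static boolean hasFusedTrailingPunctuation(String token) {
        int end = token.length();
        int i = end;
        // Walk backwards over any run of trailing punctuation ("!!", ".", ...).
        while (i > 0 && isPunct(token.charAt(i - 1))) {
            i--;
        }
        // Fused only if punctuation was found AND a word character precedes it.
        return i < end && i > 0 && Character.isLetterOrDigit(token.charAt(i - 1));
    }

    /** Returns every token that looks like a word+punctuation fusion. */
    public static List<String> findFused(String[] tokens) {
        List<String> fused = new ArrayList<>();
        for (String t : tokens) {
            if (hasFusedTrailingPunctuation(t)) {
                fused.add(t);
            }
        }
        return fused;
    }

    public static void main(String[] args) {
        // Illustrative tokens matching the symptom described in the thread.
        String[] reported = {"Thank", "you", "Costco!"};
        // What one would expect if the final punctuation were split off.
        String[] expected = {"Thank", "you", "Costco", "!"};
        System.out.println("reported: " + findFused(reported));
        System.out.println("expected: " + findFused(expected));
    }
}
```

Running `main` prints `reported: [Costco!]` and `expected: []`. The check also flags legitimate abbreviations such as "U.S.", so a hit should be treated as a candidate to inspect, not as proof of a tokenizer error.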
