Re: Is this a typical OpenNLP tokenization issue?

Ling Thu, 29 Jun 2017 09:54:45 -0700

Hi, Jörn:

I am using Deeplearning4j, which I think uses the org.apache.uima library,
and UIMA in turn uses OpenNLP. That is probably what is happening.


So the problem does not originate in OpenNLP itself? Thank you.

Ling

On Thu, Jun 29, 2017 at 12:30 AM, Joern Kottmann <[email protected]> wrote:

> Hello,
>
> which model are you using? Did you train it yourself?
>
> Jörn
>
> On Thu, Jun 29, 2017 at 4:04 AM, Ling <[email protected]> wrote:
> > Hi, all:
> >
> > I am testing OpenNLP and found a significant tokenization issue
> > involving punctuation.
> >
> > Thank you Costco!
> > i love costco!
> > I love Costco!!
> > FUCK IKEA.
> >
> > In all these cases, the final punctuation mark is not split off, so
> > "Costco!" and "IKEA." are each treated as one token. This looks like a
> > systematic problem. Before I file an issue with the OpenNLP project, I
> > want to make sure it really comes from the library.
> >
> > Has any of you encountered a similar problem? Thanks.
>
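
For anyone trying to reproduce the symptom described above, here is a minimal, library-free sketch of the difference between whitespace-only splitting and punctuation-aware splitting. It is not OpenNLP's, UIMA's, or Deeplearning4j's implementation; the class and method names (TokenizationCheck, whitespaceTokens, punctuationAwareTokens) are hypothetical and exist only for illustration. The assumption is that a tokenizer which splits on whitespace alone would produce exactly the "Costco!" tokens reported in the thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP, UIMA, or Deeplearning4j.
class TokenizationCheck {

    // Splits on whitespace only, so trailing punctuation stays attached
    // to the preceding word (the behaviour reported in the thread).
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Groups runs of letters/digits into one token and emits every other
    // non-whitespace character as a token of its own.
    static List<String> punctuationAwareTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                word.append(c);
                continue;
            }
            // A non-word character ends the current word, if any.
            if (word.length() > 0) {
                tokens.add(word.toString());
                word.setLength(0);
            }
            if (!Character.isWhitespace(c)) {
                tokens.add(String.valueOf(c));
            }
        }
        if (word.length() > 0) {
            tokens.add(word.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String[] samples = {"Thank you Costco!", "i love costco!", "I love Costco!!"};
        for (String sample : samples) {
            System.out.println("whitespace only:   " + whitespaceTokens(sample));
            System.out.println("punctuation-aware: " + punctuationAwareTokens(sample));
        }
    }
}
```

If the tokens coming out of the wrapping library match the whitespace-only output, that would suggest looking at which tokenizer and model the wrapper configures before filing an issue against OpenNLP itself.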
