Re: Is this a typical OpenNLP tokenization issue?

Joern Kottmann Thu, 29 Jun 2017 00:31:05 -0700

Hello,

Which model are you using? Did you train it yourself?


Jörn

On Thu, Jun 29, 2017 at 4:04 AM, Ling <[email protected]> wrote:
> Hi, all:
>
> I am testing OpenNLP and found a significant tokenization issue
> involving punctuation.
>
> Thank you Costco!
> i love costco!
> I love Costco!!
> FUCK IKEA.
>
> In all of these cases, the trailing punctuation is not split off, so
> "Costco!" and "IKEA." are each treated as a single token. This looks like
> a systematic problem. Before I file an issue with the OpenNLP project, I
> want to make sure it really comes from the library.
>
> Has any of you encountered a similar problem? Thanks.
