The models are separate. They can be downloaded from http://opennlp.sourceforge.net/models-1.5/

Gary Underwood
[email protected]
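
Since the model file has to be fetched separately, a small lookup helper can make a missing file obvious. The following is only a sketch using the plain JDK: ModelLocator and its open method are hypothetical names, not part of OpenNLP. It assumes en-token.bin was downloaded from the URL above and saved either in the working directory or under src/main/resources in the Maven project.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper (not part of OpenNLP) that locates a separately downloaded model file. */
final class ModelLocator {

    private ModelLocator() {}

    /**
     * Opens the named model, checking the file system first and the classpath second.
     * The caller is responsible for closing the returned stream.
     */
    static InputStream open(String name) throws IOException {
        Path onDisk = Paths.get(name);
        if (Files.isRegularFile(onDisk)) {
            return Files.newInputStream(onDisk);
        }
        // In a Maven project, a file placed in src/main/resources ends up at the classpath root.
        InputStream onClasspath = ModelLocator.class.getClassLoader().getResourceAsStream(name);
        if (onClasspath != null) {
            return onClasspath;
        }
        throw new FileNotFoundException("Model '" + name
                + "' was not found on disk or on the classpath; download it from"
                + " http://opennlp.sourceforge.net/models-1.5/");
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = open("en-token.bin")) {
            // With opennlp-tools on the classpath, this stream would be passed to
            // new opennlp.tools.tokenize.TokenizerModel(in).
            System.out.println("Found en-token.bin");
        } catch (FileNotFoundException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Checking the file system before the classpath is an arbitrary choice here; either location works as long as the file is actually present.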
> On Jun 29, 2017, at 8:07 PM, Ling <[email protected]> wrote:
>
> Hi, Jörn:
>
> I want to directly use openNLP, instead of deeplearning4j and UIMA. I
> included the Maven 1.8 version in my POM file, then do I still need to
> download the models separately? And I can't find those model files. For
> example, to do a simple test on tokenization model,
>
> InputStream is = new FileInputStream("en-token.bin");
>
> Do I have to download the en-token.bin separately? I am working in a maven
> projects. Thank you.
>
> Ling
>
>
> On Thu, Jun 29, 2017 at 10:42 AM, Joern Kottmann <[email protected]> wrote:
>
>> Long chain, yes, then you probably use the SourceForge tokenization
>> model that was trained on some old news.
>>
>> We usually don't consider mistakes the models do as bugs because we
>> can't do much about it other than suggesting to use models that fit
>> your data very well and even in that case models can be wrong
>> sometimes.
>>
>> If there is something we can do here to reduce the error rate then we
>> are very happy to get that as a contribution or just pointed out.
>>
>> Jörn
>>
>> On Thu, Jun 29, 2017 at 6:54 PM, Ling <[email protected]> wrote:
>>> Hi, Jörn:
>>>
>>> I am using a Deeplearning4j, which uses org.apache.uima library I think.
>>> And then UIMA uses openNLP. Probably that's what happens.
>>>
>>> So it isn't openNLP's original problem? Thank you.
>>>
>>> Ling
>>>
>>> On Thu, Jun 29, 2017 at 12:30 AM, Joern Kottmann <[email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>> which model are you using? Did you train it yourself?
>>>>
>>>> Jörn
>>>>
>>>> On Thu, Jun 29, 2017 at 4:04 AM, Ling <[email protected]> wrote:
>>>>> Hi, all:
>>>>>
>>>>> I am testing openNLP and found some significant tokenization issue
>>>>> involving punctuation.
>>>>>
>>>>> Thank you Costco!
>>>>> i love costco!
>>>>> I love Costco!!
>>>>> FUCK IKEA.
>>>>>
>>>>> In all these cases, the last punctuation is not split so "Costco!" and
>>>>> "IKEA." are treated as one token. This looks like a systematic problem.
>>>>> Before I file an issue on OpenNLP project, I want to make sure this issue
>>>>> is true coming from the library.
>>>>>
>>>>> Does any of you encounter similar problem? Thanks.
>>>>
>>
