Hi, Suneel, that's great. The reason is that I wanted to do something in Deeplearning4j and happened to find that OpenNLP was already integrated into it, so I just used their API to call OpenNLP.
Is there a set date for the next release? Also, are the 1.5 models the same as the models to be included in the 1.8.1 release? Thanks.

Ling

On Thu, Jun 29, 2017 at 5:30 PM, Suneel Marthi <[email protected]> wrote:

> On Thu, Jun 29, 2017 at 8:07 PM, Ling <[email protected]> wrote:
>
> > Hi, Jörn:
> >
> > I want to directly use openNLP, instead of deeplearning4j and UIMA. I
> > included the Maven 1.8 version in my POM file, then do I still need to
> > download the models separately? And I can't find those model files. For
> > example, to do a simple test on tokenization model,
> >
>
> Dl4j is for Deep learning, OpenNLP is for text processing - not sure why
> you would go to DL4J first and revert back to OpenNLP if all u want to do
> is basic text processing.
>
> The model files (1.5 models) are presently at -
> http://opennlp.sourceforge.net/models-1.5/
>
> >
> > InputStream is = new FileInputStream("en-token.bin");
> >
> > Do I have to download the en-token.bin separately? I am working in a
> > maven projects. Thank you
> >
>
> Yes, the models need to be downloaded separately.
>
> We finally got approval from Apache Foundation to distribute OpenNLP models
> thru Apache, following the upcoming 1.8.1 release we should be distributing
> updated 1.8.1 models too once we hash out the details for doing that.
>
> > .
> >
> > Ling
> >
> >
> > On Thu, Jun 29, 2017 at 10:42 AM, Joern Kottmann <[email protected]>
> > wrote:
> >
> > > Long chain, yes, then you probably use the SourceForge tokenization
> > > model that was trained on some old news.
> > >
> > > We usually don't consider mistakes the models do as bugs because we
> > > can't do much about it other than suggesting to use models that fit
> > > your data very well and even in that case models can be wrong
> > > sometimes.
> > >
> > > If there is something we can do here to reduce the error rate then we
> > > are very happy to get that as a contribution or just pointed out.
> > >
> > > Jörn
> > >
> > > On Thu, Jun 29, 2017 at 6:54 PM, Ling <[email protected]> wrote:
> > > > Hi, Jörn:
> > > >
> > > > I am using a Deeplearning4j, which uses org.apache.uima library I
> > > > think.
> > > > And then UIMA uses openNLP. Probably that's what happens.
> > > >
> > > > So it isn't openNLP's original problem? Thank you.
> > > >
> > > > Ling
> > > >
> > > > On Thu, Jun 29, 2017 at 12:30 AM, Joern Kottmann <[email protected]>
> > > > wrote:
> > > >
> > > >> Hello,
> > > >>
> > > >> which model are you using? Did you train it yourself?
> > > >>
> > > >> Jörn
> > > >>
> > > >> On Thu, Jun 29, 2017 at 4:04 AM, Ling <[email protected]> wrote:
> > > >> > Hi, all:
> > > >> >
> > > >> > I am testing openNLP and found some significant tokenization issue
> > > >> > involving punctuation.
> > > >> >
> > > >> > Thank you Costco!
> > > >> > i love costco!
> > > >> > I love Costco!!
> > > >> > FUCK IKEA.
> > > >> >
> > > >> > In all these cases, the last punctuation is not split so "Costco!"
> > > >> > and "IKEA." are treated as one token. This looks like a systematic
> > > >> > problem.
> > > >> > Before I file an issue on OpenNLP project, I want to make sure this
> > > >> > issue is true coming from the library.
> > > >> >
> > > >> > Does any of you encounter similar problem? Thanks.
> > > >> >
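
One way to confirm the behaviour described in the original report, whichever tokenizer produced the tokens, is a small check for trailing punctuation. The sketch below is plain Java with java.util.regex and does not call OpenNLP. The class name GluedPunctuationCheck and the method gluedTokens are hypothetical, and the whitespace split in main only stands in for real tokenizer output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class GluedPunctuationCheck {
    // A token counts as "glued" when a letter or digit is directly followed by
    // one or more trailing ASCII punctuation characters, e.g. "Costco!" or "IKEA.".
    private static final Pattern GLUED =
            Pattern.compile(".*[\\p{L}\\p{N}]\\p{Punct}+");

    // Returns the tokens that still have trailing punctuation attached,
    // in the order they appear in the input.
    public static List<String> gluedTokens(String[] tokens) {
        List<String> glued = new ArrayList<>();
        for (String token : tokens) {
            if (GLUED.matcher(token).matches()) {
                glued.add(token);
            }
        }
        return glued;
    }

    public static void main(String[] args) {
        // Assumption: whitespace splitting stands in for the tokenizer under test.
        // Replace this array with the tokenizer's actual output.
        String[] tokens = "I love Costco!!".split("\\s+");
        System.out.println(gluedTokens(tokens)); // prints [Costco!!]
    }
}
```

An empty result means every final punctuation mark was split into its own token. Abbreviations such as "U.S." would also be flagged, so the output is a list of candidates to inspect, not a definitive error count.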
