Re: Contributing OpenNLP connector

2015-11-18 Thread Karl Wright
There's another problem with: String textContent = new String(bytes); Specifically, (1) its operation will vary with the locale of the machine it's being run on, and (2) there's no limit to the amount of memory that this could conceivably require. Both are problems. If you could use a stream
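A minimal sketch of the stream-based approach hinted at above, assuming the document bytes are UTF-8 (as they would be coming out of the Tika transformer). The class name `BoundedTextReader`, the method `readUtf8`, and the `maxChars` cap are hypothetical names for illustration and are not taken from the connector's code. It addresses both points: the charset is explicit, so the result does not depend on the machine's locale, and the amount of text held in memory is bounded.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of ManifoldCF or the contributed connector.
final class BoundedTextReader {

    // Decodes at most maxChars characters of UTF-8 text from the stream.
    // Note: a hard cut at maxChars can split a surrogate pair at the boundary.
    static String readUtf8(InputStream in, int maxChars) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[4096];
        // Explicit charset: never falls back to the platform default encoding.
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            int remaining = maxChars;
            while (remaining > 0) {
                int n = reader.read(buffer, 0, Math.min(buffer.length, remaining));
                if (n < 0) {
                    break; // end of stream
                }
                text.append(buffer, 0, n);
                remaining -= n;
            }
        }
        return text.toString();
    }
}
```

A caller would pass the document's binary stream and whatever limit suits the deployment, for example `BoundedTextReader.readUtf8(stream, 1_000_000)`; the limit value here is only an example.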

Re: Contributing OpenNLP connector

2015-11-18 Thread Rafa Haro
Hi Chalitha! Awesome! I will take a look at this as soon as possible. Cheers, Rafa On Wed, Nov 18, 2015 at 1:22 PM, chalitha udara Perera wrote: > Hi All, > I have worked on an OpenNLP-based transformation connector for a > requirement. Given a document it

Contributing OpenNLP connector

2015-11-18 Thread chalitha udara Perera
Hi All, I have worked on an OpenNLP-based transformation connector for a requirement. Given a document, it extracts named entities such as people, locations and organisations and adds them as metadata to the repository document. If you think this will be useful for the community, I would like to

Re: Contributing OpenNLP connector

2015-11-18 Thread Piergiorgio Lucidi
Hi Chalitha, first, thank you so much for your work. I hope that some of us can take a look at your project to understand whether it can fit into the trunk of ManifoldCF. I hope to take a look at it today. I think it is very interesting, but I would like to receive other feedback from the PMC. Thank

Re: Contributing OpenNLP connector

2015-11-18 Thread Karl Wright
Thanks, Chalitha, for contributing this! I hope to have a look at the code also, but it won't happen until next week, I'm afraid. Karl On Wed, Nov 18, 2015 at 7:44 AM, Rafa Haro wrote: > Hi Chalitha! > > Awesome! I will take a look at this as soon as possible. >

Re: Contributing OpenNLP connector

2015-11-18 Thread Alessandro Benedetti
Hey Chal, First of all, thank you very much for the contribution! I have some observations: *Model Downloading* Looking at the way you provide the user with the models, I can see there is a shell script to download very specific English models. It would be great to have the possibility

Re: Contributing OpenNLP connector

2015-11-18 Thread chalitha udara Perera
Hi guys, Thank you very much for the comments and suggestions! As Alessandro said, I have assumed the use of the Tika connector prior to using the OpenNLP connector. I think it is a valid assumption because Tika parses different sources into a common format, so the future transformation connectors can

Re: Contributing OpenNLP connector

2015-11-18 Thread chalitha udara Perera
Hi Karl, I will fix that encoding issue. Thanks, Chalitha On Thu, Nov 19, 2015 at 12:31 PM, Karl Wright wrote: > Hi Chalitha, > > My comment was about encoding, not about languages. If you are assuming > that the binary document stream is utf-8 (which will be the output

Re: Contributing OpenNLP connector

2015-11-18 Thread Karl Wright
Hi Chalitha, My comment was about encoding, not about languages. If you are assuming that the binary document stream is utf-8 (which will be the output of the Tika transformer), then you *must* specify utf-8 as the encoding when you convert it back to a string. Otherwise you will have data
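To make the encoding point concrete, here is a small sketch, assuming the incoming bytes are UTF-8. The class and method names (`Utf8Decode`, `decode`) are hypothetical and only for illustration. It shows the same two bytes decoding to one character under UTF-8 but to two different characters under another charset, which is the kind of corruption a platform-default decode can produce.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of ManifoldCF or the contributed connector.
final class Utf8Decode {

    // Decodes bytes as UTF-8 regardless of the JVM's default charset.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Decodes the same bytes as ISO-8859-1, standing in for a machine whose
    // default charset is not UTF-8 (an assumption made for this example).
    static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }
}
```

For the two-byte UTF-8 sequence `0xC3 0xA9`, `decode` yields the single character `é`, while `decodeAsLatin1` yields two unrelated characters.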