[GitHub] opennlp pull request #47: OPENNLP-932: Use checkstyle suppression instead of...

2017-01-11 Thread kottmann
GitHub user kottmann opened a pull request: https://github.com/apache/opennlp/pull/47 OPENNLP-932: Use checkstyle suppression instead of mvn exclude You can merge this pull request into a Git repository by running: $ git pull https://github.com/kottmann/opennlp OPENNLP-932 Al

Thread-safe versions of some of the tools

2017-01-11 Thread Thilo Goetz
Hi, in a recent project, I was using SentenceDetectorME, TokenizerME and POSTaggerME. It turns out that none of those is thread safe. This is because the classification probabilities for the last tag() call (for example) are stored in a member variable and can be retrieved by a separate API c
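A minimal sketch of the race being described, assuming the POSTaggerME API of the time (tag() followed by probs() on the same instance); the class name and model file name are illustrative only:

    import java.io.FileInputStream;
    import opennlp.tools.postag.POSModel;
    import opennlp.tools.postag.POSTaggerME;

    public class SharedTaggerRace {
      public static void main(String[] args) throws Exception {
        POSModel model;
        try (FileInputStream in = new FileInputStream("en-pos-maxent.bin")) {
          model = new POSModel(in);
        }
        POSTaggerME tagger = new POSTaggerME(model); // one instance shared by both threads

        Runnable work = () -> {
          String[] tags = tagger.tag(new String[] {"OpenNLP", "is", "fun"});
          // probs() returns the probabilities of the *last* tag() call on this
          // instance, which may by now belong to the other thread's sentence.
          double[] probs = tagger.probs();
          System.out.println(tags.length + " tags, " + probs.length + " probs");
        };

        new Thread(work).start();
        new Thread(work).start();
      }
    }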

[GitHub] opennlp pull request #47: OPENNLP-932: Use checkstyle suppression instead of...

2017-01-11 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/opennlp/pull/47

[GitHub] opennlp pull request #48: OPENNLP-719: changes NameFinder type to override a...

2017-01-11 Thread wcolen
GitHub user wcolen opened a pull request: https://github.com/apache/opennlp/pull/48 OPENNLP-719: changes NameFinder type to override any annotation The previous PR was only overriding if the span type was null; now it overrides if the parameter is set. See OPENNLP-719 You c

[GitHub] opennlp pull request #44: OPENNLP-137 - Training cmd line tools should measu...

2017-01-11 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/opennlp/pull/44

Re: Thread-safe versions of some of the tools

2017-01-11 Thread Joern Kottmann
Hello Thilo, I am interested in your opinion about how this is done currently. We say: "Share the model between threads and create one instance of the component per thread". Wouldn't that work well in your use case? Jörn On Wed, Jan 11, 2017 at 11:05 AM, Thilo Goetz wrote: > Hi, > > in a re
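A sketch of that recommendation, under the same assumptions as the sketch above: the POSModel is loaded once and shared between threads, and each thread builds its own POSTaggerME from it.

    import java.io.FileInputStream;
    import opennlp.tools.postag.POSModel;
    import opennlp.tools.postag.POSTaggerME;

    public class PerThreadTagger {
      public static void main(String[] args) throws Exception {
        // The model is loaded once and shared; it is the component that must
        // not be shared between threads.
        POSModel model;
        try (FileInputStream in = new FileInputStream("en-pos-maxent.bin")) {
          model = new POSModel(in);
        }

        Runnable work = () -> {
          // One component instance per thread, built from the shared model.
          POSTaggerME tagger = new POSTaggerME(model);
          String[] tags = tagger.tag(new String[] {"This", "thread", "owns", "me"});
          System.out.println(String.join(" ", tags));
        };

        new Thread(work).start();
        new Thread(work).start();
      }
    }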

Re: Thread-safe versions of some of the tools

2017-01-11 Thread Thilo Goetz
Correct me if I'm wrong, but that approach only works if you control the thread creation yourself. In my case, for example, I was using Scala's parallel collection API, and had no control over the threading. I will usually want to create one service that does tokenization or POS tagging or what

Re: Thread-safe versions of some of the tools

2017-01-11 Thread Joern Kottmann
+1 to make SentenceDetectorME and TokenizerME thread safe, and everything else where it works out for us. Making it thread safe only makes sense if using more cores comes close to multiplying the throughput. This works with the current model. For the POSTagger we would have to change the API a
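Purely as a hypothetical illustration of the kind of API change being hinted at (neither TagResult nor tagWithProbs exists in OpenNLP): returning the tags and their probabilities together would remove the shared per-call state that makes the current tag()/probs() pair unsafe.

    // Hypothetical result type for illustration only.
    final class TagResult {
      private final String[] tags;
      private final double[] probs;

      TagResult(String[] tags, double[] probs) {
        this.tags = tags;
        this.probs = probs;
      }

      String[] getTags()  { return tags; }
      double[] getProbs() { return probs; }
    }

    // A tagger method shaped like this keeps no per-call state in the instance:
    // TagResult result = tagger.tagWithProbs(tokens);   // hypothetical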

Re: Thread-safe versions of some of the tools

2017-01-11 Thread Cohan Sujay Carlos
Control over threading is not required to "share the model between threads and create one instance of the component per thread". One could use a scope where variable references are guaranteed to be stored in the call stack (say method-local variables in Java). You could then: a) Instantiate the

Re: Thread-safe versions of some of the tools

2017-01-11 Thread Cohan Sujay Carlos
I meant: a) Instantiate the components in the local scope that leads to their references being in the call (thread) stack. On Wed, Jan 11, 2017 at 8:33 PM, Cohan Sujay Carlos wrote: > Control over threading is not required to "share the model between > threads and create one instance of the c
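One way to picture that suggestion, sketched with a Java parallel stream standing in for Scala's parallel collections (the framework owns the threads; the class and method names here are illustrative): the component is created inside the method doing the work, so its only reference lives on the worker's call stack.

    import java.util.List;
    import opennlp.tools.postag.POSModel;
    import opennlp.tools.postag.POSTaggerME;

    public class LocalScopeTagging {
      // The model is loaded once elsewhere and shared, as in the earlier sketch.
      static String[] tag(POSModel model, String[] tokens) {
        // Method-local instance: its reference never escapes this call stack,
        // no matter which pool thread executes the call.
        POSTaggerME tagger = new POSTaggerME(model);
        return tagger.tag(tokens);
      }

      static void tagAll(POSModel model, List<String[]> sentences) {
        sentences.parallelStream()
            .map(tokens -> tag(model, tokens))
            .forEach(tags -> System.out.println(String.join(" ", tags)));
      }
    }

The cost is one new component per item rather than per thread; the ThreadLocal variant in the next message amortizes that.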

Re: Thread-safe versions of some of the tools

2017-01-11 Thread Thilo Goetz
You can do all sorts of things. I implemented a version now that uses ThreadLocals. Works fine, but quite frankly, it's a pain in the butt. The world has been moving to multi-threaded for a long time now, and I think it's a very reasonable assumption that a simple tool like a POS tagger is thre
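A sketch of the ThreadLocal workaround being described, again assuming a shared POSModel: each worker thread lazily gets its own tagger, whichever framework owns the threads.

    import opennlp.tools.postag.POSModel;
    import opennlp.tools.postag.POSTaggerME;

    public class ThreadLocalTagger {
      private final ThreadLocal<POSTaggerME> tagger;

      public ThreadLocalTagger(POSModel model) {
        // Each thread that calls get() lazily receives its own instance,
        // all built from the one shared model.
        this.tagger = ThreadLocal.withInitial(() -> new POSTaggerME(model));
      }

      public String[] tag(String[] tokens) {
        return tagger.get().tag(tokens);
      }
    }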

Re: Thread-safe versions of some of the tools

2017-01-11 Thread Russ, Daniel (NIH/CIT) [E]
Hi, I am a little confused. Why do you want to share an instance of a SentenceDetectorME across threads? Are your documents very long single sentences? I don’t think there is enough work for the SentenceDetectorME to make up for the cost of multithreading on 4 cores. Previously, I had multipl

Re: Thread-safe versions of some of the tools

2017-01-11 Thread Joern Kottmann
On Wed, 2017-01-11 at 11:05 +0100, Thilo Goetz wrote: > in a recent project, I was using SentenceDetectorME, TokenizerME and > POSTaggerME. It turns out that none of those is thread safe. This is > because the classification probabilities for the last tag() call > (for > example) are stored in a

Re: Thread-safe versions of some of the tools

2017-01-11 Thread Joern Kottmann
+1, ease of use is important for us and has always been a strong focus here. Jörn On Wed, 2017-01-11 at 17:39 +0100, Thilo Goetz wrote: > You can do all sorts of things. I implemented a version now that > uses > ThreadLocals. Works fine, but quite frankly, it's a pain in the > butt. > The world

Re: Thread-safe versions of some of the tools

2017-01-11 Thread Joern Kottmann
On Wed, 2017-01-11 at 17:14 +, Russ, Daniel (NIH/CIT) [E] wrote: > Hi, > > I am a little confused. Why do you want to share an instance of a > SentenceDetectorME across threads? Are your documents very long single > sentences? I don’t think there is enough work for the > SentenceDetectorME to

[GitHub] opennlp pull request #49: OPENNLP-923: Wrap all lines longer than 110 chars

2017-01-11 Thread kottmann
GitHub user kottmann opened a pull request: https://github.com/apache/opennlp/pull/49 OPENNLP-923: Wrap all lines longer than 110 chars And also add checkstyle enforcement You can merge this pull request into a Git repository by running: $ git pull https://github.com/kottmann/o

[GitHub] opennlp pull request #48: OPENNLP-719: Override any name type with specified...

2017-01-11 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/opennlp/pull/48

[GitHub] opennlp pull request #49: OPENNLP-923: Wrap all lines longer than 110 chars

2017-01-11 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/opennlp/pull/49

[GitHub] opennlp pull request #50: OPENNLP-930: [WIP Don't Merge] Write test for Rege...

2017-01-11 Thread smarthi
GitHub user smarthi opened a pull request: https://github.com/apache/opennlp/pull/50 OPENNLP-930: [WIP Don't Merge] Write test for RegexNameFinderFactory You can merge this pull request into a Git repository by running: $ git pull https://github.com/smarthi/opennlp OPENNLP-930

[GitHub] opennlp pull request #51: OPENNLP-923: Wrap all lines longer than 110 chars

2017-01-11 Thread kottmann
GitHub user kottmann opened a pull request: https://github.com/apache/opennlp/pull/51 OPENNLP-923: Wrap all lines longer than 110 chars You can merge this pull request into a Git repository by running: $ git pull https://github.com/kottmann/opennlp OPENNLP-923-2 Alternatively