Re: MultiThreadedJLanguageTool

2015-02-22 Thread Andriy Rysin
On 02/22/2015 01:18 PM, Marcin Miłkowski wrote: > On 2015-02-22 at 15:24, Andriy Rysin wrote: >> On 02/22/2015 04:45 AM, Marcin Miłkowski wrote: >>> Hi, >>> >>> >>> On 2015-02-21 at 19:22, Andriy Rysin wrote: So the main problem with this performance improvement is that we read acro

Re: MultiThreadedJLanguageTool

2015-02-22 Thread amy nguyen
scussion for LanguageTool Sent: Sunday, February 22, 2015 11:00 AM Subject: Re: MultiThreadedJLanguageTool Well, the condition on overlapping wasn't changed, we only changed on how many overlapping matches are removed. So I guess in this case we either have to change the positions for e

Re: MultiThreadedJLanguageTool

2015-02-22 Thread Andriy Rysin
Well, the condition on overlapping wasn't changed, we only changed on how many overlapping matches are removed. So I guess in this case we either have to change the positions for each bracket rule (so that they don't overlap, i.e. position for first should not be 0-1 but 0-0) or give them different

Re: MultiThreadedJLanguageTool

2015-02-22 Thread Marcin Miłkowski
On 2015-02-22 at 15:24, Andriy Rysin wrote: > On 02/22/2015 04:45 AM, Marcin Miłkowski wrote: >> Hi, >> >> >> On 2015-02-21 at 19:22, Andriy Rysin wrote: >>> So the main problem with this performance improvement is that we read >>> across paragraphs. There are two problems with this: >>> 1) e

Re: MultiThreadedJLanguageTool

2015-02-22 Thread Jaume Ortolà i Font
2015-02-22 15:04 GMT+01:00 Andriy Rysin : > No, the only thing I pushed that will lead to regressions was removing > more than one consecutive overlapping match in SameRuleGroupFilter > (and also making sure we remove consecutive overlaps produced by > multiple threads). The regressions above seem

Re: MultiThreadedJLanguageTool

2015-02-22 Thread Andriy Rysin
On 02/22/2015 04:45 AM, Marcin Miłkowski wrote: > Hi, > > > On 2015-02-21 at 19:22, Andriy Rysin wrote: >> So the main problem with this performance improvement is that we read >> across paragraphs. There are two problems with this: >> 1) error context shows sentences from another paragraph: >>

Re: MultiThreadedJLanguageTool

2015-02-22 Thread Andriy Rysin
No, the only thing I pushed that will lead to regressions was removing more than one consecutive overlapping match in SameRuleGroupFilter (and also making sure we remove consecutive overlaps produced by multiple threads). The regressions above seem to be all removals (the other change would actuall
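A minimal sketch of the idea discussed above, removing every consecutive overlapping match from the same rule group rather than just one. This is not the actual SameRuleGroupFilter code: the Match class, the field names, and the overlap test (end offset treated as exclusive) are all assumptions made for illustration. It expects the matches to be sorted by start position already, which is also why a merged result from several threads would need sorting before filtering.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not LanguageTool's SameRuleGroupFilter.
class OverlapFilterSketch {

  // Hypothetical stand-in for a rule match: rule group id plus a text span.
  static final class Match {
    final String groupId;
    final int from; // start offset, inclusive
    final int to;   // end offset, exclusive (an assumption of this sketch)

    Match(String groupId, int from, int to) {
      this.groupId = groupId;
      this.from = from;
      this.to = to;
    }
  }

  // Keeps a match unless it overlaps the most recently kept match and both
  // belong to the same rule group. Because each candidate is compared with
  // the last KEPT match, a whole run of overlapping matches is dropped.
  static List<Match> filter(List<Match> sortedMatches) {
    List<Match> kept = new ArrayList<>();
    for (Match m : sortedMatches) {
      Match last = kept.isEmpty() ? null : kept.get(kept.size() - 1);
      boolean overlapsSameGroup = last != null
          && last.groupId.equals(m.groupId)
          && m.from < last.to;
      if (!overlapsSameGroup) {
        kept.add(m);
      }
    }
    return kept;
  }
}
```

With matches A(0-5), A(3-8), A(4-9), B(4-6), A(10-12) in that order, the sketch keeps A(0-5), B(4-6) and A(10-12). Two bracket matches at the same position but from different groups would both survive, which is in the spirit of the "give them different" groups option mentioned in the follow-up.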

Re: MultiThreadedJLanguageTool

2015-02-22 Thread Daniel Naber
On 2015-02-21 19:22, Andriy Rysin wrote: > So the main problem with this performance improvement is that we read > across paragraphs. There are two problems with this: > 1) error context shows sentences from another paragraph: > I almost worked out a solution for that by adjusting ContextTools but

Re: MultiThreadedJLanguageTool

2015-02-22 Thread Marcin Miłkowski
Hi, On 2015-02-21 at 19:22, Andriy Rysin wrote: > So the main problem with this performance improvement is that we read > across paragraphs. There are two problems with this: > 1) error context shows sentences from another paragraph: > I almost worked out a solution for that by adjusting Conte

Re: MultiThreadedJLanguageTool

2015-02-21 Thread Andriy Rysin
Thanks, I've pushed suggested cleanups. Andriy 2015-02-20 8:10 GMT-05:00 Daniel Naber : > On 2015-02-19 22:16, Andriy Rysin wrote: > >> I've merged multithreading branch into master. Please try it out when >> you have a chance and let me know if you see any issues. > > Thanks. Some small cleanup

Re: MultiThreadedJLanguageTool

2015-02-21 Thread Andriy Rysin
So the main problem with this performance improvement is that we read across paragraphs. There are two problems with this: 1) error context shows sentences from another paragraph: I almost worked out a solution for that by adjusting ContextTools but then I found the next one: 2) the cross-sentence

Re: MultiThreadedJLanguageTool

2015-02-20 Thread Andriy Rysin
So before wrapping these optimizations up I decided to take a last look at the thread graph in jvisualvm and it showed that the worker threads spend more time in the parked state than running. But the graph was really not showing why, it was more like a noodle soup. So I brought one of my past optimiz

Re: MultiThreadedJLanguageTool

2015-02-20 Thread Daniel Naber
On 2015-02-19 22:16, Andriy Rysin wrote: > I've merged multithreading branch into master. Please try it out when > you have a chance and let me know if you see any issues. Thanks. Some small cleanup ideas: -setThreadPoolSize should probably be a parameter of the constructor, as calling it after

Re: MultiThreadedJLanguageTool

2015-02-20 Thread Daniel Naber
On 2015-02-20 00:58, Andriy Rysin wrote: > Also with this we run SameRuleGroupFilter twice for both modes - one > time (per thread) inside performCheck() and once at the end of check() > after sorting. I feel like it's redundant and we can remove the first > one. Yes, I think so. BTW, Ukrainian

Re: MultiThreadedJLanguageTool

2015-02-19 Thread Andriy Rysin
uagetool/MultiThreadedJLanguageToolTest.java
@@ -18,7 +18,6 @@
  */
 package org.languagetool;
 
-import static junit.framework.TestCase.assertTrue;
 import static org.hamcrest.CoreMatchers.is;
 import static org.hamcrest.MatcherAssert.assertThat;
 
@@ -31,7 +30,10 @@
 import org.junit.Assert;
 import org.junit.Test;
 import or

Re: MultiThreadedJLanguageTool

2015-02-19 Thread Andriy Rysin
Daniel I took a look at the problem of SameRuleGroupFilter missing rules on multithreaded execution due to rules with same id being split across threads. So I've added a SameRuleGroupFilter.filter() after all threads return. But to my surprise the tests that compare single-threaded run with multit

Re: MultiThreadedJLanguageTool

2015-02-19 Thread Andriy Rysin
I've merged multithreading branch into master. Please try it out when you have a chance and let me know if you see any issues. Thanks Andriy 2015-02-18 14:10 GMT-05:00 Andriy Rysin : > That makes sense, change pushed. > > Andriy > > 2015-02-18 11:48 GMT-05:00 Daniel Naber : >> On 2015-02-18 15:19

Re: MultiThreadedJLanguageTool

2015-02-18 Thread Andriy Rysin
That makes sense, change pushed. Andriy 2015-02-18 11:48 GMT-05:00 Daniel Naber : > On 2015-02-18 15:19, Andriy Rysin wrote: > >> 2) remove it completely, but I think it would be nice to have in case >> somebody (maybe me again :)) will want to do more performance >> profiling > > What about chan

Re: MultiThreadedJLanguageTool

2015-02-18 Thread Daniel Naber
On 2015-02-18 15:19, Andriy Rysin wrote: > 2) remove it completely, but I think it would be nice to have in case > somebody (maybe me again :)) will want to do more performance > profiling What about changing the property name to something that makes clear it's internal and nobody should rely on

Re: MultiThreadedJLanguageTool

2015-02-18 Thread Andriy Rysin
So we have two options then: 1) move it to a parameter, but this is an option for the developer so I don't think this makes a lot of sense 2) remove it completely, but I think it would be nice to have in case somebody (maybe me again :)) will want to do more performance profiling E.g. I used this

Re: MultiThreadedJLanguageTool

2015-02-18 Thread Daniel Naber
On 2015-02-18 00:15, Andriy Rysin wrote: > I don't have much explanation for this so I introduced a system > property (org.languagetool.thread_count) if you want to force > different # of threads. We don't use system properties anywhere else in the core code (only once to get the temp directory,
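For readers following the system-property discussion, here is a rough sketch of how such an override could be read. Only the property name org.languagetool.thread_count comes from the message above; the class, the method names, and the fallback to the number of available processors are assumptions of this sketch, not what the branch actually does.

```java
// Hypothetical illustration only; not code from the multithreading branch.
class ThreadCountSketch {

  // Parses an optional override value; falls back to defaultCount when the
  // value is missing, not a number, or not positive.
  static int resolveThreadCount(String propertyValue, int defaultCount) {
    if (propertyValue == null) {
      return defaultCount;
    }
    try {
      int parsed = Integer.parseInt(propertyValue.trim());
      return parsed > 0 ? parsed : defaultCount;
    } catch (NumberFormatException e) {
      return defaultCount;
    }
  }

  // Reads the override from the system property named in the thread. The
  // default of "available processors" is an assumption of this sketch.
  static int threadCount() {
    return resolveThreadCount(
        System.getProperty("org.languagetool.thread_count"),
        Runtime.getRuntime().availableProcessors());
  }
}
```

Started with -Dorg.languagetool.thread_count=2, threadCount() would return 2. Keeping the parsing in a separate method means the value could just as well come from a constructor argument, which is the direction suggested in the replies.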

Re: MultiThreadedJLanguageTool

2015-02-17 Thread Andriy Rysin
Ok, I worked on this a bit more and didn't get anything as good as in the first run: as the main thread reading the file and tokenizing sentences is always single-threaded I tested some improvements there 1) in commandline.Main we do call handleLine (and all the heavy processing using threads) on d

Re: MultiThreadedJLanguageTool

2015-02-16 Thread R.J. Baars
Great performance achievement! > I've pushed a new branch "multithreading" into git. There are 3 > changes right now: > 1) Don't recreate thread pool > 2) Analyze sentences in threads > 3) Optimize some code on main thread (as all coordination goes through > a main thread it is a bottleneck and an

Re: MultiThreadedJLanguageTool

2015-02-15 Thread Daniel Naber
On 2015-02-14 21:58, Andriy Rysin wrote: > I've pushed a new branch "multithreading" into git. There are 3 > changes right now: I can confirm it helps quite a bit: For German, testing now takes 4.4ms per sentence on average compared to 5.8ms before (measured with org.languagetool.rules.patterns

Re: MultiThreadedJLanguageTool

2015-02-14 Thread Andriy Rysin
I've pushed a new branch "multithreading" into git. There are 3 changes right now: 1) Don't recreate thread pool 2) Analyze sentences in threads 3) Optimize some code on main thread (as all coordination goes through a main thread it is a bottleneck and any improvement there helps a lot) On my prof

Re: MultiThreadedJLanguageTool

2015-02-12 Thread Andriy Rysin
So I've played with this a bit today and here's what I found: with 3 relatively small changes: 1) reuse the thread pool rather than recreate it every time (this is probably the least important from a performance point of view but it's easier to profile 4 worker threads than hundreds) 2) run sentence analyzer in

Re: MultiThreadedJLanguageTool

2015-02-11 Thread Daniel Naber
On 2015-02-11 05:07, Andriy Rysin wrote: > 1) it seems like we're currently creating and destroying the thread pool > every time we check sentences, would it not make more sense to create > the pool once and keep threads in the pool and reuse them? I think so. The number of threads should then probably b

MultiThreadedJLanguageTool

2015-02-10 Thread Andriy Rysin
I have 2 questions about MultiThreadedJLanguageTool: 1) it seems like we're currently creating and destroying the thread pool every time we check sentences, would it not make more sense to create the pool once and keep threads in the pool and reuse them? It probably would not improve performance muc
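The first question can be illustrated with a small sketch: the pool is created once in the constructor and reused by every check call, then shut down explicitly. The class name ReusablePoolChecker and the per-sentence work (just returning the sentence length) are placeholders invented for this example; it uses only standard java.util.concurrent calls and is not the actual MultiThreadedJLanguageTool implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; not LanguageTool's class.
class ReusablePoolChecker implements AutoCloseable {

  // Created once and shared by all check() calls.
  private final ExecutorService pool;

  ReusablePoolChecker(int threadCount) {
    this.pool = Executors.newFixedThreadPool(threadCount);
  }

  // Placeholder work: computes the length of each sentence on the pool.
  // Results come back in the same order as the input sentences.
  List<Integer> check(List<String> sentences) {
    List<Callable<Integer>> tasks = new ArrayList<>();
    for (String sentence : sentences) {
      tasks.add(() -> sentence.length());
    }
    List<Integer> results = new ArrayList<>();
    try {
      for (Future<Integer> future : pool.invokeAll(tasks)) {
        results.add(future.get());
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e);
    }
    return results;
  }

  // Releases the worker threads; call once when checking is finished.
  @Override
  public void close() {
    pool.shutdown();
  }
}
```

The trade-off is that the caller now owns the pool's lifetime and has to call close(), whereas a pool created per check call cleans up after itself.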