Hello! Thank you for the detailed answer.
PS> Thanks for looking at this. Judging by the results i am guessing PS> your measurements were performed on a 4-core system. Yes, quad-core, I mentioned it before. PS> My initial inclination for these scenarios is it is likely better PS> for the developer to split the execution in two parts. The first PS> part, low N and low Q, is to collect sequentially into an array or PS> list. The second part (total size is known), low N and high Q, PS> would then use that array/list as the stream source and operate in parallel. Probably it would be fine to add such recommendations to the corresponding methods javaDoc (for example, BufferedReader.lines). PS> That, in a sense, is almost what your spliterator does for n < PS> initial batch size -:) but because the target threshold size is PS> derived from the unknown root size (Long.MAX_VALUE) the PS> sub-spliterators are not themselves further split, so the PS> parallelism is better but still limited. It's tricky to adjust the PS> target size dynamically, we could do it for the first split (e.g. PS> unknown size -> two known size), but tracking further splits is PS> harder and may not reliably perform (due to concurrent PS> sub-splitting). I generally don't want to complicate the PS> computation mechanism for poor splitting use-cases. I see. That actually was also the thing which I wanted to suggest. Still even improving the first split would be great, I'd vote for it. This way AbstractSpliterator user implementations may also benefit. My optimization cannot be applied to AbstractSpliterator as we cannot control the tryAdvance method. PS> Your approach also degenerates when N is just over the current batch size. Yes, it becomes bad, but my measurement show that it's still no worse than current implementation: Benchmark (parallel) (size) Mode Cnt Score Error Units IteratorTest.base false 1028 avgt 30 659.308 ± 12.618 us/op IteratorTest.base false 1100 avgt 30 746.619 ± 21.260 us/op IteratorTest.base true 1028 avgt 30 194.800 ± 1.988 us/op IteratorTest.base true 1100 avgt 30 228.926 ± 2.923 us/op IteratorTest.jdkSpliterator false 1028 avgt 30 648.622 ± 3.857 us/op IteratorTest.jdkSpliterator false 1100 avgt 30 750.741 ± 10.279 us/op IteratorTest.jdkSpliterator true 1028 avgt 30 673.574 ± 6.469 us/op IteratorTest.jdkSpliterator true 1100 avgt 30 679.499 ± 4.310 us/op IteratorTest.optimized false 1028 avgt 30 643.209 ± 1.686 us/op IteratorTest.optimized false 1100 avgt 30 718.077 ± 8.123 us/op IteratorTest.optimized true 1028 avgt 30 674.447 ± 5.153 us/op IteratorTest.optimized true 1100 avgt 30 674.622 ± 4.252 us/op PS> It does not make as much sense to include such an estimate PS> argument for Files.lines for two reasons: PS> 1) I recently improved Files.lines for common character sets PS> where it is efficient to identify encoded line feeds (such as UTF-8); and PS> 2) for other character sets we could derive an estimate from the PS> file size (it could actually be the file size, which is the case for 1). I see. FileChannelLinesSpliterator is a nice improvement. Have you tested it on device files, named pipes or cases when the file size is changed in-between by some other process? Also probably it should be mentioned in BufferedReader.lines documentation, that using Files.lines for file source is preferred. PS> So i am inclined to do the following: PS> 1) consider adding a size estimate argument for all stream PS> factories that are of unknown size and where no reasonable estimate can be derived. PS> 2) modify the spliterator factories in Spliterators to accept an estimate e.g.: PS> public static <T> Spliterator<T> PS> spliteratorUnknownSize(Iterator<? extends T> iterator, PS> long sizeEstimate, PS> int characteristics) By the way currently it's possible to create an IteratorSpliterator with estimated size: Spliterators.spliterator(iterator, estimatedSize, Spliterator.CONCURRENT); Of course it's a misuse of CONCURRENT flag, but it's not used otherwisely by stream API, thus currently it has no unwanted side-effects. With best regards, Tagir Valeev. PS> 3) Possibly improve the final split logic of implementations in PS> Spliterators so that it benefits known size, estimated size [*], and maybe unknown size. PS> Paul. PS> [*] if the known or estimated size remaining is less than the PS> next batch size, then split in half as the final splitting action. PS> In this case there is no need for an array field since the iterator can be reused. PS> On Jul 10, 2015, at 8:01 PM, Tagir F. Valeev <amae...@gmail.com> wrote: >> Hello! >> >> The problem of ineffective parallelization of spliteratorUnknownSize >> for low-N/high-Q tasks was already discussed in the thread "Exploiting >> concurrency with IteratorSpliterator of unknown size" >> [ http://mail.openjdk.java.net/pipermail/lambda-dev/2014-March/011968.html ] >> It was proposed to modify some heuristics and/or add some parameters >> to control them. >> >> I have a simpler idea which allows to run in parallel high-Q tasks for >> N <~ 3000 about twice faster without any API changes while having no >> visible impact for high-N or sequential cases. Here's the gist with >> proof-of-concept implementation (UnknownSizeSpliterator.java), JMH >> benchmark (IteratorTest.java) and benchmarking results (performed >> on Quad-Core i5-3340, Win7 64bit): >> https://gist.github.com/amaembo/781c64a3b4f48fdeb196 >> >> The problem which I noticed is the following. When trySplit is >> requested and JDK IteratorSpliterator fills the buffer, it may >> suddenly discover that the source iterator has no more elements. At >> this point the IteratorSpliterator splits in very uneven manner. >> Namely it dumps everything into the ArraySpliterator leaving no >> elements for itself. As it did not report the size previously, the >> pipeline engine assumes that both parts are roughly equal which >> result in very poor performance. >> >> I merged both IteratorSpliterator and ArraySpliterator into single >> class UnknownSizeSpliterator. When it sees that input spliterator has >> no more elements, it converts itself into array-mode and splits right >> here to two equal parts, thus the first call of trySplit for N < 1024 >> performs actual split. >> >> Note that I did not change the heuristics at all, thus for big-N >> tasks the results should be the same. My implementation is a little >> bit simplified (no additional characteristics, no precondition >> checks, no primitive implementations), but I guess current version is >> ok for preliminary testing to decide whether it's good or bad. >> >> This implementation might improve the performance of existing methods >> like BufferedReader.lines, Files.lines, Pattern.splitAsStream when >> using the parallel stream as well as user-defined iterator-based >> spliterators. Also it maybe useful for JDK-8072727 feature >> [ https://bugs.openjdk.java.net/browse/JDK-8072727 ]. >> >> What do you think? >> >> With best regards, >> Tagir Valeev. >>