I have a task (script) that processes hundreds of thousands of files, which I
read from a Files.walk stream. To make it perform well, the processing part is
multi-threaded.
Could any of these stream gatherers simplify the code? Because there are so
many files, I can't put them all into a collection, and I suspect that creating
a virtual thread for each file would exceed limits too.
Here each thread calls a 'synchronized' method to grab the next file from the
stream. I'm not sure this is the most appropriate usage pattern, though it works
quite well. I also tried GPars, but I couldn't see how to share the stream
effectively. This is a heavily trimmed version of what I do:
[Here 'opts' holds the command-line parameters parsed by CliBuilder.]
pListStream = Files.walk(opts.i.toPath())
        .filter(Files::isRegularFile)
        .filter(p -> p.fileName.toString() ==~ fileMatch)
        .iterator()
synchronized File nextF() {
    // Hand out the next matching file, or null to signify the end
    pListStream.hasNext() ? pListStream.next().toFile() : null
}
// Start the threads to process and fill the queue
def workers = (1..opts.t).collect { threadNum ->
    def t = new Thread(new convertFiles(this, threadNum - 1, opts))
    t.start()
    t
}
// Detect Ctrl-C so we can print the stats so far before stopping
def withInterruptionListener = { Closure cloj, Closure onInterrupt ->
    def hook = new Thread({ onInterrupt?.call() })
    Runtime.runtime.addShutdownHook(hook)
    try {
        cloj()
    } finally {
        Runtime.runtime.removeShutdownHook(hook)
    }
}
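@groovy.transform.TupleConstructor
class convertFiles implements Runnable {
    def parent        // the script, which provides the synchronized nextF()
    int threadNum
    def opts
    void run() {
        File tnefFile
        while ((tnefFile = parent.nextF()) != null) {
            // PROCESS FILE..
        }
    }
}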
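A minimal sketch of one alternative to the hand-rolled threads plus synchronized nextF(): a fixed thread pool consumes the lazy stream directly, and a Semaphore caps how many items are queued or running at once, so the stream is read incrementally and never collected. The helper name processAll and its parameters are hypothetical, and in the check below a stream of numbers stands in for the Files.walk stream.

```groovy
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors
import java.util.concurrent.Semaphore
import java.util.concurrent.TimeUnit
import java.util.stream.Stream

// Hypothetical helper: runs `work` on every element of a lazy stream using a
// fixed pool of `threads` workers. The semaphore limits how many elements are
// queued or running at once, so the stream is consumed incrementally and the
// full list of items is never held in memory.
void processAll(Stream<?> source, int threads, int maxInFlight, Closure work) {
    ExecutorService pool = Executors.newFixedThreadPool(threads)
    Semaphore permits = new Semaphore(maxInFlight)
    try {
        source.forEach { item ->
            permits.acquire()          // block the reader when it gets too far ahead
            pool.execute {
                try {
                    work(item)
                } finally {
                    permits.release()  // free a slot for the next element
                }
            }
        }
    } finally {
        source.close()                 // Files.walk streams hold open directory handles
        pool.shutdown()
        pool.awaitTermination(1, TimeUnit.DAYS)
    }
}
```

With a Files.walk stream this would be called along the lines of processAll(Files.walk(root).filter(Files::isRegularFile), 8, 32) { Path p -> ... }. On JDK 21+, Executors.newVirtualThreadPerTaskExecutor() could replace the fixed pool; the semaphore would then be what bounds how many files are processed concurrently. Exceptions thrown by work are not collected here; they go to the pool thread's uncaught-exception handler.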
Merlin Beedell
-----Original Message-----
From: Paul King <[email protected]>
Sent: 24 March 2025 03:11
To: [email protected]
Subject: Re: withIndex for streams
For interest, the code using Gatherers4j looks like:
assert names.stream()
    .gather(Gatherers4j.filterIndexed { index, element -> index == 3 }) // JDK24
    .findFirst().get() == 'arne'
You can also do something similar with vanilla streams:
assert names.stream().skip(3).limit(1).findFirst().get() == 'arne'
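For comparison, a minimal sketch (not an existing Groovy API) of a stream withIndex built only from the JDK and Groovy's existing Iterator.withIndex(): the stream is turned into an iterator, indexed, and wrapped back into an ordered, sequential stream of Tuple2 pairs. The helper name indexed is hypothetical.

```groovy
import java.util.Spliterator
import java.util.Spliterators
import java.util.stream.Stream
import java.util.stream.StreamSupport

// Hypothetical helper (not part of Groovy): pairs each element of a stream with
// its zero-based index by round-tripping through Groovy's Iterator.withIndex().
Stream<Tuple2> indexed(Stream source) {
    Iterator<Tuple2> pairs = source.iterator().withIndex()
    StreamSupport.stream(
            Spliterators.spliteratorUnknownSize(pairs, Spliterator.ORDERED), false)
        .onClose { source.close() }   // closing the result also closes the source
}

def names = ['per', 'karin', 'tage', 'arne', 'sixten', 'ulrik']
// Same query as above: keep the element whose index is 3
assert indexed(names.stream()).filter { it.v2 == 3 }.map { it.v1 }.findFirst().get() == 'arne'
```

Because it goes through an iterator, the resulting stream is always sequential, which is one reason a built-in version would more likely be written as a gatherer or a dedicated spliterator.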
I also forgot about Tim Yates' library:
https://timyates.github.io/groovy-stream/
It has various "withIndex" methods, e.g. mapWithIndex, zipWithIndex,
filterWithIndex, flatMapWithIndex, untilWithIndex, tapWithIndex,
tapEveryWithIndex.
With groovy-stream, you'd do:
assert Stream.from(names).filterWithIndex { n, i -> i == 3 }.toList()[0] == 'arne'
You pose a good question though about whether functionality like this should be
brought into the main Groovy modules.
Cheers, Paul.
On Sun, Mar 23, 2025 at 1:52 PM Paul King <[email protected]> wrote:
>
> It might be worth exploring this. I'll note that gatherers (JDK 24)
> provide a hook for adding such functionality in Java. Gatherers4j has
> withIndex (though we'd likely implement it differently):
>
> https://tginsberg.github.io/gatherers4j/gatherers/sequence-operations/withindex/
>
> As well as a bunch of other "index" operations.
>
> Paul.
>
> On Fri, Mar 21, 2025 at 2:00 AM Per Nyfelt <[email protected]> wrote:
> >
> > Hi,
> >
> > I suggest that the withIndex method in DefaultGroovyMethods is
> > overloaded with an option to support streams as well
> >
> > Given
> >
> > names = ['per', 'karin', 'tage', 'arne', 'sixten', 'ulrik']
> >
> > I can find the 4th element with
> >
> > println names[3]
> >
> > or, if I only have an iterator, with
> >
> > println names.iterator().withIndex().find { it, idx -> 3 == idx }[0]
> >
> > For a stream I can do it by using a
> > java.util.concurrent.atomic.AtomicInteger:
> >
> > AtomicInteger index = new AtomicInteger()
> >
> > println names.stream().find(n -> 3 == index.getAndIncrement())
> >
> > I've noticed that libraries increasingly expose a Stream API
> > rather than a Collection or an Iterator, so it would be very nice
> > if I could just do
> >
> > println names.stream().withIndex().find { it, idx -> 3 == idx}[0]
> >
> > or alternatively add a findWithIndex so that this would be possible:
> >
> > println names.stream().findWithIndex { it, idx -> 3 == idx}
> >
> > What do you think?
> >
> > Regards,
> >
> > Per