I am trying to process millions of files, spread over a tree of directories.  
At the moment I can collect the set of top-level directories into a list and
then process them in parallel using GPars list processing (e.g.
.eachParallel).
But what would be more efficient would be a 'parallel' variant of the
File-handling routines, for example:

    withPool() {
        directory.eachFileMatchParallel(FILES, ~/($fileMatch)/) { aFile ->
            ...
        }
    }
then I would be a very happy bunny!

I know I could copy the list of matching files into an ArrayList and then use
withPool { filesArray.eachParallel { ... } } - but this does not seem like an
efficient solution - especially if there are several hundred thousand files in
a directory.
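For reference, the list-copy workaround I mean looks roughly like the sketch below. I've used JDK parallel streams here as a stand-in for GPars' withPool/eachParallel so it runs with no extra jars; the `process` closure and the temp-dir setup are just illustrative:

```groovy
// Hypothetical per-file handler - stands in for the real work.
def process = { File f -> f.text.size() }

// Illustrative setup: a directory with a few matching files.
def dir = File.createTempDir()
(1..5).each { new File(dir, "sample${it}.log").text = 'x' * it }

// Step 1: materialize the list of matches (the part that feels wasteful).
def matches = []
dir.eachFileMatch(~/sample\d+\.log/) { matches << it }

// Step 2: process the list in parallel (with GPars this would be
// withPool { matches.eachParallel { process(it) } }).
def sizes = matches.parallelStream()
                   .map { process(it) }
                   .collect(java.util.stream.Collectors.toList())
```

The whole tree of matches sits in memory before any processing starts, which is exactly the overhead I'd like to avoid.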

What design pattern(s) might be better to consider using?
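One candidate I've been considering is a producer-consumer arrangement: a single thread walks the tree and streams matches into a bounded queue while a pool of workers drains it, so the full file list is never materialized. A minimal JDK-only sketch (the `handleFile` closure and the temp-dir setup are hypothetical):

```groovy
import java.util.concurrent.*

// Hypothetical per-file handler.
def handleFile = { File f -> f.length() }

// Illustrative setup: a directory with some matching files.
def dir = File.createTempDir()
(1..20).each { new File(dir, "f${it}.dat").text = 'x' }

def queue = new LinkedBlockingQueue<File>(1000)   // bound keeps memory flat
def POISON = new File('')                         // shutdown marker
int nWorkers = 4
def processed = new java.util.concurrent.atomic.AtomicInteger()

// Consumers: take files off the queue until they see a poison pill.
def pool = Executors.newFixedThreadPool(nWorkers)
nWorkers.times {
    pool.submit {
        while (true) {
            def f = queue.take()
            if (f.is(POISON)) break
            handleFile(f)
            processed.incrementAndGet()
        }
    }
}

// Producer: stream matches straight into the queue - no intermediate list.
dir.eachFileMatch(~/f\d+\.dat/) { queue.put(it) }
nWorkers.times { queue.put(POISON) }              // one pill per worker
pool.shutdown()
pool.awaitTermination(30, TimeUnit.SECONDS)
```

The bounded queue applies back-pressure: if the workers fall behind, the directory walk blocks on `put` instead of buffering hundreds of thousands of File objects. But I'd be glad to hear of a cleaner pattern, ideally one that GPars supports directly.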

Merlin Beedell
