I am trying to process millions of files, spread over a tree of directories.
At the moment I collect the set of top-level directories into a list and then
process these in parallel using GPars' list processing (e.g. eachParallel).
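To make the current approach concrete, here is a rough JDK-only sketch of what
I do today (plain Java rather than my actual Groovy/GPars code, and all names
here are made up for illustration - the real code is withPool { dirList.eachParallel { dir -> ... } }):

```java
import java.io.File;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicLong;

public class TopLevelDirsParallel {
    // Illustrative stand-in for GPars' withPool { dirList.eachParallel { dir -> ... } }:
    // each top-level directory becomes one parallel task.
    public static long processTree(File root) {
        File[] topDirs = root.listFiles(File::isDirectory);
        if (topDirs == null) return 0;
        AtomicLong count = new AtomicLong();
        Arrays.stream(topDirs).parallel().forEach(dir -> {
            File[] entries = dir.listFiles(File::isFile);
            if (entries != null) count.addAndGet(entries.length); // per-file work would go here
        });
        return count.get();
    }
}
```

The obvious weakness is that parallelism is only as good as the balance
between top-level directories: one huge directory still runs on a single task.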
But what would be more efficient is a 'parallel' version of the File-handling
routines themselves, for example:

withPool {
    directory.eachFileMatchParallel(FILES, ~/($fileMatch)/) { aFile ->
        ...
    }
}

If something like that existed, I would be a very happy bunny!
I know I could copy the list of matching files into an ArrayList and then use
withPool { filesArray.eachParallel { ... } }, but that does not seem like an
efficient solution - especially if there are several hundred thousand files in
a directory.
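For clarity, that workaround amounts to the following (again a JDK-only
sketch with made-up names; the parallelStream() call stands in for GPars'
withPool { filesList.eachParallel { ... } }):

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class CollectThenParallel {
    // Builds the full list of matching files first, then parallelises over it.
    // The whole list sits in memory, which is the worry when a single
    // directory holds several hundred thousand files.
    public static long totalSize(File directory, Pattern fileMatch) {
        List<File> filesList = new ArrayList<>();
        File[] entries = directory.listFiles();
        if (entries != null) {
            for (File f : entries) {
                if (f.isFile() && fileMatch.matcher(f.getName()).matches()) {
                    filesList.add(f);
                }
            }
        }
        // Stand-in for GPars: withPool { filesList.eachParallel { ... } }
        return filesList.parallelStream().mapToLong(File::length).sum();
    }
}
```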
What design pattern(s) might be better to consider using?
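One pattern I have been wondering about is producer-consumer: a single thread
walks the tree and feeds matching files into a bounded queue, while a pool of
workers consumes them, so the full file list is never materialised. A hedged
JDK-only sketch (made-up names, nothing benchmarked - in GPars I imagine this
would map onto dataflow queues or actors):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class WalkProducerConsumer {
    // One producer streams the directory tree; a fixed pool of consumers
    // takes files from a bounded queue. Memory stays O(queue capacity)
    // rather than O(number of files).
    public static long processMatching(Path root, Pattern fileMatch, int workers)
            throws IOException, InterruptedException {
        BlockingQueue<Path> queue = new ArrayBlockingQueue<>(1024);
        Path poison = root; // sentinel; the root directory is never queued as a file
        AtomicLong processed = new AtomicLong();

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                try {
                    for (Path p = queue.take(); p != poison; p = queue.take()) {
                        processed.incrementAndGet(); // real per-file work goes here
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Producer: walk lazily, so the matching-file list is never built up.
        try (Stream<Path> walk = Files.walk(root)) {
            walk.filter(Files::isRegularFile)
                .filter(p -> fileMatch.matcher(p.getFileName().toString()).matches())
                .forEach(p -> {
                    try { queue.put(p); }
                    catch (InterruptedException e) { throw new RuntimeException(e); }
                });
        }
        for (int i = 0; i < workers; i++) queue.put(poison); // stop each worker
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return processed.get();
    }
}
```

Would something along these lines be the recommended direction, or is there a
more idiomatic GPars construct for it?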
Merlin Beedell