Wabi,

just guessing:

XSSFWorkbook workbook = new XSSFWorkbook(
    new BufferedInputStream(new FileInputStream("src/main/resources/customer.xlsx")));

You operate on exactly ONE STATIC FILE and repeat that 10 times.
I would not be surprised if a recent JVM detected this and did the real work
only once, eliminating the other 9 repetitions of the same code.
JVM code elimination is incredibly smart these days. I'd expect that the
Workbook object stays in cache and is re-used 9 times.

This applies to the Serial Test Case. The Parallel Test Case runs everything at
the same time, so nothing can be eliminated there.
Of course, this is just a theory and needs proof:

a) use randomly generated sheets instead of 1 static sheet for your tests
b) forcefully destroy/finalise the Worksheet object by adjusting the GC
settings
c) pre-warm the JVM before running your tests (so that the Parallel Test
Case also has all the cached objects available)

d) better: use a proper microbenchmarking framework (like the Java
Microbenchmark Harness, "JMH"), which takes care of these considerations
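
To illustrate a) and c) without pulling in POI or JMH, here is a minimal sketch
in plain Java. All names in it (WarmupBench, work, tasks, sink) are made up for
this example, and work() is only a stand-in for "open a workbook, calculate,
read the result". It gives every task a different input, warms up both the
serial and the parallel path before timing, and stores the results in a
volatile field so the JIT cannot throw the work away as dead code. Treat it as
a rough sanity check, not as a replacement for JMH.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical hand-rolled harness; not part of POI or JMH.
final class WarmupBench {

    // Results are written here so the JIT cannot treat the work as dead code.
    static volatile long sink;

    // Placeholder workload (assumption). In the real test this would open a
    // DIFFERENT workbook per seed and return, say, a calculated cell value.
    static long work(int seed) {
        long h = seed;
        for (int i = 0; i < 200_000; i++) {
            h = h * 31 + i;
        }
        return h;
    }

    // Builds 'count' tasks, each with its own distinct input.
    static List<Callable<Long>> tasks(int count, int offset) {
        List<Callable<Long>> list = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            final int seed = offset + i;
            list.add(() -> work(seed));
        }
        return list;
    }

    // Runs the tasks one after another; returns elapsed nanoseconds.
    static long timeSerial(List<Callable<Long>> tasks) throws Exception {
        long start = System.nanoTime();
        long acc = 0;
        for (Callable<Long> t : tasks) {
            acc += t.call();
        }
        sink = sink + acc;
        return System.nanoTime() - start;
    }

    // Runs the tasks on a fixed pool; returns elapsed nanoseconds.
    // Worker threads start lazily, so thread start-up is part of the timing.
    static long timeParallel(List<Callable<Long>> tasks, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            long start = System.nanoTime();
            long acc = 0;
            for (Future<Long> f : pool.invokeAll(tasks)) {
                acc += f.get();
            }
            sink = sink + acc;
            return System.nanoTime() - start;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Warm up BOTH code paths before measuring anything.
        for (int round = 0; round < 5; round++) {
            timeSerial(tasks(10, round * 10));
            timeParallel(tasks(10, round * 10), 4);
        }
        long serial = timeSerial(tasks(10, 1000));
        long parallel = timeParallel(tasks(10, 1000), 4);
        System.out.printf("serial: %d us, parallel: %d us%n",
                serial / 1000, parallel / 1000);
    }
}
```

If the gap between serial and parallel shrinks once both paths are warmed up
and fed distinct inputs, that would support the theory; if it does not, the
cause is elsewhere.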

Good luck!
Andreas




    


On Fri, 2022-05-20 at 11:15 -0400, Wabi Sabi wrote:
> Thank you for taking a look! It is indeed.
> 
> I also tested the same logic on Win JDK 11 and Mac OS X JDK 1.8. The
> overall pattern is the same: the initial run is super slow (7 seconds on
> Mac and 1.5 seconds on Win), subsequent runs are dramatically better
> (down to 150-200 ms on both systems).
> 
> On Fri, May 20, 2022 at 10:47 AM PJ Fanning
> <[email protected]>
> wrote:
> 
> > Is this related to
> > https://stackoverflow.com/questions/72310943/poi-single-vs-multithreaded-performance
> > ?
> > 
> > On Friday 20 May 2022, 15:41:28 IST, Wabi Sabi
> > <[email protected]>
> > wrote:
> > 
> > Hello,
> > 
> > I am trying to parallelize Excel processing and I am noticing a
> > bizarre behavior - single threaded processing is actually faster...
> > 
> > I am not doing anything fancy. I just open an XSSFWorkbook, fill out some
> > values, run formula calcs and read the output. If I run single threaded,
> > the initial run takes a few seconds to complete (I assume because the JVM
> > needs to load POI + all the XML, schemas, etc.), but performance improves
> > and subsequent runs all take about 100-200 ms.
> > 
> > Same logic executed in a separate thread runs easily for 5 seconds in each
> > thread.... So turns out that single threaded processing of say 10 files is
> > at 4.5 seconds, but multithreaded takes 5-6 easily... No files are shared
> > among threads.
> > 
> > The hotspots are in POIXMLDocument.load. Thread behavior also looks
> > correct. File contention is out of the picture too - reading a different
> > file each time.
> > 
> > Any ideas as to why, or pointers to POI multithreading best practices,
> > are greatly appreciated.  Thank you very much in advance!
> > 
> > 
> > 
