Yes, that is understandable. However, in my tests, parsing a file with 55000 rows uses 1.5 GB of memory -- isn't that a bit too high? I tested LibXL with the same file, and its memory usage is just 240 MB.
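
For anyone who wants to take a comparable reading on the JVM side, here is a rough sketch of one way to estimate how much heap a loaded data set retains. It is not necessarily how the figures above were obtained. HeapProbe and its methods are hypothetical names, the in-memory rows merely stand in for a parsed workbook, and the 25-column width is an assumption borrowed from the benchmark quoted further down. System.gc() is only a hint, so treat the result as approximate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper for illustration; not part of POI or LibXL.
final class HeapProbe {

    // Heap currently in use, after asking the JVM to collect garbage.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        System.gc(); // only a hint; the JVM is free to ignore it
        return rt.totalMemory() - rt.freeMemory();
    }

    // Runs the loader and reports how much the used heap grew while its result is still reachable.
    static <T> long measure(Supplier<T> loader) {
        long before = usedHeapBytes();
        T result = loader.get();
        long after = usedHeapBytes();
        // Use the result after the second reading so it stays reachable until then.
        if (result == null) {
            throw new IllegalStateException("loader returned null");
        }
        return after - before;
    }

    public static void main(String[] args) {
        // Stand-in for "parse the file": 55000 rows of 25 string cells (column count assumed).
        long delta = measure(() -> {
            List<String[]> rows = new ArrayList<>();
            for (int r = 0; r < 55_000; r++) {
                String[] row = new String[25];
                for (int c = 0; c < 25; c++) {
                    row[c] = "r" + r + "c" + c;
                }
                rows.add(row);
            }
            return rows;
        });
        System.out.printf("approx. retained heap: %d MB%n", delta / (1024 * 1024));
    }
}
```

Replacing the lambda with the actual workbook-loading call would give a number that can be compared across libraries on the same file.
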
On Tue, Apr 12, 2016 at 2:09 PM, Murphy, Mark <[email protected]> wrote:

> XSSF is an XML document. Given that XML is generally about 70-80% overhead
> vs. data, it is not surprising that binary spreadsheets (which can be
> optimized, and have very little overhead) are more memory efficient. In
> addition, XML must be parsed, but binary documents can frequently be
> accessed using pointers and data structures. That gives the binary formats
> a performance edge, which can be significant. I'm not sure how Microsoft
> handles spreadsheets internally, but maybe they keep an internal binary
> format, and then write it to whatever format is requested on save rather
> than using an internal XML representation for an XML spreadsheet, which is
> what POI is doing.
>
> -----Original Message-----
> From: Jack of Shadows [mailto:[email protected]]
> Sent: Monday, April 11, 2016 7:46 AM
> To: POI Users List
> Subject: Re: SSPerformanceTest: Is the FAQ still accurate?
>
> XSSF is basically unusable. 25000 or 50000 isn't that many rows. Memory
> consumption is pretty high too.
> That's really confusing, I wouldn't have been surprised if HSSF performed
> poorly -- but it actually works better.
> Ohh well, whatever, I guess I'd have to use SXSSF instead.
>
> On Mon, Apr 11, 2016 at 12:04 AM, Dominik Stadler <[email protected]>
> wrote:
>
> > Hi,
> >
> > Not sure which exact machine spec the information in the FAQ is based
> > on, maybe there is something that can have quite a big influence on
> > runtime of this sample for XSSF, e.g. which actual JDK is used,
> > Linux/Windows, ... ?!
> >
> > I did a quick run of it across various versions of POI to see if we
> > degraded performance at some point, but for me it rather was always
> > this way, i.e. HSSF very quick, SXSSF fairly quick (with being very
> > slow in early releases) and XSSF quite a bit slower, maybe we need to
> > adjust the FAQ entry some more here to set correct expectations?
> >
> > (Exact numbers here are not that relevant as I used my 6+ year old
> > laptop where I was doing other things at the same time, albeit no CPU
> > intensive things, JVM was Sun 6.0, Linux Ubuntu, 25000 rows, 25 cols)
> >
> > latest-2016-04-10:
> >
> > Elapsed 2 seconds
> > Elapsed 15 seconds
> > Elapsed 5 seconds
> >
> > 2014-03-22 (the FAQ-Entry was added)
> >
> > Elapsed 1 seconds
> > Elapsed 14 seconds
> > Elapsed 3 seconds
> >
> > 3.10:
> >
> > Elapsed 2 seconds
> > Elapsed 14 seconds
> > Elapsed 3 seconds
> >
> > 3.9:
> >
> > Elapsed 1 seconds
> > Elapsed 12 seconds
> > Elapsed 3 seconds
> >
> > 3.8:
> >
> > Elapsed 2 seconds
> > Elapsed 15 seconds
> > Elapsed 3 seconds
> >
> > initial checkin of SSPerformanceTest:
> >
> > Elapsed 1 seconds
> > Elapsed 14 seconds
> > Elapsed 47 seconds
> >
> > Dominik.
> >
> > On Sun, Apr 10, 2016 at 5:59 PM, Jack <[email protected]> wrote:
> >
> > > I'm having the exact same issue, I've tracked down this message from
> > > StackOverflow.
> > > I've tested read performance on two XLS and XLSX with identical
> > > content (around 75000 rows, 25 columns).
> > > HSSF takes under 5 sec; XSSF takes 15-20 sec.
> > >
> > > Any idea what is the issue with XSSF performance?
> > >
> > > On 15.02.2016 17:00, Drew Spencer wrote:
> > >
> > >> Mike DeHaan <mike <at> mikeandzoya.com> writes:
> > >>
> > >>> As a followup, a user has replied to my stack overflow post with
> > >>> some information that might be helpful in tracking this issue down.
> > >>> Here is the link to his post:
> > >>>
> > >>> http://stackoverflow.com/a/34266795/4471563
> > >>>
> > >>> I ran the same tests in my environments and came up with similar
> > >>> numbers.
> > >>>
> > >>> -Mike DeHaan
> > >>
> > >> I have also asked the same question. Would love to get an answer
> > >> to this either way.
> > >> My similar post on StackOverflow is here:
> > >> http://stackoverflow.com/questions/34995058/apache-poi-much-quicker-using-hssf-than-xssf-what-next
> > >>
> > >> I received a good answer with the link to the streaming reader,
> > >> but unfortunately I don't think I can use it because my code runs
> > >> on app engine.
> > >>
> > >> Thanks to anyone that can help.
> > >>
> > >> Drew Spencer
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: [email protected]
> > >> For additional commands, e-mail: [email protected]
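
As a footnote to the streaming reader mentioned in the thread: the general idea is to walk the sheet XML event by event instead of building an object for every cell, so memory stays roughly constant regardless of row count. The sketch below uses only the JDK's StAX parser and is not POI's API or the implementation of that streaming reader. StreamingRowCounter is a hypothetical name, and the sample XML is a hand-written fragment using SpreadsheetML element names (sheetData, row, c, v), not a real workbook part.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration of event-based (streaming) XML reading; not POI code.
final class StreamingRowCounter {

    // Counts <row> elements without keeping any parsed rows in memory.
    static int countRows(String sheetXml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(sheetXml));
            int rows = 0;
            try {
                while (reader.hasNext()) {
                    // Each call to next() advances to one parse event; nothing is retained.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "row".equals(reader.getLocalName())) {
                        rows++;
                    }
                }
            } finally {
                reader.close();
            }
            return rows;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("malformed sheet XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<worksheet><sheetData>"
                + "<row r=\"1\"><c r=\"A1\"><v>1</v></c></row>"
                + "<row r=\"2\"><c r=\"A2\"><v>2</v></c></row>"
                + "</sheetData></worksheet>";
        System.out.println(countRows(xml)); // prints 2
    }
}
```

The trade-off is that a streaming pass is read-only and forward-only: there is no random access to cells afterwards, which is why it suits large imports better than editing.
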
