there can be many reasons why read takes a long time. The size of 400K does mean that the file is small. How large is the grid, I mean number of rows and columns? ALso, .xlsx file is a zipped archive of xml and 400K compressed can grow x10 or even more uncompressed.
POI uses XmlBeans framework to work with XML and it can be the bottleneck. Using this framework makes life much easier but the price is memory and CPU usage. To tell for sure where the bottleneck is, can you profile your application and reports the results. If you think it is bug or have a suggestion how to improve performance, please create a new bug in Bugzilla and attach sample file and Java code that demonstrates what's wrong (slow read) and what can be done to improve it. Regards, Yegor On Fri, Aug 24, 2012 at 3:34 PM, Lakshmi Ramakrishnan <[email protected]> wrote: > I see. So, what is the intuition behind the slow startup for such a small > file? Is it due to the complex inner-references? > > Even if so, why isn't the object creation eating up all available resources > (memory/cpu) in the process and make it happen ASAP? Is it due to the I/o > wait time now? > > So, given all these limitations, is there an option out there that I'm not > looking at? > > Appreciate your comments! > > > On Aug 24, 2012, at 6:55 AM, Yegor Kozlov <[email protected]> wrote: > >> It is a limitation of POI: you have to re-read the workbook object >> after you call workbook.write(out). >> There is a plan to fix it in future version of POI, but for now you >> have to re-read to avoid XmlValueDisconnectedException >> >> Yegor >> >> On Thu, Aug 23, 2012 at 11:19 PM, lramakri >> <[email protected]> wrote: >>> Hi, >>> >>> We have an excel file that is pre-created with fancy charts and formulae >>> that cross-reference sheets. It also contains macros. Source data is >>> supposed to be copy-pasted into one sheet and the rest of workbook populates >>> itself from pre-existing wiring. >>> >>> Note: I really don't care about reading / modifying other sheets in this >>> workbook, I just need to save the pains of copy-pasting raw data every time >>> to this data sheet in this workbook. >>> >>> I'm using Apache POI and trying to create a XSSFWorkbook instance from this >>> "template" excel file. However, it takes a long time (almost a minute >>> consistently across many runs) for this object to be constructed. The excel >>> file itself is just around 400KB, not a big file. I profiled this using >>> jconsole, it seems like it is not limited by either CPU or Heap - It uses >>> just around 90 MB heap memory (I had started it up with 2GB committed heap) >>> and around 52% CPU. >>> >>> It takes a very short time to actually populate the data sheet with raw data >>> and write the final updated file out (roughly 3-4 seconds). Here is my >>> startup code: >>> >>> public static void startup() throws FileNotFoundException, IOException { >>> long start = System.nanoTime(); >>> System.out.println("Started..."); >>> TEMPLATE_WORKBOOK = new XSSFWorkbook(new FileInputStream(new >>> File(TEMPLATE))); >>> long end = (System.nanoTime() - start) / NANOS; >>> System.out.println("It took " + end + " seconds.."); >>> } >>> >>> I thought about loading up this TEMPLATE_WORKBOOK once and then reuse the >>> same handle to write new data for every subsequent request - I simulated >>> this with a sleep and a forever-while in my main class. But I can't do this >>> apparently, I got an exception "Exception in thread "main" >>> org.apache.xmlbeans.impl.values.XmlValueDisconnectedException". The >>> TEMPLATE_WORKBOOK object is not re-usable. >>> >>> I do see that there is an event-based API, but before I get into it, I >>> wanted to see if I'm missing something here! Again, memory / CPU is not an >>> issue here, we have plenty of heap to spare. I'm trying to reduce time. >>> >>> Any tips / insights would be deeply appreciated! >>> >>> >>> >>> -- >>> View this message in context: >>> http://apache-poi.1045710.n5.nabble.com/Construction-of-XSSFWorkbook-very-slow-for-xlsm-workbooks-tp5710769.html >>> Sent from the POI - User mailing list archive at Nabble.com. >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: [email protected] >>> For additional commands, e-mail: [email protected] >>> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: [email protected] >> For additional commands, e-mail: [email protected] >> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [email protected] > For additional commands, e-mail: [email protected] > --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
