Re: HSSF: Middle-ground API for reading an Excel spreadsheet

Nick Burch Thu, 31 Jan 2008 09:41:02 -0800

On Tue, 29 Jan 2008, Daniel Noll wrote:

Is your formula-related eventusermodel code in a format suitable for contributing back? It'd be handy to be able to put something in svn that would make dealing with the formula stuff much simpler. I'd be happy to spend a bit of time tidying it up / writing tests for it, if you could contribute it?
If I ever figure out how to handle it, I probably would contribute it back because it would mean changes to how shared formulas work. At the moment as you say, it does require a Workbook. At the moment I don't have a Workbook to work with. Maybe I can store off the first however many records and then create the Workbook from those -- I haven't tried so I don't know what happens if you feed in a list of records without the ones which make up the rest of the file.

I think you might be able to get away with that. If not, shout and we can tweak things.

If it gets you close, then we should probably come up with something like a WorkbookRecordSource interface, which model.Workbook implements. Tweak the formula code to use that instead, then it's easier for you to pass in the records that matter. Let us know if that looks like being worth doing.
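
A rough sketch of what that interface might look like. Everything here is hypothetical: WorkbookRecordSource does not exist yet, and the two lookups (sheet names and defined names) are only a guess at what the formula code needs from a workbook. CapturedRecordSource and FormulaRenderer are stand-in names for illustration, not POI classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: the workbook-level lookups the formula code
// is assumed to need. model.Workbook would implement this too.
interface WorkbookRecordSource {
    String getSheetName(int sheetIndex);
    String getDefinedName(int nameIndex);
}

// Hypothetical implementation filled from the few workbook-level records
// an event-model reader stored off at the start of the stream.
final class CapturedRecordSource implements WorkbookRecordSource {
    private final List<String> sheetNames = new ArrayList<>();
    private final List<String> definedNames = new ArrayList<>();

    void addSheetName(String name) { sheetNames.add(name); }
    void addDefinedName(String name) { definedNames.add(name); }

    @Override
    public String getSheetName(int sheetIndex) { return sheetNames.get(sheetIndex); }

    @Override
    public String getDefinedName(int nameIndex) { return definedNames.get(nameIndex); }
}

// Stand-in for the formula code: it depends only on the interface,
// so it no longer cares whether a full Workbook is behind it.
final class FormulaRenderer {
    static String sheetRef(WorkbookRecordSource source, int sheetIndex, String cellRef) {
        return source.getSheetName(sheetIndex) + "!" + cellRef;
    }
}
```

The point of the split is that the caller decides which records to keep, and the formula code only ever sees the narrow interface.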

Memory is indeed cheap, but unless you have the luxury of a 64-bit JVM, there is an upper limit of somewhere around 1.4GB, sometimes less. This would normally be nearly 2GB but Windows allocates some DLLs in weird positions on some systems, and Sun insist on allocating a contiguous block of memory for the heap which sometimes causes a huge unusable memory hole above that.

Have you tried tweaking your Windows box to use a 1gb/3gb split, instead of the usual 2gb/2gb one? Might help out in the absence of a 64 bit JVM / a licence for a non-hobbled 32 bit version of Windows.

http://www.microsoft.com/whdc/system/platform/server/PAE/PAEmem.mspx

In actual fact for us, something closer to RecordInputStream would be even better, where we can just say nextRecord() and have it return a properly constructed Record. Then we have control over the loop, which is ideal when you need to return a Reader.

Does the newly added org.apache.poi.hssf.eventusermodel.HSSFRecordStream look roughly like what you need? I've converted the existing eventusermodel code to use it under the hood, so it ought to behave pretty much the same, except with pull instead of push.
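
To illustrate the pull versus push difference, here is a minimal sketch using stand-in types rather than the real POI classes. SimpleRecord, RecordListener, PullRecordStream and PullExamples are all made up for this example, and returning null at the end of the stream is just the convention chosen for the sketch.

```java
import java.util.Iterator;
import java.util.List;

// Stand-in for a parsed record; not a POI class.
final class SimpleRecord {
    final int sid;
    final String text;

    SimpleRecord(int sid, String text) {
        this.sid = sid;
        this.text = text;
    }
}

// Stand-in for a push-style listener.
interface RecordListener {
    void processRecord(SimpleRecord record);
}

// Stand-in for a pull-style stream: the caller asks for each record.
final class PullRecordStream {
    private final Iterator<SimpleRecord> records;

    PullRecordStream(List<SimpleRecord> source) {
        this.records = source.iterator();
    }

    // Returns the next record, or null once the stream is exhausted.
    SimpleRecord nextRecord() {
        return records.hasNext() ? records.next() : null;
    }
}

final class PullExamples {
    // Push built on top of pull: drain the stream into a listener.
    static void pushAll(PullRecordStream stream, RecordListener listener) {
        SimpleRecord record;
        while ((record = stream.nextRecord()) != null) {
            listener.processRecord(record);
        }
    }

    // Caller-owned loop: stop as soon as the wanted record turns up,
    // leaving the remaining records unread.
    static SimpleRecord firstWithSid(PullRecordStream stream, int sid) {
        SimpleRecord record;
        while ((record = stream.nextRecord()) != null) {
            if (record.sid == sid) {
                return record;
            }
        }
        return null;
    }
}
```

With the pull form, something like a Reader can fetch one record per read call and keep the stream around between calls, which is awkward to do from inside a listener callback.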

As far as the records keeping a copy, could they not instead keep an offset and a reference to the original buffer? Then if someone calls a setter, it would need to create a new buffer, set the offset to 0 and copy the data before doing the actual set.

In many cases, they only keep the parsed data in memory, and not the source bytes. That's certainly one of the advantages of the (not so) new RecordInputStream method.
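
For reference, the offset-plus-shared-buffer idea suggested above could be sketched roughly as follows. LazyRecord is a hypothetical class for illustration only; it is not how any POI record works today.

```java
import java.util.Arrays;

// Hypothetical copy-on-write record: it reads straight from the shared
// file buffer and only takes a private copy on the first write.
final class LazyRecord {
    private byte[] buffer;      // shared with the file until first write
    private int offset;         // start of this record's data in buffer
    private final int length;   // number of data bytes in this record
    private boolean ownsBuffer; // true once a private copy exists

    LazyRecord(byte[] sharedBuffer, int offset, int length) {
        if (offset < 0 || length < 0 || offset + length > sharedBuffer.length) {
            throw new IllegalArgumentException("record lies outside the buffer");
        }
        this.buffer = sharedBuffer;
        this.offset = offset;
        this.length = length;
    }

    // Reads never copy anything.
    int getByte(int index) {
        checkIndex(index);
        return buffer[offset + index] & 0xFF;
    }

    // The first write copies just this record's bytes and resets the
    // offset to 0, so the original buffer is never modified.
    void setByte(int index, int value) {
        checkIndex(index);
        if (!ownsBuffer) {
            buffer = Arrays.copyOfRange(buffer, offset, offset + length);
            offset = 0;
            ownsBuffer = true;
        }
        buffer[offset + index] = (byte) value;
    }

    boolean sharesBuffer() {
        return !ownsBuffer;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
    }
}
```

The trade-off is that the whole source buffer stays reachable for as long as any unmodified record refers to it, so this only helps when the buffer has to stay in memory anyway.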

And as far as POIFS keeping a copy, yes... POIFS is full of issues like that. For instance, even if all you need to read is the CLSID, you still have to read the entire file. If POIFSFileSystem could construct from a ByteBuffer and not take unnecessary copies, it could speed things up dramatically for that situation... but ultimately that would need to propagate to the whole framework for it to really show benefits.


Do feel free to submit patches for that sort of thing :)

I haven't played with ByteBuffer before, so do feel free to suggest how it might help + point at code examples / patches that show it.
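
As a starting point for that discussion, here is a small sketch of the kind of thing java.nio.ByteBuffer allows: peeking at a header and handing out views of a region without copying any bytes. ByteBufferViews is a hypothetical helper, not part of POI, and the only file-format detail assumed is the 8-byte OLE2 signature at the start of the file.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper showing copy-free access to a file held in a ByteBuffer.
final class ByteBufferViews {
    // The OLE2 signature bytes D0 CF 11 E0 A1 B1 1A E1, read as a little-endian long.
    static final long OLE2_SIGNATURE = 0xE11AB1A1E011CFD0L;

    // Checks the first 8 bytes of the buffer without moving its position.
    static boolean hasOle2Signature(ByteBuffer file) {
        if (file.limit() < 8) {
            return false;
        }
        // duplicate() shares the bytes but has its own position, limit and byte order.
        ByteBuffer view = file.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        return view.getLong(0) == OLE2_SIGNATURE;
    }

    // Returns a view of bytes [start, start + length), counted from index 0
    // of the buffer. No bytes are copied; the view shares the same memory.
    static ByteBuffer region(ByteBuffer file, int start, int length) {
        ByteBuffer view = file.duplicate();
        view.limit(start + length);
        view.position(start);
        return view.slice();
    }
}
```

The same calls work on a buffer obtained from FileChannel.map, in which case only the parts of the file that are actually touched need to be read from disk.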


Nick
