Right, I forgot about that.

Thanks,
Till

On Feb 12, 2014, at 12:23 PM, Eldon Carman <[email protected]> wrote:

> They have a version that supports streams to handle larger files. It's
> just not the free version.
> 
> 
> On Tue, Feb 11, 2014 at 11:59 PM, Till Westmann <[email protected]> wrote:
> 
>> Hi Preston,
>> 
>> do you have indications that this is a limitation of just the free version?
>> I think that it wouldn't be completely surprising to see a big memory
>> blow-up.
>> Assuming that the XML file is in single-byte UTF-8 (which I think it is)
>> and that the text is stored in 2-byte UTF-16 characters in the JVM, we
>> already have a factor of 2. And then there are probably a number of objects
>> and references that take up additional memory. So it might be that all
>> versions of Saxon take up a lot of space in memory. But of course it is
>> also possible that the commercial version uses a more memory efficient
>> representation.
>> 
>> Cheers,
>> Till
>> 
>> On Feb 11, 2014, at 8:07 PM, Eldon Carman <[email protected]> wrote:
>> 
>>> In testing larger dataset sizes, Saxon has run into a memory
>>> limitation. A 2.21 GB data set could not be queried by Saxon. Even
>>> with the Java heap size set larger than the data set, the application
>>> throws an error: "Exception in thread "main" java.lang.OutOfMemoryError:
>>> GC overhead limit exceeded". Just to confirm, I used the following
>>> settings:
>>> JAVA_OPTS="-Xmx12g -Xms12g"
>>> 
>>> Several internet posts suggest allocating 5 times the XML data size as
>>> a rule of thumb, though this is not guaranteed to work. Some of my
>>> tests have worked with data sets up to 460 MB (which happens to be my
>>> tiny data set size). I guess we have now confirmed the memory
>>> limitation of the free version of Saxon.
>> 
>> 
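[Editor's note: the back-of-envelope arithmetic from the thread can be sketched as below. The 5x expansion factor is the rule of thumb quoted in the emails, not a guarantee; the function name and numbers are illustrative only.]

```python
# Rough estimate of JVM heap needed to hold an XML document as an
# in-memory tree, following the reasoning in the thread above:
# UTF-8 text roughly doubles as UTF-16 in the JVM, and object/reference
# overhead pushes the commonly quoted rule of thumb to about 5x.

def estimated_heap_gb(xml_size_gb: float, expansion_factor: float = 5.0) -> float:
    """Rule-of-thumb heap (GB) for an in-memory XML tree."""
    return xml_size_gb * expansion_factor

dataset_gb = 2.21   # the failing data set from the thread
heap_gb = 12.0      # the -Xmx12g setting used in the test

needed = estimated_heap_gb(dataset_gb)
print(f"estimated heap needed: {needed:.2f} GB (available: {heap_gb} GB)")
# ~11 GB estimated vs. a 12 GB heap leaves little headroom, so a
# "GC overhead limit exceeded" failure is plausible once query state
# is added on top of the document tree.
```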
