One more question about this: We're querying a collection of documents, right?
So if Saxon runs out of memory, does that mean that it first loads all the
documents in the collection into memory and keeps them there?

Thanks,
Till

On Feb 12, 2014, at 9:31 PM, Till Westmann <[email protected]> wrote:

> Right, I forgot about that.
> 
> Thanks,
> Till
> 
> On Feb 12, 2014, at 12:23 PM, Eldon Carman <[email protected]> wrote:
> 
>> They have a version that supports streaming to handle larger files. It's
>> just not the free version.
>> 
>> 
>> On Tue, Feb 11, 2014 at 11:59 PM, Till Westmann <[email protected]> wrote:
>> 
>>> Hi Preston,
>>> 
>>> do you have indications that this is a limitation of just the free version?
>>> I think it wouldn't be completely surprising to see a big memory blow-up.
>>> Assuming that the XML file is UTF-8 with (almost) only single-byte
>>> characters (which I think it is) and that the text is stored as 2-byte
>>> UTF-16 characters in the JVM, we already have a factor of 2. On top of
>>> that, there are probably a number of objects and references that take up
>>> additional memory. So it might be that all versions of Saxon take up a lot
>>> of space in memory. But of course it is also possible that the commercial
>>> version uses a more memory-efficient representation.
>>> 
>>> Cheers,
>>> Till
>>> 
>>> On Feb 11, 2014, at 8:07 PM, Eldon Carman <[email protected]> wrote:
>>> 
>>>> In testing larger dataset sizes, Saxon has run into a memory limitation.
>>>> A data set of 2.21 GB could not be queried by Saxon. Even with the Java
>>>> heap size set larger than the data set, the application throws an error:
>>>> "Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit
>>>> exceeded". Just to confirm, I used the following settings:
>>>> JAVA_OPTS="-Xmx12g -Xms12g"
>>>>
>>>> Several internet posts suggest allocating 5 times as much memory as the
>>>> XML data size as a rule of thumb, though this is not guaranteed to work.
>>>> Some of my tests have worked with datasets up to 460 MB (which happens to
>>>> be my tiny dataset size). I guess we have now confirmed the memory
>>>> limitation of the free version of Saxon.
>>> 
>>> 
> 
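
Till's factor-of-2 argument (single-byte UTF-8 on disk vs. 2-byte UTF-16 characters in the JVM) can be checked with a small sketch. It only compares encoded byte counts; it assumes a pre-Java-9 JVM, where String contents were UTF-16 (Java 9 added compact strings for Latin-1 text), and the two sample documents are hypothetical.

```python
# Minimal sketch: compare the encoded size of the same XML text in UTF-8
# (the assumed on-disk encoding) and UTF-16 (what pre-Java-9 JVMs used for
# String contents). The sample documents below are hypothetical.

def utf16_to_utf8_ratio(text: str) -> float:
    """Return (UTF-16 byte count) / (UTF-8 byte count) for `text`."""
    utf8_bytes = len(text.encode("utf-8"))
    # "utf-16-le" avoids the 2-byte byte-order mark that plain "utf-16" prepends
    utf16_bytes = len(text.encode("utf-16-le"))
    return utf16_bytes / utf8_bytes

ascii_xml = '<doc><item id="1">plain ASCII text</item></doc>'
accented_xml = "<doc><item>café</item></doc>"

print(utf16_to_utf8_ratio(ascii_xml))     # pure ASCII: every 1-byte char becomes 2 bytes
print(utf16_to_utf8_ratio(accented_xml))  # "é" is already 2 bytes in UTF-8, so the ratio drops below 2
```

This captures only the character-encoding factor; the per-node object and reference overhead Till mentions comes on top of it and depends on how Saxon builds its in-memory tree.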

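The "about 5 times the XML size" rule of thumb from Eldon's message can be turned into a rough heap estimate. This is a back-of-the-envelope sketch: the factor is a heuristic from the thread, not a Saxon guarantee, and the calculation assumes the whole collection is held in memory at once, which is exactly the open question in the first message.

```python
# Back-of-the-envelope heap estimate using the "about 5x the XML size" rule
# of thumb quoted in the thread. Heuristic only; assumes every document in
# the collection is resident in memory at the same time.
RULE_OF_THUMB_FACTOR = 5.0

def estimated_heap_gb(xml_size_gb: float, factor: float = RULE_OF_THUMB_FACTOR) -> float:
    """Estimated heap needed to hold `xml_size_gb` of XML as in-memory trees."""
    return xml_size_gb * factor

def headroom_gb(xml_size_gb: float, max_heap_gb: float) -> float:
    """Heap left after the estimate; negative means the estimate exceeds -Xmx."""
    return max_heap_gb - estimated_heap_gb(xml_size_gb)

# Sizes from the thread (460 MB "tiny" dataset, 2.21 GB dataset), both
# compared against the 12 GB heap (-Xmx12g) used for the failing run.
for size_gb in (0.46, 2.21):
    print(f"{size_gb:.2f} GB XML -> ~{estimated_heap_gb(size_gb):.2f} GB heap, "
          f"headroom {headroom_gb(size_gb, 12.0):.2f} GB")
```

Under this heuristic the 2.21 GB dataset needs roughly 11 GB, leaving only about 1 GB spare in a 12 GB heap. That is at least consistent with the JVM spending most of its time in garbage collection before failing with "GC overhead limit exceeded", though it does not prove that Saxon keeps all documents resident.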