They have a version that supports streaming to handle larger files. It's
just not the free version.
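For reference, the JDK's built-in StAX API (javax.xml.stream) can parse arbitrarily large XML in constant memory, since it never builds an in-memory tree. Here's a minimal sketch; the element name "record" and the sample document are hypothetical placeholders, and this is of course no substitute for Saxon's query capabilities:

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StreamCount {

    // Count occurrences of an element by streaming over the input,
    // without materializing the whole document in memory.
    static long countElements(InputStream in, String localName) throws Exception {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(in);
        long count = 0;
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && localName.equals(reader.getLocalName())) {
                count++;  // process each element as it streams by
            }
        }
        reader.close();
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Tiny in-memory sample; a real run would wrap a FileInputStream.
        String xml = "<root><record/><record/><record/></root>";
        InputStream in =
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(countElements(in, "record")); // prints 3
    }
}
```

The trade-off is that streaming only sees one event at a time, so queries that need random access to the document (as general XQuery does) can't be evaluated this way without extra buffering.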


On Tue, Feb 11, 2014 at 11:59 PM, Till Westmann <[email protected]> wrote:

> Hi Preston,
>
> do you have indications that this is a limitation of just the free version?
> I think it wouldn't be completely surprising to see a big memory
> blow-up. Assuming that the XML file is single-byte UTF-8 (which I think
> it is) and that the text is stored as 2-byte UTF-16 characters in the
> JVM, we already have a factor of 2. And then there are probably a number
> of objects and references that take up additional memory. So it might be
> that all versions of Saxon take up a lot of space in memory. But of
> course it is also possible that the commercial version uses a more
> memory-efficient representation.
>
> Cheers,
> Till
>
> On Feb 11, 2014, at 8:07 PM, Eldon Carman <[email protected]> wrote:
>
> > In testing larger dataset sizes, Saxon has run into a memory
> > limitation. A data set of 2.21 GB could not be queried by Saxon. Even
> > with the Java heap size set larger than the data set, the application
> > throws an error: "Exception in thread "main" java.lang.OutOfMemoryError:
> > GC overhead limit exceeded". Just to confirm, I used the following
> > settings: JAVA_OPTS="-Xmx12g -Xms12g"
> >
> > Several internet posts suggest allocating five times the XML data size
> > as a rule of thumb, though this is not guaranteed to work. Some of my
> > tests have worked with datasets up to 460 MB (which happens to be my
> > tiny dataset size). I guess we have now confirmed the memory limitation
> > of the free version of Saxon.
>
>
