They have a version that supports streaming to handle larger files. It's just not the free version.
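For anyone who wants to see the difference streaming makes without the commercial Saxon license, here is a minimal sketch using the JDK's built-in StAX pull parser (not Saxon's streaming mode). It walks the document event by event instead of building a tree, so memory use stays roughly constant regardless of file size. The `countItems` helper and the `item` element name are just illustrative:

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;

public class StreamCount {

    // Count <item> elements with a streaming (StAX) parse.
    // No DOM or tree is built, so memory stays O(1) in document size.
    static int countItems(String xml) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader =
                factory.createXMLStreamReader(new StringReader(xml));
        int count = 0;
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && "item".equals(reader.getLocalName())) {
                count++;
            }
        }
        reader.close();
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><item/><item/><item/></root>";
        System.out.println(countItems(xml)); // prints 3
    }
}
```

Of course, streaming only works for queries that can be answered in a single forward pass; anything that needs random access to the tree still has to materialize the document, which is where the memory blow-up below comes from.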
On Tue, Feb 11, 2014 at 11:59 PM, Till Westmann <[email protected]> wrote:

> Hi Preston,
>
> do you have indications that this is a limitation of just the free
> version? I think that it wouldn't be completely surprising to see a big
> memory blow-up.
> Assuming that the XML file is in single-byte UTF-8 (which I think it is)
> and that the text is stored as 2-byte UTF-16 characters in the JVM, we
> already have a factor of 2. And then there are probably a number of
> objects and references that take up additional memory. So it might be
> that all versions of Saxon take up a lot of space in memory. But of
> course it is also possible that the commercial version uses a more
> memory-efficient representation.
>
> Cheers,
> Till
>
> On Feb 11, 2014, at 8:07 PM, Eldon Carman <[email protected]> wrote:
>
> > In testing larger dataset sizes, Saxon has run into a memory
> > limitation. A dataset of 2.21 GB could not be queried by Saxon. Even
> > with the Java heap size set larger than the dataset, the application
> > throws an error: "Exception in thread "main"
> > java.lang.OutOfMemoryError: GC overhead limit exceeded". Just to
> > confirm, I used the following settings:
> > JAVA_OPTS="-Xmx12g -Xms12g"
> >
> > Several internet posts suggest allocating five times as much memory as
> > the XML data size as a rule of thumb, though that is not guaranteed to
> > work. Some of my tests have worked with datasets up to 460 MB (which
> > happens to be my tiny dataset size). I guess we have now confirmed the
> > memory limitation of the free version of Saxon.
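As a rough sanity check of the numbers in the quoted thread (the factor of 2 for UTF-16 and the "5x the XML size" rule of thumb are estimates from the discussion, not Saxon guarantees):

```java
public class HeapEstimate {
    public static void main(String[] args) {
        double fileGb = 2.21;              // XML file size on disk (UTF-8)
        double utf16Gb = fileGb * 2;       // chars are 2 bytes in pre-Java-9 JVMs
        double ruleOfThumbGb = fileGb * 5; // "5x the XML size" rule of thumb

        // Raw character data alone is about 4.4 GB; the rule of thumb
        // puts the working set near 11 GB, uncomfortably close to the
        // 12 GB heap (-Xmx12g) that still failed with GC overhead.
        System.out.println(utf16Gb);
        System.out.println(ruleOfThumbGb);
    }
}
```

With the rule-of-thumb estimate sitting just under the configured 12 GB heap, the GC would be working at the edge of capacity, which is consistent with the "GC overhead limit exceeded" failure mode rather than a plain out-of-memory error.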
