Hi Preston,

do you have any indication that this is a limitation of only the free version? I don't think a big memory blow-up would be entirely surprising. Assuming the XML file is single-byte UTF-8 (which I think it is) and that its text is stored as 2-byte UTF-16 characters in the JVM, we already have a factor of 2. On top of that, the in-memory tree probably adds a number of objects and references that consume further memory. So it may be that all versions of Saxon need a lot of heap space, though it is of course also possible that the commercial version uses a more memory-efficient representation.
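Concretely, the factor-of-2 point can be sketched like this (a minimal illustration of String storage in a pre-JDK-9 JVM, not a claim about Saxon's actual internal tree model):

```java
import java.nio.charset.StandardCharsets;

public class Utf16Overhead {
    // On-disk size, assuming the file is encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // In-heap character payload: the JVM (before JDK 9's compact strings)
    // stores String text as 2-byte UTF-16 chars. This ignores object
    // headers, references, and per-node overhead entirely.
    static int utf16Bytes(String s) {
        return s.length() * 2;
    }

    public static void main(String[] args) {
        String text = "<doc><item>hello world</item></doc>"; // ASCII-only sample
        System.out.println("UTF-8 bytes on disk:  " + utf8Bytes(text));  // 35
        System.out.println("UTF-16 bytes in heap: " + utf16Bytes(text)); // 70
        // For pure ASCII text, the in-memory characters alone are already
        // 2x the file size, before any object/reference overhead is added.
    }
}
```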
Cheers,
Till

On Feb 11, 2014, at 8:07 PM, Eldon Carman <[email protected]> wrote:

> In testing larger dataset sizes, Saxon has run into a memory limitation. A
> data set of 2.21 GB could not be queried by Saxon. Even with the Java heap
> size set larger than the data set, the application throws an error:
> "Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit
> exceeded". Just to confirm, I used the following settings:
> JAVA_OPTS="-Xmx12g -Xms12g"
>
> Several internet posts suggest allocating 5 times as much memory as the
> XML data size as a rule of thumb, though that is not guaranteed to work.
> Some of my tests have worked with datasets up to 460 MB (which happens to
> be my tiny dataset size). I guess we have now confirmed the memory
> limitation of the free version of Saxon.
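For what it's worth, the 5x rule of thumb quoted above suggests the 12 GB heap was only barely adequate for a 2.21 GB file. A back-of-the-envelope check (the multiplier is folklore from those posts, not a guarantee):

```java
// Hedged sanity check of the "5x the XML size" heap rule of thumb.
// Sizes are taken from the thread; the 5x multiplier is an internet
// rule of thumb, not a documented Saxon requirement.
public class HeapRuleOfThumb {
    static double requiredHeapGb(double xmlSizeGb, double multiplier) {
        return xmlSizeGb * multiplier;
    }

    public static void main(String[] args) {
        double needed = requiredHeapGb(2.21, 5.0); // about 11.05 GB
        double configured = 12.0;                  // -Xmx12g from the thread
        System.out.printf("rule-of-thumb estimate: %.2f GB, configured: %.2f GB%n",
                needed, configured);
        // 12 GB only barely clears the estimate. That is consistent with the
        // "GC overhead limit exceeded" error: near the heap limit the JVM
        // spends most of its time in garbage collection before giving up.
    }
}
```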
