Right, I forgot about that. Thanks, Till
On Feb 12, 2014, at 12:23 PM, Eldon Carman <[email protected]> wrote:

> They have a version that supports streams to handle larger files. It's just
> not the free version.
>
> On Tue, Feb 11, 2014 at 11:59 PM, Till Westmann <[email protected]> wrote:
>
>> Hi Preston,
>>
>> Do you have any indication that this is a limitation of just the free
>> version? I think it wouldn't be completely surprising to see a big memory
>> blow-up. Assuming that the XML file is single-byte UTF-8 (which I think it
>> is) and that the text is stored as 2-byte UTF-16 characters in the JVM, we
>> already have a factor of 2. And then there are probably a number of
>> objects and references that take up additional memory. So it may be that
>> all versions of Saxon take up a lot of space in memory. But of course it
>> is also possible that the commercial version uses a more memory-efficient
>> representation.
>>
>> Cheers,
>> Till
>>
>> On Feb 11, 2014, at 8:07 PM, Eldon Carman <[email protected]> wrote:
>>
>>> While testing larger dataset sizes, Saxon ran into a memory limitation.
>>> A dataset of 2.21 GB could not be queried by Saxon. Even with the Java
>>> heap set larger than the dataset, the application throws an error:
>>> "Exception in thread "main" java.lang.OutOfMemoryError: GC overhead
>>> limit exceeded". To confirm, I used the following settings:
>>> JAVA_OPTS="-Xmx12g -Xms12g"
>>>
>>> Several internet posts suggest allocating 5 times the XML data size as a
>>> rule of thumb, though that is not guaranteed to work. Some of my tests
>>> have worked with datasets up to 460 MB (which happens to be my tiny
>>> dataset size). I guess we have now confirmed the memory limitation of
>>> the free version of Saxon.
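[Editor's note: the back-of-envelope factors discussed in the thread can be sketched as a small Java calculation. The constants below are the rough numbers mentioned above (2x for UTF-8 text re-encoded as UTF-16 in the JVM, 5x as the internet rule of thumb for a full in-memory tree), not figures from Saxon's documentation.]

```java
// Rough heap estimate for loading an XML file into an in-memory tree.
// Factors come from the discussion above; they are heuristics, not
// guarantees from Saxon.
public class XmlHeapEstimate {
    // Single-byte UTF-8 text stored as 2-byte UTF-16 chars roughly doubles.
    static final long UTF16_FACTOR = 2;
    // Rule of thumb from the thread: allow ~5x the file size overall,
    // covering node objects, references, and indexes on top of the text.
    static final long RULE_OF_THUMB_FACTOR = 5;

    static long estimateHeapBytes(long xmlFileBytes) {
        return xmlFileBytes * RULE_OF_THUMB_FACTOR;
    }

    public static void main(String[] args) {
        long fileBytes = 2_210_000_000L; // the 2.21 GB dataset from the thread
        long estimate = estimateHeapBytes(fileBytes);
        System.out.printf("Estimated heap needed: %.2f GB%n", estimate / 1e9);
        // ~11 GB, which helps explain why even -Xmx12g was marginal and
        // GC overhead limits were hit.
    }
}
```

By this estimate the 2.21 GB file wants on the order of 11 GB of heap, leaving little headroom under -Xmx12g, while the 460 MB dataset (~2.3 GB estimated) fits comfortably.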
