That doesn't sound so much like a Jena issue as simply not allocating enough memory to the JVM running the application. You can increase the amount of memory available to the application by adding the -XmxNNNNM option when starting up Fuseki (where NNNN is the amount of memory to use, in megabytes). I start it up on my laptop (which doesn't have a huge amount of memory) using a bash script that looks something like this:
#!/bin/sh
port=3030
java -cp ./fuseki-server.jar:lib:lib/sdb-1.3.4.jar:lib/mysql-connector-java-5.1.16-bin.jar:lib/arq-2.8.8.jar -Xmx1024M org.apache.jena.fuseki.FusekiCmd --desc fuseki.ttl --port=$port /ds > fuseki.log 2>&1 &

Note that I'm telling the JVM to use 1024M of memory. To load very large
datasets you may need a machine with a lot of memory, and you can then
increase the memory allocation as necessary.

> -----Original Message-----
> From: gea...@gmail.com [mailto:gea...@gmail.com] On Behalf Of Paul Gearon
> Sent: Friday, November 02, 2012 5:24 PM
> To: jena-us...@incubator.apache.org
> Subject: large load errors
>
> This is probably pushing Jena beyond its design limits, but I thought
> I'd report on it anyway.
>
> I needed to test some things with large data sets, so I tried to load
> the data from http://basekb.com/
>
> Once extracted from the tar.gz file, it creates a directory called
> baseKB filled with 1024 gzipped nt files.
>
> On my first attempt, I grabbed a fresh copy of Fuseki 0.2.5 and started
> it with TDB storage. I didn't want to individually load 1024 files from
> the control panel, so I used zcat to dump everything into one file and
> tried loading from the GUI. This failed in short order with RIOT
> complaining about memory:
>
> 13:24:31 WARN  Fuseki :: [1] RC = 500 : Java heap space
> java.lang.OutOfMemoryError: Java heap space
>         at java.util.Arrays.copyOfRange(Arrays.java:2694)
>         at java.lang.String.<init>(String.java:234)
>         at java.lang.StringBuilder.toString(StringBuilder.java:405)
>         at org.openjena.riot.tokens.TokenizerText.readIRI(TokenizerText.java:476)
> ...etc...
>
> I'm wondering if RIOT really needed to run out of memory?
>
> Anyway, I went back to the individual files. That meant using a non-GUI
> approach. I wasn't sure about using a media type for nt, but that's
> compatible with Turtle, so I used text/turtle.
>
> I threw away the DB directory and started again.
> This time I tried to load the files with the following bash:
>
> for i in *.nt.gz; do
>   echo "Loading $i"
>   zcat $i | curl -X POST -H "Content-Type: text/turtle" --upload-file - \
>     "http://localhost:3030/dataset/data?default"
> done
>
> This started reasonably well. A number of warnings showed up on the
> server side, due to bad language tags and invalid IRIs, but it kept
> going. However, on the 20th file I started seeing these:
>
> Loading triples0000.nt.gz
> Loading triples0001.nt.gz
> Loading triples0002.nt.gz
> Loading triples0003.nt.gz
> Loading triples0004.nt.gz
> Loading triples0005.nt.gz
> Loading triples0006.nt.gz
> Loading triples0007.nt.gz
> Loading triples0008.nt.gz
> Loading triples0009.nt.gz
> Loading triples0010.nt.gz
> Loading triples0011.nt.gz
> Loading triples0012.nt.gz
> Loading triples0013.nt.gz
> Loading triples0014.nt.gz
> Loading triples0015.nt.gz
> Loading triples0016.nt.gz
> Loading triples0017.nt.gz
> Loading triples0018.nt.gz
> Loading triples0019.nt.gz
> Error 500: GC overhead limit exceeded
>
> Fuseki - version 0.2.5 (Build date: 2012-10-20T17:03:29+0100)
>
> Loading triples0020.nt.gz
> Error 500: GC overhead limit exceeded
>
> Fuseki - version 0.2.5 (Build date: 2012-10-20T17:03:29+0100)
>
> Loading triples0021.nt.gz
> Error 500: GC overhead limit exceeded
>
> Fuseki - version 0.2.5 (Build date: 2012-10-20T17:03:29+0100)
>
> This kept going until triples0042.nt.gz, where it hung for hours.
>
> Meanwhile, on the server, I was still seeing parser warnings, but also
> messages like:
>
> 17:01:26 WARN  SPARQL_REST$HttpActionREST :: Transaction still active
>               in endWriter - no commit or abort seen (forced abort)
> 17:01:26 WARN  Fuseki :: [33] RC = 500 : GC overhead limit exceeded
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> When I finally killed it (with ctrl-C), I got several stack traces in
> the stdout log.
> They appeared to indicate a bad state, so I've saved them and put them
> up at: http://pastebin.com/yar5Pq85
>
> While OOM is very hard to deal with, I'm still surprised to see it hit
> this way, so I thought you might be interested to see it.
>
> Regards,
> Paul Gearon
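P.S. For a load this size, the simplest change is to bump the heap in the startup script before launching. A minimal dry-run sketch of that idea — the 2048M figure, the TDB directory name, and printing the command instead of running it are all illustrative, not taken from the thread:

```shell
#!/bin/sh
# Sketch only: assemble the Fuseki startup command with a larger heap
# and print it for inspection rather than executing it.
port=3030
heap=2048M          # raise this as the dataset grows
cmd="java -Xmx${heap} -jar fuseki-server.jar --loc=DB --port=${port} /ds"
echo "$cmd"         # dry run: check the flags before launching for real
```

Once the command looks right, append `> fuseki.log 2>&1 &` as in the script above to background the server and capture its output.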