That sounds less like a Jena issue and more like the JVM simply not being 
allocated enough memory to run the application.  You can increase the maximum 
heap available to the application by adding the -XmxNNNNM option when 
starting Fuseki (where NNNN is the amount of memory in megabytes).  I start it 
on my laptop (which doesn't have a huge amount of memory) using a shell script 
that looks something like this:

#!/bin/sh
port=3030
java -cp ./fuseki-server.jar:lib:lib/sdb-1.3.4.jar:lib/mysql-connector-java-5.1.16-bin.jar:lib/arq-2.8.8.jar \
  -Xmx1024M org.apache.jena.fuseki.FusekiCmd --desc fuseki.ttl --port=$port /ds \
  > fuseki.log 2>&1 &

Note that -Xmx1024M tells the JVM it may use up to 1024 MB of heap.  To load 
very large datasets you may need a machine with a lot of memory, and then you 
can increase the memory allocation as necessary.
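
If you want to change the heap size without editing the script each time, 
something along these lines might work.  This is only a sketch: heap_flag and 
FUSEKI_HEAP_MB are names I made up for illustration, and Fuseki itself does 
not read that variable.

```shell
#!/bin/sh
# Hypothetical helper: turn a heap size in megabytes into a JVM -Xmx option.
# FUSEKI_HEAP_MB is an assumed variable name, not something Fuseki reads.
heap_flag() {
    mb=$1
    case "$mb" in
        ''|*[!0-9]*)
            # Reject empty or non-numeric input instead of passing it to the JVM.
            echo "heap size must be a whole number of megabytes: $mb" >&2
            return 1
            ;;
    esac
    printf '%s\n' "-Xmx${mb}M"
}

# Use FUSEKI_HEAP_MB if it is set, otherwise fall back to 1024 MB.
heap_opt=$(heap_flag "${FUSEKI_HEAP_MB:-1024}") || exit 1
echo "JVM heap option: $heap_opt"
```

In the startup script, $heap_opt would then take the place of the hard-coded 
-Xmx1024M, so running it with FUSEKI_HEAP_MB=4096 in the environment would 
ask the JVM for a 4096 MB maximum heap.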

> -----Original Message-----
> From: gea...@gmail.com [mailto:gea...@gmail.com] On Behalf Of Paul Gearon
> Sent: Friday, November 02, 2012 5:24 PM
> To: jena-us...@incubator.apache.org
> Subject: large load errors
> 
> This is probably pushing Jena beyond its design limits, but I thought
> I'd report on it anyway.
> 
> I needed to test some things with large data sets, so I tried to load
> the data from http://basekb.com/
> 
> Once extracted from the tar.gz file, it creates a directory called
> baseKB filled with 1024 gzipped nt files.
> 
> On my first attempt, I grabbed a fresh copy of Fuseki 0.2.5 and started
> it with TDB storage. I didn't want to individually load 1024 files from
> the control panel, so I used zcat to dump everything into one file and
> tried loading from the GUI. This failed in short order with RIOT
> complaining of memory:
> 
> 13:24:31 WARN  Fuseki               :: [1] RC = 500 : Java heap space
> java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOfRange(Arrays.java:2694)
> at java.lang.String.<init>(String.java:234)
> at java.lang.StringBuilder.toString(StringBuilder.java:405)
> at org.openjena.riot.tokens.TokenizerText.readIRI(TokenizerText.java:476)
> ...etc...
> 
> I'm wondering if RIOT really needed to run out of memory?
> 
> Anyway, I went back to the individual files. That meant using a non-GUI
> approach. I wasn't sure which media type to use for nt, but N-Triples is
> compatible with Turtle, so I used text/turtle.
> 
> I threw away the DB directory and started again. This time I tried to
> load the files with the following bash:
> 
> for i in *.nt.gz; do
>   echo "Loading $i"
>   zcat "$i" | curl -X POST -H "Content-Type: text/turtle" --upload-file - \
>     "http://localhost:3030/dataset/data?default"
> done
> 
> This started reasonably well. A number of warnings showed up on the
> server side, due to bad language tags and invalid IRIs, but it kept
> going.
> However, on the 20th file I started seeing these:
> Loading triples0000.nt.gz
> Loading triples0001.nt.gz
> Loading triples0002.nt.gz
> Loading triples0003.nt.gz
> Loading triples0004.nt.gz
> Loading triples0005.nt.gz
> Loading triples0006.nt.gz
> Loading triples0007.nt.gz
> Loading triples0008.nt.gz
> Loading triples0009.nt.gz
> Loading triples0010.nt.gz
> Loading triples0011.nt.gz
> Loading triples0012.nt.gz
> Loading triples0013.nt.gz
> Loading triples0014.nt.gz
> Loading triples0015.nt.gz
> Loading triples0016.nt.gz
> Loading triples0017.nt.gz
> Loading triples0018.nt.gz
> Loading triples0019.nt.gz
> Error 500: GC overhead limit exceeded
> 
> 
> Fuseki - version 0.2.5 (Build date: 2012-10-20T17:03:29+0100)
> Loading triples0020.nt.gz
> Error 500: GC overhead limit exceeded
> 
> 
> Fuseki - version 0.2.5 (Build date: 2012-10-20T17:03:29+0100)
> Loading triples0021.nt.gz
> Error 500: GC overhead limit exceeded
> 
> 
> Fuseki - version 0.2.5 (Build date: 2012-10-20T17:03:29+0100)
> 
> This kept going until triples0042.nt.gz where it hung for hours.
> 
> Meanwhile, on the server, I was still seeing parser warnings, but also
> messages like:
> 17:01:26 WARN  SPARQL_REST$HttpActionREST :: Transaction still active in endWriter - no commit or abort seen (forced abort)
> 17:01:26 WARN  Fuseki               :: [33] RC = 500 : GC overhead limit exceeded
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> 
> When I finally killed it (with ctrl-C), I got several stack traces in
> the stdout log. They appeared to indicate a bad state, so I've saved
> them and put them up at:  http://pastebin.com/yar5Pq85
> 
> While OOM is very hard to deal with, I'm still surprised to see it hit
> this way, so I thought you might be interested to see it.
> 
> Regards,
> Paul Gearon
