In the meantime, you might want to try using tdbloader/tdbloader2
(http://jena.apache.org/documentation/tdb/commands.html#tdbloader2) to
create the TDB dataset offline instead.


You can then start up a Fuseki server and connect to the TDB dataset
you created.
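
Something along these lines should work (the database location and the
service name are placeholders):

  # Build the TDB dataset offline; the loader should accept the
  # gzipped N-Triples files directly
  tdbloader2 --loc /path/to/DB baseKB/*.nt.gz

  # Then serve the resulting dataset with Fuseki
  fuseki-server --loc /path/to/DB /dataset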

Rob

On 11/2/12 3:41 PM, "Stephen Allen" <sal...@apache.org> wrote:

>Hi Paul,
>
>Thanks for the report.  This is a known issue in Fuseki (see JENA-309
>[1]).  I have plans to work on this soon.  I'm also a little surprised
>that your second attempt, after breaking the data into chunks, failed;
>I'll take a look at that.
>
>I am also working on a related issue (JENA-330 [2]) that will
>eliminate the size limits on SPARQL Update requests.  I hope to have
>that checked into the trunk soon.
>
>-Stephen
>
>[1] https://issues.apache.org/jira/browse/JENA-309
>[2] https://issues.apache.org/jira/browse/JENA-330
>
>
>
>On Fri, Nov 2, 2012 at 5:24 PM, Paul Gearon <gea...@ieee.org> wrote:
>> This is probably pushing Jena beyond its design limits, but I thought
>> I'd report on it anyway.
>>
>> I needed to test some things with large data sets, so I tried to load
>>the
>> data from http://basekb.com/
>>
>> Extracting the tar.gz file creates a directory called baseKB filled
>> with 1024 gzipped N-Triples (.nt.gz) files.
>>
>> On my first attempt, I grabbed a fresh copy of Fuseki 0.2.5 and started
>>it
>> with TDB storage. I didn't want to individually load 1024 files from the
>> control panel, so I used zcat to dump everything into one file and tried
>> loading from the GUI. This failed in short order with RIOT complaining
>>of
>> memory:
>>
>> 13:24:31 WARN  Fuseki               :: [1] RC = 500 : Java heap space
>> java.lang.OutOfMemoryError: Java heap space
>> at java.util.Arrays.copyOfRange(Arrays.java:2694)
>> at java.lang.String.<init>(String.java:234)
>> at java.lang.StringBuilder.toString(StringBuilder.java:405)
>> at org.openjena.riot.tokens.TokenizerText.readIRI(TokenizerText.java:476)
>> ...etc...
>>
>> I'm wondering if RIOT really needed to run out of memory?
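>>
>> (For the record, the "dump everything into one file" step above was
>> just a zcat concatenation along these lines, with an illustrative
>> output name:
>>
>>   zcat baseKB/*.nt.gz > basekb.nt
>>
>> The GUI upload then had to handle that single huge file in one
>> request.)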
>>
>> Anyway, I went back to the individual files. That meant using a
>> non-GUI approach. I wasn't sure about the media type for N-Triples,
>> but N-Triples is compatible with Turtle, so I used text/turtle.
>>
>> I threw away the DB directory and started again. This time I tried to
>>load
>> the files with the following bash:
>>
>> for i in *.nt.gz; do
>>   echo "Loading $i"
>>   zcat $i | curl -X POST -H "Content-Type: text/turtle" \
>>     --upload-file - "http://localhost:3030/dataset/data?default"
>> done
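>>
>> (A note on the flags, since the combination looks odd: curl's
>> --upload-file defaults to a PUT request, which under the SPARQL Graph
>> Store protocol would replace the graph on every iteration; forcing
>> POST with -X makes each file append to the default graph instead.)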
>>
>> This started reasonably well. A number of warnings showed up on the
>>server
>> side, due to bad language tags and invalid IRIs, but it kept going.
>> However, on the 20th file I started seeing these:
>> Loading triples0000.nt.gz
>> Loading triples0001.nt.gz
>> Loading triples0002.nt.gz
>> Loading triples0003.nt.gz
>> Loading triples0004.nt.gz
>> Loading triples0005.nt.gz
>> Loading triples0006.nt.gz
>> Loading triples0007.nt.gz
>> Loading triples0008.nt.gz
>> Loading triples0009.nt.gz
>> Loading triples0010.nt.gz
>> Loading triples0011.nt.gz
>> Loading triples0012.nt.gz
>> Loading triples0013.nt.gz
>> Loading triples0014.nt.gz
>> Loading triples0015.nt.gz
>> Loading triples0016.nt.gz
>> Loading triples0017.nt.gz
>> Loading triples0018.nt.gz
>> Loading triples0019.nt.gz
>> Error 500: GC overhead limit exceeded
>>
>>
>> Fuseki - version 0.2.5 (Build date: 2012-10-20T17:03:29+0100)
>> Loading triples0020.nt.gz
>> Error 500: GC overhead limit exceeded
>>
>>
>> Fuseki - version 0.2.5 (Build date: 2012-10-20T17:03:29+0100)
>> Loading triples0021.nt.gz
>> Error 500: GC overhead limit exceeded
>>
>>
>> Fuseki - version 0.2.5 (Build date: 2012-10-20T17:03:29+0100)
>>
>> This kept going until triples0042.nt.gz, where it hung for hours.
>>
>> Meanwhile, on the server, I was still seeing parser warnings, but also
>> messages like:
>> 17:01:26 WARN  SPARQL_REST$HttpActionREST :: Transaction still active in
>> endWriter - no commit or abort seen (forced abort)
>> 17:01:26 WARN  Fuseki               :: [33] RC = 500 : GC overhead limit
>> exceeded
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>> When I finally killed it (with ctrl-C), I got several stack traces in
>>the
>> stdout log. They appeared to indicate a bad state, so I've saved them
>>and
>> put them up at:  http://pastebin.com/yar5Pq85
>>
>> While OOM is very hard to deal with, I'm still surprised to see it hit
>>this
>> way, so I thought you might be interested to see it.
>>
>> Regards,
>> Paul Gearon
