In the meantime, you might want to try using tdbloader/tdbloader2 (http://jena.apache.org/documentation/tdb/commands.html#tdbloader2) to create the TDB dataset offline instead.
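A minimal sketch of that offline approach (the output directory `DB`, the dataset name `/dataset`, and the `baseKB` input path are illustrative examples, not values from this thread):

```shell
# Build the TDB indexes offline; --loc names the database directory.
# tdbloader2 accepts multiple input files on the command line.
tdbloader2 --loc DB baseKB/*.nt.gz

# Then serve the pre-built dataset from Fuseki
# (dataset path /dataset is an example name).
./fuseki-server --loc=DB /dataset
```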
You can then start up a Fuseki server and connect to the TDB dataset you created.

Rob

On 11/2/12 3:41 PM, "Stephen Allen" <sal...@apache.org> wrote:

>Hi Paul,
>
>Thanks for the report. This is a known issue in Fuseki (see JENA-309
>[1]). I have plans to work on this soon. Also, I'm a little surprised
>that your second attempt after breaking it into chunks failed; I'll
>take a look at that.
>
>I am also working on a related issue (JENA-330 [2]) that will
>eliminate limits on SPARQL Update queries. I hope to have that
>checked into the trunk soon.
>
>-Stephen
>
>[1] https://issues.apache.org/jira/browse/JENA-309
>[2] https://issues.apache.org/jira/browse/JENA-330
>
>
>On Fri, Nov 2, 2012 at 5:24 PM, Paul Gearon <gea...@ieee.org> wrote:
>> This is probably pushing Jena beyond its design limits, but I thought
>> I'd report on it anyway.
>>
>> I needed to test some things with large data sets, so I tried to load
>> the data from http://basekb.com/
>>
>> Once extracted from the tar.gz file, it creates a directory called
>> baseKB filled with 1024 gzipped nt files.
>>
>> On my first attempt, I grabbed a fresh copy of Fuseki 0.2.5 and
>> started it with TDB storage. I didn't want to individually load 1024
>> files from the control panel, so I used zcat to dump everything into
>> one file and tried loading from the GUI. This failed in short order
>> with RIOT complaining of memory:
>>
>> 13:24:31 WARN Fuseki :: [1] RC = 500 : Java heap space
>> java.lang.OutOfMemoryError: Java heap space
>>     at java.util.Arrays.copyOfRange(Arrays.java:2694)
>>     at java.lang.String.<init>(String.java:234)
>>     at java.lang.StringBuilder.toString(StringBuilder.java:405)
>>     at org.openjena.riot.tokens.TokenizerText.readIRI(TokenizerText.java:476)
>>     ...etc...
>>
>> I'm wondering if RIOT really needed to run out of memory?
>>
>> Anyway, I went back to the individual files. That meant using a
>> non-GUI approach.
>> I wasn't sure about the media type for nt, but N-Triples is
>> compatible with Turtle, so I used text/turtle.
>>
>> I threw away the DB directory and started again. This time I tried to
>> load the files with the following bash:
>>
>> for i in *.nt.gz; do
>>   echo "Loading $i"
>>   zcat $i | curl -X POST -H "Content-Type: text/turtle" --upload-file - \
>>     "http://localhost:3030/dataset/data?default"
>> done
>>
>> This started reasonably well. A number of warnings showed up on the
>> server side, due to bad language tags and invalid IRIs, but it kept
>> going. However, on the 20th file I started seeing these:
>>
>> Loading triples0000.nt.gz
>> Loading triples0001.nt.gz
>> Loading triples0002.nt.gz
>> Loading triples0003.nt.gz
>> Loading triples0004.nt.gz
>> Loading triples0005.nt.gz
>> Loading triples0006.nt.gz
>> Loading triples0007.nt.gz
>> Loading triples0008.nt.gz
>> Loading triples0009.nt.gz
>> Loading triples0010.nt.gz
>> Loading triples0011.nt.gz
>> Loading triples0012.nt.gz
>> Loading triples0013.nt.gz
>> Loading triples0014.nt.gz
>> Loading triples0015.nt.gz
>> Loading triples0016.nt.gz
>> Loading triples0017.nt.gz
>> Loading triples0018.nt.gz
>> Loading triples0019.nt.gz
>> Error 500: GC overhead limit exceeded
>>
>> Fuseki - version 0.2.5 (Build date: 2012-10-20T17:03:29+0100)
>> Loading triples0020.nt.gz
>> Error 500: GC overhead limit exceeded
>>
>> Fuseki - version 0.2.5 (Build date: 2012-10-20T17:03:29+0100)
>> Loading triples0021.nt.gz
>> Error 500: GC overhead limit exceeded
>>
>> Fuseki - version 0.2.5 (Build date: 2012-10-20T17:03:29+0100)
>>
>> This kept going until triples0042.nt.gz, where it hung for hours.
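[Editor's note: since each POST body above is an entire decompressed file, one way to bound per-request memory is to split the stream into fixed-size chunks first; N-Triples is line-based, so splitting on line boundaries keeps every chunk syntactically valid. A sketch, where the chunk size, `chunk_` prefix, and dataset URL are arbitrary examples:]

```shell
# Decompress one file and split it into ~100k-line N-Triples chunks;
# split(1) names them chunk_aa, chunk_ab, and so on.
zcat triples0000.nt.gz | split -l 100000 - chunk_

# POST each bounded chunk separately instead of one huge request body.
for f in chunk_*; do
  echo "Loading $f"
  curl -X POST -H "Content-Type: text/turtle" \
    --data-binary "@$f" "http://localhost:3030/dataset/data?default"
done
```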
>>
>> Meanwhile, on the server, I was still seeing parser warnings, but
>> also messages like:
>>
>> 17:01:26 WARN SPARQL_REST$HttpActionREST :: Transaction still active in
>> endWriter - no commit or abort seen (forced abort)
>> 17:01:26 WARN Fuseki :: [33] RC = 500 : GC overhead limit exceeded
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>> When I finally killed it (with Ctrl-C), I got several stack traces in
>> the stdout log. They appeared to indicate a bad state, so I've saved
>> them and put them up at: http://pastebin.com/yar5Pq85
>>
>> While OOM is very hard to deal with, I'm still surprised to see it
>> hit this way, so I thought you might be interested to see it.
>>
>> Regards,
>> Paul Gearon