Hello Andy,

On Thu, Jun 14, 2012 at 01:12:25PM +0100, Andy Seaborne wrote:
> >I guess it would be a good idea to look at the end of the dump and check
> >the corresponding named graph for bad datetimes?
> 
> Yes - my best guess at the moment is that a dateTime can get in (they 
> are encoded into 56 bits, not recorded using the lexical form) but there 
> was a problem on the recreation of the lexical form.  Whether the 
> encoding or decoding is wrong, I can't tell.

I was not able to find the named graph causing the problem, so I recreated the
TDB with tdbloader2 from apache-jena-2.7.2 and ran tdbdump from the same
release immediately afterwards. I run into the same problem:

Exception in thread "main" org.openjena.atlas.AtlasException: formatInt: overflow
        at org.openjena.atlas.lib.NumberUtils.formatUnsignedInt(NumberUtils.java:115)
        at org.openjena.atlas.lib.NumberUtils.formatInt(NumberUtils.java:87)
        at org.openjena.atlas.lib.NumberUtils.formatInt(NumberUtils.java:60)
        at com.hp.hpl.jena.tdb.store.DateTimeNode.unpack(DateTimeNode.java:255)
        at com.hp.hpl.jena.tdb.store.DateTimeNode.unpackDateTime(DateTimeNode.java:180)
        at com.hp.hpl.jena.tdb.store.NodeId.extract(NodeId.java:313)
        at com.hp.hpl.jena.tdb.nodetable.NodeTableInline.getNodeForNodeId(NodeTableInline.java:64)
        at com.hp.hpl.jena.tdb.lib.TupleLib.quad(TupleLib.java:163)
        at com.hp.hpl.jena.tdb.lib.TupleLib.quad(TupleLib.java:155)
        at com.hp.hpl.jena.tdb.lib.TupleLib.access$100(TupleLib.java:45)
        at com.hp.hpl.jena.tdb.lib.TupleLib$4.convert(TupleLib.java:89)
        at com.hp.hpl.jena.tdb.lib.TupleLib$4.convert(TupleLib.java:85)
        at org.openjena.atlas.iterator.Iter$4.next(Iter.java:301)
        at org.openjena.atlas.iterator.IteratorCons.next(IteratorCons.java:94)
        at org.openjena.atlas.iterator.Iter.sendToSink(Iter.java:560)
        at org.openjena.riot.out.NQuadsWriter.write(NQuadsWriter.java:45)
        at org.openjena.riot.out.NQuadsWriter.write(NQuadsWriter.java:37)
        at org.openjena.riot.RiotWriter.writeNQuads(RiotWriter.java:41)
        at tdb.tdbdump.exec(tdbdump.java:49)
        at arq.cmdline.CmdMain.mainMethod(CmdMain.java:101)
        at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
        at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
        at tdb.tdbdump.main(tdbdump.java:31)

This seems to be a serious issue: a TDB freshly built with tdbloader2 cannot be
dumped with tdbdump from the same release.
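
Since tdbdump fails, one way to look for bad datetimes would be to scan the
N-Quads input that was given to tdbloader2. The following is only a rough
sketch in Python, not something taken from Jena. The "suspicious" rule
(anything other than a four-digit year, at most millisecond precision, and
in-range fields) is my own assumption and is not derived from TDB's actual
inlining rules, so it may flag harmless values or miss the real culprit. The
function names are made up for illustration.

```python
import re

XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"

# Matches "lexical form"^^<xsd:dateTime> inside an N-Quads line.
LITERAL = re.compile(
    r'"((?:[^"\\]|\\.)*)"\^\^<' + re.escape(XSD_DATETIME) + r">"
)

# ASSUMPTION: a "plain" dateTime has a four-digit year, an optional
# fraction of at most three digits, and an optional timezone.
PLAIN = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})"
    r"(\.\d{1,3})?(Z|[+-]\d{2}:\d{2})?"
)


def is_suspicious(lexical):
    """Return True if the lexical form falls outside the assumed plain shape."""
    m = PLAIN.fullmatch(lexical)
    if m is None:
        return True
    _year, month, day, hour, minute, second = (int(g) for g in m.groups()[:6])
    in_range = (
        1 <= month <= 12
        and 1 <= day <= 31
        and hour <= 24
        and minute <= 59
        and second <= 59
    )
    return not in_range


def scan(lines):
    """Yield (line_number, lexical_form) for every suspicious xsd:dateTime."""
    for lineno, line in enumerate(lines, 1):
        for m in LITERAL.finditer(line):
            if is_suspicious(m.group(1)):
                yield lineno, m.group(1)
```

It could be used as `for lineno, value in scan(open("data.nq", encoding="utf-8")): print(lineno, value)`,
where data.nq is a placeholder for the real input file.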

BTW: Here is some output from tdbloader2 for this TDB. It shows that the
runtime of the tdbloader2 data phase becomes strongly non-linear for very big
datasets. I called tdbloader2 with JVM_ARGS="-Xmx32768M -server" and it did not
seem to run into memory problems.

 12:39:17 -- TDB Bulk Loader Start
 12:39:17 Data phase
...
INFO  Add: 100,000,000 Data (Batch: 68,027 / Avg: 57,649)
...
INFO  Add: 500,000,000 Data (Batch: 55,309 / Avg: 41,446)
...
INFO  Add: 1,000,000,000 Data (Batch: 27,901 / Avg: 24,119)
...
INFO  Add: 1,100,000,000 Data (Batch: 335 / Avg: 6,308)
...
INFO  Add: 1,138,800,000 Data (Batch: 256 / Avg: 5,038)
...
INFO  Total: 1,138,845,529 tuples : 227,654.44 seconds : 5,002.52 tuples/sec [2012/07/22 03:53:36 CEST]
...
 20:24:24 -- TDB Bulk Loader Finish
 20:24:24 -- 373477 seconds
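
To make the slowdown in the data phase easier to see, here is a small Python
sketch that turns the progress lines above into per-segment throughput. It
assumes that "Avg" is the cumulative tuples/sec since the start of the data
phase, so that the elapsed time at each line is roughly count / Avg. That
reading is my assumption, although it agrees with the "Total" line. The
function names are made up for illustration.

```python
import re

# Progress lines copied from the tdbloader2 output in this mail.
LOG = """\
INFO  Add: 100,000,000 Data (Batch: 68,027 / Avg: 57,649)
INFO  Add: 500,000,000 Data (Batch: 55,309 / Avg: 41,446)
INFO  Add: 1,000,000,000 Data (Batch: 27,901 / Avg: 24,119)
INFO  Add: 1,100,000,000 Data (Batch: 335 / Avg: 6,308)
INFO  Add: 1,138,800,000 Data (Batch: 256 / Avg: 5,038)
"""

LINE = re.compile(r"Add: ([\d,]+) Data \(Batch: ([\d,]+) / Avg: ([\d,]+)\)")


def to_int(text):
    """Parse a number with thousands separators, e.g. '1,000' -> 1000."""
    return int(text.replace(",", ""))


def checkpoints(log):
    """Return (tuples_loaded, estimated_elapsed_seconds) per progress line.

    ASSUMPTION: Avg is the cumulative rate, so elapsed = count / Avg.
    """
    points = []
    for m in LINE.finditer(log):
        count, _batch, avg = (to_int(g) for g in m.groups())
        points.append((count, count / avg))
    return points


def segment_rates(points):
    """Return (start, end, tuples_per_sec) between consecutive checkpoints."""
    rates = []
    prev_count, prev_time = 0, 0.0
    for count, elapsed in points:
        rates.append((prev_count, count, (count - prev_count) / (elapsed - prev_time)))
        prev_count, prev_time = count, elapsed
    return rates


if __name__ == "__main__":
    for start, end, rate in segment_rates(checkpoints(LOG)):
        print(f"{start:>13,} .. {end:>13,}: {rate:10,.0f} tuples/sec")
```

Under that assumption, the first 1,000,000,000 tuples take roughly 11.5 hours,
while the remaining 138,800,000 or so take about 51 hours.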

Regards,

Michael Brunnbauer

-- 
++  Michael Brunnbauer
++  netEstate GmbH
++  Geisenhausener Straße 11a
++  81379 München
++  Tel +49 89 32 19 77 80
++  Fax +49 89 32 19 77 89 
++  E-Mail bru...@netestate.de
++  http://www.netestate.de/
++
++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
++  USt-IdNr. DE221033342
++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel
