Hello Andy,

On Thu, Jun 14, 2012 at 01:12:25PM +0100, Andy Seaborne wrote:
> >I guess it would be a good idea to look at the end of the dump and check
> >the corresponding named graph for bad datetimes ?
>
> Yes - my best guess at the moment is that a dateTime can get in (they
> are encoded into 56 bits, not recorded using the lexical form) but there
> was a problem on the recreation of the lexical form. Whether the
> encoding or decoding is wrong, I can't tell.
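To make the suspected failure mode concrete, here is a minimal sketch of how a value can be packed into a bit field without complaint and still fail when the lexical form is rebuilt. Everything in it is an assumption for illustration: the class `PackedDateSketch`, the 14/4/5-bit layout, and the helper `formatFixed` are hypothetical and are not the layout or the code TDB actually uses. The only thing taken from this thread is the general shape of the problem (a packed dateTime, a fixed-width integer formatter that reports an overflow).

```java
public class PackedDateSketch {
    // Hypothetical layout, NOT taken from TDB: year 14 bits, month 4 bits, day 5 bits.
    static final int YEAR_BITS = 14, MONTH_BITS = 4, DAY_BITS = 5;

    static long mask(int bits) { return (1L << bits) - 1; }

    // Encoding: each field is masked to its width and shifted into place.
    static long pack(int year, int month, int day) {
        long v = year & mask(YEAR_BITS);
        v = (v << MONTH_BITS) | (month & mask(MONTH_BITS));
        v = (v << DAY_BITS) | (day & mask(DAY_BITS));
        return v;
    }

    // Decoding: extract the fields and rebuild a yyyy-MM-dd style lexical form.
    static String unpack(long v) {
        int day = (int) (v & mask(DAY_BITS));
        v >>>= DAY_BITS;
        int month = (int) (v & mask(MONTH_BITS));
        v >>>= MONTH_BITS;
        int year = (int) (v & mask(YEAR_BITS));
        StringBuilder sb = new StringBuilder();
        formatFixed(sb, year, 4);
        sb.append('-');
        formatFixed(sb, month, 2);
        sb.append('-');
        formatFixed(sb, day, 2);
        return sb.toString();
    }

    // Zero-padded fixed-width formatter that refuses values needing more digits.
    static void formatFixed(StringBuilder sb, int value, int width) {
        String digits = Integer.toString(value);
        if (value < 0 || digits.length() > width)
            throw new IllegalStateException("formatFixed: overflow");
        for (int i = digits.length(); i < width; i++) sb.append('0');
        sb.append(digits);
    }

    public static void main(String[] args) {
        System.out.println(unpack(pack(2012, 6, 14))); // prints 2012-06-14
        try {
            // 12012 fits in 14 bits, so packing succeeds, but it needs 5 digits.
            unpack(pack(12012, 6, 14));
        } catch (IllegalStateException e) {
            System.out.println("decode failed: " + e.getMessage());
        }
    }
}
```

In this sketch the encoder accepts a year such as 12012 because it fits the bit field, while the decoder fails because the formatter only allows four digits. Whether the real problem is of this kind, or on the encoding side, I cannot tell from the stack trace alone.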

I was not able to find the named graph causing the problem, so I recreated the
TDB with tdbloader2 from apache-jena-2.7.2 and ran tdbdump from
apache-jena-2.7.2 immediately after that. The result is that I seem to run
into the same problem:

Exception in thread "main" org.openjena.atlas.AtlasException: formatInt: overflow
        at org.openjena.atlas.lib.NumberUtils.formatUnsignedInt(NumberUtils.java:115)
        at org.openjena.atlas.lib.NumberUtils.formatInt(NumberUtils.java:87)
        at org.openjena.atlas.lib.NumberUtils.formatInt(NumberUtils.java:60)
        at com.hp.hpl.jena.tdb.store.DateTimeNode.unpack(DateTimeNode.java:255)
        at com.hp.hpl.jena.tdb.store.DateTimeNode.unpackDateTime(DateTimeNode.java:180)
        at com.hp.hpl.jena.tdb.store.NodeId.extract(NodeId.java:313)
        at com.hp.hpl.jena.tdb.nodetable.NodeTableInline.getNodeForNodeId(NodeTableInline.java:64)
        at com.hp.hpl.jena.tdb.lib.TupleLib.quad(TupleLib.java:163)
        at com.hp.hpl.jena.tdb.lib.TupleLib.quad(TupleLib.java:155)
        at com.hp.hpl.jena.tdb.lib.TupleLib.access$100(TupleLib.java:45)
        at com.hp.hpl.jena.tdb.lib.TupleLib$4.convert(TupleLib.java:89)
        at com.hp.hpl.jena.tdb.lib.TupleLib$4.convert(TupleLib.java:85)
        at org.openjena.atlas.iterator.Iter$4.next(Iter.java:301)
        at org.openjena.atlas.iterator.IteratorCons.next(IteratorCons.java:94)
        at org.openjena.atlas.iterator.Iter.sendToSink(Iter.java:560)
        at org.openjena.riot.out.NQuadsWriter.write(NQuadsWriter.java:45)
        at org.openjena.riot.out.NQuadsWriter.write(NQuadsWriter.java:37)
        at org.openjena.riot.RiotWriter.writeNQuads(RiotWriter.java:41)
        at tdb.tdbdump.exec(tdbdump.java:49)
        at arq.cmdline.CmdMain.mainMethod(CmdMain.java:101)
        at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
        at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
        at tdb.tdbdump.main(tdbdump.java:31)

This seems to be a serious issue.

BTW: Here is some output from tdbloader2 for this TDB which shows that the
tdbloader2 data phase runtime gets quite non-linear for very big datasets.
I called tdbloader2 with JVM_ARGS="-Xmx32768M -server" and it did not seem to
run into memory problems.

 12:39:17 -- TDB Bulk Loader Start
 12:39:17 Data phase
...
INFO  Add: 100,000,000 Data (Batch: 68,027 / Avg: 57,649)
...
INFO  Add: 500,000,000 Data (Batch: 55,309 / Avg: 41,446)
...
INFO  Add: 1,000,000,000 Data (Batch: 27,901 / Avg: 24,119)
...
INFO  Add: 1,100,000,000 Data (Batch: 335 / Avg: 6,308)
...
INFO  Add: 1,138,800,000 Data (Batch: 256 / Avg: 5,038)
...
INFO  Total: 1,138,845,529 tuples : 227,654.44 seconds : 5,002.52 tuples/sec [2012/07/22 03:53:36 CEST]
...
 20:24:24 -- TDB Bulk Loader Finish
 20:24:24 -- 373477 seconds

Regards,

Michael Brunnbauer

-- 
++  Michael Brunnbauer
++  netEstate GmbH
++  Geisenhausener Straße 11a
++  81379 München
++  Tel +49 89 32 19 77 80
++  Fax +49 89 32 19 77 89
++  E-Mail bru...@netestate.de
++  http://www.netestate.de/
++
++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
++  USt-IdNr. DE221033342
++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel