On 14/02/2022 08:01, Neubert, Joachim wrote:
Thanks, Andy, the TDB2 assembler fixed it, and all worked well.

I then tried to load wikidata-truthy, but apparently the bzip file was
damaged at line 4052914959, so I will have to try again.

How annoying.

Is it an RDF syntax error or bad binary or something else?
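
One way to separate a damaged archive from an RDF syntax error is to stream-decompress the file without parsing it. The following is a minimal Python sketch, not part of Jena; the function name count_lines_until_error is hypothetical. It counts the lines that decompress cleanly and returns the exception if the bz2 stream itself is corrupt or truncated.

```python
import bz2


def count_lines_until_error(path):
    """Stream-decompress a .bz2 file; return (lines_read, error_or_None)."""
    lines = 0
    try:
        with bz2.open(path, "rb") as stream:
            for _ in stream:
                lines += 1
    except (OSError, EOFError, ValueError) as exc:
        # OSError: invalid compressed data; EOFError: truncated stream.
        return lines, exc
    return lines, None
```

If this reads the whole file without an error, the compressed data is intact and the problem is more likely in the RDF content itself.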

--

My experience is that gz is faster to load.

bz2 emphasizes compactness over speed.

    Andy


Cheers, Joachim

-----Original Message-----
From: Andy Seaborne <a...@apache.org>
Sent: Saturday, 12 February 2022 11:15
To: users@jena.apache.org
Subject: Re: AW: AW: AW: xloader "Can't find gzip program"

Hi Joachim,

Aside: I've realised why the timestamps are fixed at "2022-01-30 15:03".

The build setup is for repeatable builds of releases. Any build from the X.Y.Z
release source, with the same JDK, will generate the byte-wise same jar files.

Each release build fixes the timestamp and uses that, and it is recorded in
the POM as the property <project.build.outputTimestamp>. It only gets updated
when a release happens; otherwise the POM file would be modified several
times a week.
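
To illustrate why a fixed timestamp matters for byte-identical archives, here is a minimal Python sketch. It is not how the Maven build does it; it only shows the general idea using the standard zipfile module (jar files are zip archives). The helper name build_archive and the sample file names are hypothetical.

```python
import io
import zipfile


def build_archive(files, timestamp):
    """Build a zip in memory, stamping every entry with the same fixed time."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in sorted(files):  # fixed entry order
            info = zipfile.ZipInfo(name, date_time=timestamp)
            archive.writestr(info, files[name])
    return buffer.getvalue()


FIXED = (2022, 1, 30, 15, 3, 0)  # the timestamp mentioned in this thread
files = {"a.txt": b"hello", "b.txt": b"world"}

# Same inputs and same timestamp give the same bytes.
same = build_archive(files, FIXED) == build_archive(files, FIXED)
# Changing only the timestamp changes the bytes.
differs = build_archive(files, FIXED) != build_archive(files, (2022, 2, 12, 11, 15, 0))
```

If the entries were stamped with the current time instead, two builds of the same source would differ in their bytes.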

Thankfully, we have --version on most commands as well.

That's timestamps explained.

----

You seem to have run the TDB2 xloader, then given the text index builder an
assembler description for TDB1.

Fuseki with --loc determines the database type by looking at the file layout,
but assemblers don't.
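
As a rough illustration of detecting the database type from the file layout, here is a Python sketch. It is a heuristic under stated assumptions, not the check Fuseki actually performs: it assumes a TDB2 location contains Data-NNNN subdirectories (for example Data-0001) and a TDB1 location holds .dat/.idn files directly. The function name guess_tdb_layout is hypothetical.

```python
import os
import re


def guess_tdb_layout(location):
    """Guess 'TDB2', 'TDB1', or 'unknown' from the entries in a directory."""
    entries = os.listdir(location)
    # Assumption: TDB2 keeps its storage in Data-NNNN subdirectories.
    for entry in entries:
        is_dir = os.path.isdir(os.path.join(location, entry))
        if is_dir and re.fullmatch(r"Data-\d+", entry):
            return "TDB2"
    # Assumption: TDB1 keeps .dat/.idn files directly in the directory.
    if any(entry.endswith((".dat", ".idn")) for entry in entries):
        return "TDB1"
    return "unknown"
```

Running a check like this on the database directory before writing the assembler file can show which dataset type the description needs to name.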

The version output can be changed to say "TDB1" without too much
disruption. A small tweak that might have helped show this up earlier.

      Andy

On 11/02/2022 23:06, Neubert, Joachim wrote:
Sorry, my fault: I actually had jena-4.4.0 active, not 4.5.0-SNAPSHOT.

Now the loading works smoothly:

22:50:10 INFO  Load node table  = 62 seconds
22:50:10 INFO  Load ingest data = 37 seconds
22:50:10 INFO  Build index SPO  = 7 seconds
22:50:10 INFO  Build index POS  = 12 seconds
22:50:10 INFO  Build index OSP  = 9 seconds
22:50:10 INFO  Overall          127 seconds
22:50:10 INFO  Overall          00h 02m 07s
22:50:10 INFO  Triples loaded   = 10000000
22:50:10 INFO  Quads loaded     = 0
22:50:10 INFO  Overall Rate     78740 tuples per second

That's output from tdb2.xloader.
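
The figures in the log above are consistent with each other, which a few lines of Python can confirm. This is only a worked check of the arithmetic; the variable names are made up for illustration.

```python
# Phase durations in seconds, copied from the log above.
phases = {"node table": 62, "ingest data": 37, "SPO": 7, "POS": 12, "OSP": 9}

total_seconds = sum(phases.values())       # 62 + 37 + 7 + 12 + 9
minutes, seconds = divmod(total_seconds, 60)
rate = 10_000_000 // total_seconds         # triples loaded / overall seconds

print(total_seconds, f"00h {minutes:02d}m {seconds:02d}s", rate)
```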

For 10m up to 500m triples (laptop), or maybe 1B (server), also try
"tdb2.tdbloader --loader=parallel"

However, the text indexing crashes when called like this:

java -cp $FUSEKI_HOME/fuseki-server.jar jena.textindexer --debug
--desc=/tmp/temp.ttl

org.apache.jena.assembler.exceptions.AssemblerException: caught:
Unable to check TDB lock owner, the lock file contents appear to be for a
TDB2 database.  Please try loading this location as a TDB2 database. See
https://jena.apache.org/documentation/tdb/faqs.html for more
information.
    doing:
      root: file:///tmp/temp.ttl#dataset with type:
http://jena.hpl.hp.com/2008/tdb#DatasetTDB assembler class: class
org.apache.jena.tdb.assembler.DatasetAssemblerTDB1

But that is TDB1

      root: http://localhost/jena_example/#text_dataset with type:
http://jena.apache.org/text#TextDataset assembler class: class
org.apache.jena.query.text.assembler.TextDatasetAssembler

...
Caused by: org.apache.jena.tdb.base.file.FileException: Unable to check
TDB lock owner, the lock file contents appear to be for a TDB2 database.
Please try loading this location as a TDB2 database. See
https://jena.apache.org/documentation/tdb/faqs.html for more
information.
          at org.apache.jena.tdb.base.file.LocationLock.getOwner(LocationLock.java:110)

org.apache.jena.tdb == TDB1

          at org.apache.jena.tdb.base.file.LocationLock.canObtain(LocationLock.java:139)
          at org.apache.jena.tdb.StoreConnection._makeAndCache(StoreConnection.java:262)
          at org.apache.jena.tdb.StoreConnection.make(StoreConnection.java:226)
          at org.apache.jena.tdb.StoreConnection.make(StoreConnection.java:240)
          at org.apache.jena.tdb.transaction.DatasetGraphTransaction.<init>(DatasetGraphTransaction.java:72)
          at org.apache.jena.tdb.sys.TDBMaker.createDirect(TDBMaker.java:114)
...

          ... 23 more
2022-02-11 22:50:12 ABORTED

cat /var/lib/fuseki/databases/temp/tdb.lock
32907

Cheers, Joachim
