Sorry, my fault: I actually had jena-4.4.0 active, not 4.5.0-SNAPSHOT. Now the loading works smoothly:
22:50:10 INFO Load node table = 62 seconds
22:50:10 INFO Load ingest data = 37 seconds
22:50:10 INFO Build index SPO = 7 seconds
22:50:10 INFO Build index POS = 12 seconds
22:50:10 INFO Build index OSP = 9 seconds
22:50:10 INFO Overall 127 seconds
22:50:10 INFO Overall 00h 02m 07s
22:50:10 INFO Triples loaded = 10000000
22:50:10 INFO Quads loaded = 0
22:50:10 INFO Overall Rate 78740 tuples per second

However, text indexing crashes when called like this:

java -cp $FUSEKI_HOME/fuseki-server.jar jena.textindexer --debug --desc=/tmp/temp.ttl

org.apache.jena.assembler.exceptions.AssemblerException: caught: Unable to check TDB lock owner, the lock file contents appear to be for a TDB2 database. Please try loading this location as a TDB2 database. See https://jena.apache.org/documentation/tdb/faqs.html for more information.
  doing:
    root: file:///tmp/temp.ttl#dataset with type: http://jena.hpl.hp.com/2008/tdb#DatasetTDB assembler class: class org.apache.jena.tdb.assembler.DatasetAssemblerTDB1
    root: http://localhost/jena_example/#text_dataset with type: http://jena.apache.org/text#TextDataset assembler class: class org.apache.jena.query.text.assembler.TextDatasetAssembler
    at org.apache.jena.assembler.assemblers.AssemblerGroup$PlainAssemblerGroup.openBySpecificType(AssemblerGroup.java:165)
    at org.apache.jena.assembler.assemblers.AssemblerGroup$PlainAssemblerGroup.open(AssemblerGroup.java:144)
    at org.apache.jena.assembler.assemblers.AssemblerGroup$ExpandingAssemblerGroup.open(AssemblerGroup.java:93)
    at org.apache.jena.assembler.assemblers.AssemblerBase.open(AssemblerBase.java:39)
    at org.apache.jena.assembler.assemblers.AssemblerBase.open(AssemblerBase.java:35)
    at org.apache.jena.query.text.assembler.TextDatasetAssembler.open(TextDatasetAssembler.java:67)
    at org.apache.jena.query.text.assembler.TextDatasetAssembler.open(TextDatasetAssembler.java:42)
    at org.apache.jena.assembler.assemblers.AssemblerGroup$PlainAssemblerGroup.openBySpecificType(AssemblerGroup.java:157)
    at org.apache.jena.assembler.assemblers.AssemblerGroup$PlainAssemblerGroup.open(AssemblerGroup.java:144)
    at org.apache.jena.assembler.assemblers.AssemblerGroup$ExpandingAssemblerGroup.open(AssemblerGroup.java:93)
    at org.apache.jena.assembler.assemblers.AssemblerBase.open(AssemblerBase.java:39)
    at org.apache.jena.assembler.assemblers.AssemblerBase.open(AssemblerBase.java:35)
    at org.apache.jena.sparql.core.assembler.AssemblerUtils.build(AssemblerUtils.java:144)
    at org.apache.jena.sparql.core.assembler.AssemblerUtils.build(AssemblerUtils.java:132)
    at org.apache.jena.query.text.TextDatasetFactory.create(TextDatasetFactory.java:38)
    at org.apache.jena.query.text.cmd.textindexer.processModulesAndArgs(textindexer.java:90)
    at org.apache.jena.cmd.CmdArgModule.process(CmdArgModule.java:39)
    at org.apache.jena.cmd.CmdMain.mainMethod(CmdMain.java:86)
    at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:56)
    at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:43)
    at org.apache.jena.query.text.cmd.textindexer.main(textindexer.java:52)
    at org.apache.jena.query.text.cmd.InitTextCmds.lambda$cmds$1(InitTextCmds.java:26)
    at org.apache.jena.cmd.Cmds.exec(Cmds.java:65)
    at jena.textindexer.main(textindexer.java:25)
Caused by: org.apache.jena.tdb.base.file.FileException: Unable to check TDB lock owner, the lock file contents appear to be for a TDB2 database. Please try loading this location as a TDB2 database. See https://jena.apache.org/documentation/tdb/faqs.html for more information.
    at org.apache.jena.tdb.base.file.LocationLock.getOwner(LocationLock.java:110)
    at org.apache.jena.tdb.base.file.LocationLock.canObtain(LocationLock.java:139)
    at org.apache.jena.tdb.StoreConnection._makeAndCache(StoreConnection.java:262)
    at org.apache.jena.tdb.StoreConnection.make(StoreConnection.java:226)
    at org.apache.jena.tdb.StoreConnection.make(StoreConnection.java:240)
    at org.apache.jena.tdb.transaction.DatasetGraphTransaction.<init>(DatasetGraphTransaction.java:72)
    at org.apache.jena.tdb.sys.TDBMaker.createDirect(TDBMaker.java:114)
    at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1705)
    at org.apache.jena.tdb.sys.TDBMaker._create(TDBMaker.java:100)
    at org.apache.jena.tdb.sys.TDBMaker.createDatasetGraphTransaction(TDBMaker.java:43)
    at org.apache.jena.tdb.TDBFactory._createDatasetGraph(TDBFactory.java:93)
    at org.apache.jena.tdb.TDBFactory.createDatasetGraph(TDBFactory.java:71)
    at org.apache.jena.tdb.assembler.DatasetAssemblerTDB1.make(DatasetAssemblerTDB1.java:55)
    at org.apache.jena.tdb.assembler.DatasetAssemblerTDB1.createDataset(DatasetAssemblerTDB1.java:46)
    at org.apache.jena.sparql.core.assembler.DatasetAssembler.open(DatasetAssembler.java:40)
    at org.apache.jena.sparql.core.assembler.DatasetAssembler.open(DatasetAssembler.java:33)
    at org.apache.jena.assembler.assemblers.AssemblerGroup$PlainAssemblerGroup.openBySpecificType(AssemblerGroup.java:157)
    ... 23 more
2022-02-11 22:50:12 ABORTED

cat /var/lib/fuseki/databases/temp/tdb.lock
32907

Cheers, Joachim

> -----Original Message-----
> From: Andy Seaborne <a...@apache.org>
> Sent: Friday, 11 February 2022 23:06
> To: users@jena.apache.org
> Subject: Re: AW: AW: xloader "Can't find gzip program"
>
> On 11/02/2022 21:38, Neubert, Joachim wrote:
> > Strange - I should have the same version:
> >
> > sudo tar xzvf /usr/local/src/apache-jena-fuseki-4.5.0-20220209.180144-12.tar.gz
>
> Different jar file: apache-jena-4.5.0-20220209.180144-12 (no Fuseki), but
> weird anyway.
>
> wget https://repository.apache.org/content/groups/snapshots/org/apache/jena/apache-jena/4.5.0-SNAPSHOT/apache-jena-4.5.0-20220209.180144-12.zip
>
> then the zip file is:
>
> 27372309 Feb 9 18:26 apache-jena-4.5.0-20220209.180144-12.zip
>
> apache-jena-4.5.0-SNAPSHOT/bin/tdb2.tdbloader --version
>
> Jena: VERSION: 4.5.0-SNAPSHOT
> Jena: BUILD_DATE: 2022-02-09T18:01:44Z
> ARQ: VERSION: 4.5.0-SNAPSHOT
> ARQ: BUILD_DATE: 2022-02-09T18:01:44Z
> TDB2: VERSION: 4.5.0-SNAPSHOT
> TDB2: BUILD_DATE: 2022-02-09T18:01:44Z
>
> yet the TDB2 jar is dated 30th Jan, as are the files inside it -- can't
> explain that.
>
> 294846 Jan 30 15:03 apache-jena-4.5.0-SNAPSHOT/lib/jena-tdb2-4.5.0-SNAPSHOT.jar
>
> The tdb2.xloader script is 10485 bytes and has
>
> SORT_THREADS="2"
>
> in it. Is that what your copy of the script has in it?
>
> I'll clear the Jenkins workspace and schedule a new build.
>
> Andy
>
> >
> > but the jar file is dated Jan 30:
> >
> > ll apache-jena-fuseki-4.5.0-SNAPSHOT/
> > total 35868
> > -rw-r--r-- 1 root root    36975 Jan 30 15:02 LICENSE
> > -rw-r--r-- 1 root root     8914 Jan 30 15:02 NOTICE
> > -rw-r--r-- 1 root root     1151 Jan 30 15:02 README
> > drwxr-xr-x 2 root root      179 Feb 11 20:47 bin
> > -rwxr-xr-x 1 root root    12339 Jan 30 15:02 fuseki
> > -rwxr-xr-x 1 root root     1241 Jan 30 15:02 fuseki-backup
> > -rwxr-xr-x 1 root root     3370 Jan 30 15:02 fuseki-server
> > -rw-r--r-- 1 root root     1264 Jan 30 15:02 fuseki-server.bat
> > -rw-r--r-- 1 root root 36631864 Jan 30 15:02 fuseki-server.jar
> > -rw-r--r-- 1 root root     2217 Jan 30 15:02 fuseki.service
> > -rw-r--r-- 1 root root     2124 Jan 30 15:02 log4j2.properties
> > drwxr-xr-x 4 root root      121 Jan 30 15:02 webapp
> >
> > Cheers, Joachim
> >
> >> -----Original Message-----
> >> From: Andy Seaborne <a...@apache.org>
> >> Sent: Friday, 11 February 2022 22:30
> >> To: users@jena.apache.org
> >> Subject: Re: AW: xloader "Can't find gzip program"
> >>
> >> Works for me - make sure it is the latest dev build (the one down the
> >> bottom)
> >>
> >> I just grabbed apache-jena-4.5.0-20220209.180144-12.zip (2022-02-09)
> >>
> >> and loaded a few million triples with no problems.
> >>
> >> rm -rf DB2
> >> apache-jena-4.5.0-SNAPSHOT/bin/tdb2.xloader --loc DB2 ~/Datasets/BSBM/bsbm-5m.nt.gz
> >>
> >> Andy
> >>
> >> On 11/02/2022 21:20, Neubert, Joachim wrote:
> >>> Hi Andy,
> >>>
> >>> Thanks! The code of 4.5.0-SNAPSHOT seems to run significantly faster;
> >>> however, the same error occurs at SPO start.
> >>>
> >>> Please let me know if I can help with tracing/reproducing the error.
> >>>
> >>> Cheers, Joachim
> >>>
> >>>> -----Original Message-----
> >>>> From: Andy Seaborne <a...@apache.org>
> >>>> Sent: Friday, 11 February 2022 21:07
> >>>> To: users@jena.apache.org
> >>>> Subject: Re: xloader "Can't find gzip program"
> >>>>
> >>>> Hi Joachim,
> >>>>
> >>>> https://issues.apache.org/jira/browse/JENA-2277
> >>>> https://issues.apache.org/jira/browse/JENA-2279
> >>>>
> >>>> There are two fixes for tdb2.xloader which are now in the
> >>>> development builds:
> >>>>
> >>>> https://repository.apache.org/content/groups/snapshots/org/apache/jena/
> >>>>
> >>>> (these are not official releases and have not been voted on by the
> >>>> PMC)
> >>>>
> >>>> If you could test them and let us know if they work or whether
> >>>> there are further problems, that would be great.
> >>>>
> >>>> Andy
> >>>>
> >>>> On 11/02/2022 17:53, Neubert, Joachim wrote:
> >>>>> I've just started tests with xloader. It aborts with
> >>>>>
> >>>>> 17:21:56 INFO Data :: Triples = 10,000,000 ; Quads = 0
> >>>>> 17:21:57 INFO =-=-=-=-=-=-=-=
> >>>>> 17:21:57 INFO
> >>>>> 17:21:57 INFO Build SPO
> >>>>> 17:21:57 INFO (Very long pause likely at this point)
> >>>>> 17:21:58 INFO Index :: Build index SPO
> >>>>> java.lang.RuntimeException: org.apache.jena.tdb2.TDBException: Can't find gzip program
> >>>>>     at org.apache.jena.tdb2.xloader.ProcBuildIndexX.sort_build_index(ProcBuildIndexX.java:207)
> >>>>>     at org.apache.jena.tdb2.xloader.ProcBuildIndexX.buildIndex(ProcBuildIndexX.java:121)
> >>>>>     at org.apache.jena.tdb2.xloader.ProcBuildIndexX.exec2(ProcBuildIndexX.java:106)
> >>>>>     at org.apache.jena.tdb2.xloader.ProcBuildIndexX.exec(ProcBuildIndexX.java:94)
> >>>>>     at tdb2.xloader.CmdxBuildIndex.exec(CmdxBuildIndex.java:80)
> >>>>>     at org.apache.jena.cmd.CmdMain.mainMethod(CmdMain.java:92)
> >>>>>     at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:58)
> >>>>>     at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:45)
> >>>>>     at tdb2.xloader.CmdxBuildIndex.main(CmdxBuildIndex.java:28)
> >>>>> Caused by: org.apache.jena.tdb2.TDBException: Can't find gzip program
> >>>>>     at org.apache.jena.tdb2.xloader.BulkLoaderX.gzipProgram(BulkLoaderX.java:67)
> >>>>>     at org.apache.jena.tdb2.xloader.ProcBuildIndexX.sort_build_index(ProcBuildIndexX.java:183)
> >>>>>     ... 8 more
> >>>>>
> >>>>> Of course, /usr/bin/gzip is in the path. My configuration is
> >>>>> below; tdb2.xloader was called with --threads=12.
> >>>>>
> >>>>> Any idea what could be wrong?
> >>>>>
> >>>>> Cheers, Joachim
> >>>>>
> >>>>> Configuration:
> >>>>> openjdk version "11.0.13" 2021-10-19 LTS
> >>>>> OpenJDK Runtime Environment 18.9 (build 11.0.13+8-LTS)
> >>>>> OpenJDK 64-Bit Server VM 18.9 (build 11.0.13+8-LTS, mixed mode, sharing)
> >>>>> JAVA_OPTS: -d64 -Xmx12G
> >>>>> Loader: tdb2.xloader
> >>>>> Jena: VERSION: 4.4.0
> >>>>> Jena: BUILD_DATE: 2022-01-30T15:09:41Z
> >>>>> ARQ: VERSION: 4.4.0
> >>>>> ARQ: BUILD_DATE: 2022-01-30T15:09:41Z
> >>>>> TDB: VERSION: 4.4.0
> >>>>> TDB: BUILD_DATE: 2022-01-30T15:09:41Z
> >>>>>
> >>>>> Use fuseki tdb2.xloader on file /zbw/var/wikidata/2022-02-03/rdf/test.nt.gz
> >>>>> 17:20:13 INFO Setup:
> >>>>> 17:20:13 INFO Database: /zbw/var/lib/fuseki/databases/temp
> >>>>> 17:20:13 INFO Data: /zbw/var/wikidata/2022-02-03/rdf/test.nt.gz
> >>>>> 17:20:13 INFO TMPDIR: /zbw/var/lib/fuseki/databases/temp
> >>>>> 17:20:13 INFO
> >>>>> 17:20:13 INFO Load node table
> >>>>>
> >>>>> --
> >>>>> Joachim Neubert
> >>>>>
> >>>>> ZBW - Leibniz Information Centre for Economics
> >>>>> Neuer Jungfernstieg 21
> >>>>> 20354 Hamburg
> >>>>> Phone +49-40-42834-462