Re: possibly not completely loaded dataset

Andy Seaborne Wed, 24 Feb 2021 11:48:47 -0800

> In the best case I can run the indexing phase over this database.
> Is it possible?


Yes.

    Andy

On 24/02/2021 19:40, Daniel Hernandez wrote:


Hi,

I have loaded Wikidata in Jena using tdbloader2. I noticed that some
queries do not produce the expected result.

Query 1:

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT DISTINCT * WHERE {
   <http://www.wikidata.org/entity/Q31> wdt:P1344 ?o .
}

Query 2:

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT * WHERE {
   ?s wdt:P1344 ?o .
}

Query 1 returns solutions, but query 2 returns an empty table. This is
contradictory because query 1 is more selective than query 2: every
solution of query 1 should also appear among the solutions of query 2.

I guess that this is because tdbloader2 did not properly finish the
index phase. However, the loading log showed no errors.

My question is whether I can repeat the indexing phase with the data I
currently have:

-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 17:24 GOSP.dat
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 17:24 GOSP.idn
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 17:24 GPOS.dat
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 17:24 GPOS.idn
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 17:24 GSPO.dat
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 17:24 GSPO.idn
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 17:24 OSP.dat
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 17:24 OSP.idn
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 17:24 OSPG.dat
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 17:24 OSPG.idn
-rw-r--r-- 1 ubuntu ubuntu            0 Feb 23 17:24 POS-txt
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 17:24 POS.dat
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 17:24 POS.idn
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 17:24 POSG.dat
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 17:24 POSG.idn
-rw-r--r-- 1 ubuntu ubuntu 276379467776 Feb 23 17:47 SPO.dat
-rw-r--r-- 1 ubuntu ubuntu    956301312 Feb 23 17:47 SPO.idn
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 17:47 SPOG.dat
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 17:48 SPOG.idn
-rw-r--r-- 1 ubuntu ubuntu            0 Feb 23 17:48 data-quads.tmp
-rw-r--r-- 1 ubuntu ubuntu 592136511840 Feb 23 18:41 data-triples.tmp
-rw-r--r-- 1 ubuntu ubuntu            0 Feb 23 18:41 journal.jrnl
-rw-r--r-- 1 ubuntu ubuntu  67679289344 Feb 23 18:48 node2id.dat
-rw-r--r-- 1 ubuntu ubuntu    293601280 Feb 23 18:48 node2id.idn
-rw-r--r-- 1 ubuntu ubuntu 136298477605 Feb 23 19:00 nodes.dat
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 19:00 prefix2id.dat
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 19:00 prefix2id.idn
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 19:00 prefixIdx.dat
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 19:00 prefixIdx.idn
-rw-r--r-- 1 ubuntu ubuntu            0 Feb 23 19:00 prefixes.dat
-rw-r--r-- 1 ubuntu ubuntu      1793582 Feb 23 19:00 stats.opt
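
The listing itself hints at which indexes were never built: SPO.dat is
about 276 GB, while POS.dat and OSP.dat are exactly 8388608 bytes. The
sketch below is a minimal, hypothetical check that parses such a listing and
reports the triple indexes that still have that size. It assumes that
8388608 bytes (8 MiB) is the size of an index file that was created but
never populated; the function names are illustrative and not part of Jena.

```python
# Assumption: 8388608 bytes (8 MiB) is the size of an index file that was
# created but never filled, as the listing above suggests.
EMPTY_SIZE = 8 * 1024 * 1024

# Three lines copied from the `ls -l` output above.
LISTING = """\
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 17:24 OSP.dat
-rw-r--r-- 1 ubuntu ubuntu      8388608 Feb 23 17:24 POS.dat
-rw-r--r-- 1 ubuntu ubuntu 276379467776 Feb 23 17:47 SPO.dat
"""


def index_sizes(listing, names=("SPO", "POS", "OSP")):
    """Map each triple index name to the size of its .dat file."""
    sizes = {}
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 9:
            continue  # not a regular `ls -l` entry
        size, filename = int(fields[4]), fields[8]
        stem, _, ext = filename.rpartition(".")
        if ext == "dat" and stem in names:
            sizes[stem] = size
    return sizes


def unpopulated(sizes):
    """Return the index names whose .dat file is no larger than EMPTY_SIZE."""
    return sorted(name for name, size in sizes.items() if size <= EMPTY_SIZE)


print(unpopulated(index_sizes(LISTING)))  # prints ['OSP', 'POS']
```

If the assumption holds, this would be consistent with the symptom: query 1
binds the subject and can be answered from SPO, whereas query 2 binds only
the predicate and would presumably be answered from the POS index.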

In the best case I can run the indexing phase over this database. Is it
possible? Can you recommend another way to fix this database
without loading the data again?

Best,
Daniel
