OK, I investigated further:

I had a backup of the Virtuoso DB taken just after loading the DBpedia 
dumps (en & de) and installing the DBpedia and rdf_mappers packages.
I restored this backup and executed all 3 queries:

On Thursday 19 August 2010, Jörn Hees wrote:
> On Thursday 19 August 2010, Hugh Williams wrote:
> > SPARQL SELECT ?g count(*) WHERE { GRAPH ?g {?s ?p ?o.} } GROUP BY ?g
> > ORDER BY DESC 2;
> > 
> > SPARQL SELECT DISTINCT ?g WHERE { GRAPH ?g {?s ?p ?o.} };
> > 
> > select * from SPARQL_SELECT_KNOWN_GRAPHS_T;

All three return the same 26 graphs.

I then went to the Conductor / RDF / Schemas tab and imported 
http://www.w3.org/2004/02/skos/core/history/2006-04-18.rdf .
Then I went to the Graphs tab, deleted the http://www.w3.org/2004/02/skos/core 
graph (worked fine), and renamed 
http://www.w3.org/2004/02/skos/core/history/2006-04-18.rdf to 
http://www.w3.org/2004/02/skos/core .

After this, the first query returns 27 graphs and the other two return 26.
(In the result of the first one, the 
http://www.w3.org/2004/02/skos/core/history/2006-04-18.rdf graph still exists.)
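
To pin down exactly which graph differs between two result sets, the lists 
can be diffed outside Virtuoso. A minimal Python sketch, assuming the graph 
IRIs returned by two of the queries have been exported to plain lists by hand; 
diff_graph_lists is a hypothetical helper name (not a Virtuoso function), and 
the lists below are shortened to three IRIs purely for illustration:

```python
def diff_graph_lists(first_graphs, second_graphs):
    """Return the graph IRIs that appear in only one of the two lists."""
    first_set = set(first_graphs)
    second_set = set(second_graphs)
    return {
        "only_in_first": sorted(first_set - second_set),
        "only_in_second": sorted(second_set - first_set),
    }


# Illustrative, shortened inputs (assumption: the real lists hold 27 and 26
# IRIs respectively; only three IRIs from the report are shown here).
first_query = [
    "http://dbpedia.org",
    "http://www.w3.org/2004/02/skos/core",
    "http://www.w3.org/2004/02/skos/core/history/2006-04-18.rdf",
]
third_query = [
    "http://dbpedia.org",
    "http://www.w3.org/2004/02/skos/core",
]

result = diff_graph_lists(first_query, third_query)
print(result["only_in_first"])
print(result["only_in_second"])
```

With the full exported lists, any IRI reported under only_in_first would be a 
graph that still has rows behind the first query but is missing from the 
other result.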

I went back to the Schemas tab and deleted the 
http://www.w3.org/2004/02/skos/core/history/2006-04-18.rdf schema. The 
behavior is unchanged: still 27 vs. 26 graphs.

When I now delete the http://www.w3.org/2004/02/skos/core graph from the 
Graphs tab, the first query returns 26 graphs, the second and third 25. The 
http://www.w3.org/2004/02/skos/core/history/2006-04-18.rdf graph is still there.

I then tried to *manually delete* the persisting graph:
  sparql clear graph
  <http://www.w3.org/2004/02/skos/core/history/2006-04-18.rdf>;
The result is even stranger:
If I do
  sparql select count(*) from
  <http://www.w3.org/2004/02/skos/core/history/2006-04-18.rdf>
  where {?s ?p ?o};
I get 0 rows.
BUT: my first SPARQL query still tells me that there are 1196 triples in that 
graph!
Also: if I execute this SQL query:
  select distinct id_to_iri(g) from rdf_quad;
or this one:
  select distinct id_to_iri(g), count(*) from rdf_quad group by g;
The *graph still exists*.

A control query:
  sparql select count(*) from <http://dbpedia.org> where {?s ?p ?o} ;
tells me that there are 258867871 rows in my DBpedia dump.


As a side-question: why do all those queries take so long? Isn't there a 
primary key index on the rdf_quad table for g,s,p,o which they should be able 
to use and return with the count in a split second?


If you want, I can provide a detailed description of how I imported the 
DBpedia dumps (en & de), and I could even give you the backup (11 GB gzipped).

Jörn
