Hi Roman,

> I get a lot of errors when I try to load the DBpedia dataset using isql with 
> the command:
> ld_dir('/workingDir/btc2014_unzipped/01', 'data.nq-*', 'http://fake.org');
> 
> Example error:
> 
>  22007 XM003: XML parser detected an error:     ERROR  : Tag nesting
>  error: name 'img' of end tag does not match the name 'p' of start tag
>  at line 4 column 432 at line 4 column 438 of source text
>  04/02/skos/core#" xmlns:xsd="http://www.w3.org/2001/XMLSchema#";></img></p>
>  ----------------------------------------------------------------------^
> 
> Ok, let's find the line where the error occurred (I put a line break, so it is 
> easier to see):
> 
> <http://core-project.kmi.open.ac.uk/data-description> 
> <http://purl.org/rss/1.0/modules/content/encoded> "<h2 
> xmlns=\"http://www.w3.org/1999/xhtml\"; 
> xmlns:content=\"http://purl.org/rss/1.0/modules/content/\"; 
> xmlns:dc=\"http://purl.org/dc/terms/\"; 
> xmlns:foaf=\"http://xmlns.com/foaf/0.1/\"; xmlns:og=\"http://ogp.me/ns#\"; 
> xmlns:rdfs=\"http://www.w3.org/2000/01/rdf-schema#\"; 
> xmlns:sioc=\"http://rdfs.org/sioc/ns#\"; 
> xmlns:sioct=\"http://rdfs.org/sioc/types#\"; 
> xmlns:skos=\"http://www.w3.org/2004/02/skos/core#\"; 
> xmlns:xsd=\"http://www.w3.org/2001/XMLSchema#\";>What data are 
> exposed</h2>\n<p xmlns=\"http://www.w3.org/1999/xhtml\"; 
> xmlns:content=\"http://purl.org/rss/1.0/modules/content/\"; 
> xmlns:dc=\"http://purl.org/dc/terms/\"; 
> xmlns:foaf=\"http://xmlns.com/foaf/0.1/\"; xmlns:og=\"http://ogp.me/ns#\"; 
> xmlns:rdfs=\"http://www.w3.org/2000/01/rdf-schema#\"; 
> xmlns:sioc=\"http://rdfs.org/sioc/ns#\"; 
> xmlns:sioct=\"http://rdfs.org/sioc/types#\"; 
> xmlns:skos=\"http://www.w3.org/2004/02/skos/core#\"; 
> xmlns:xsd=\"http://www.w3.org/2001/XMLSchema#\";>The CORE project exposes data about the 
> aggregated content. The following schema shows the kind of metadata CORE holds 
> about each resource. </p>\n<h2 xmlns=\"http://www.w3.org/1999/xhtml\"; 
> xmlns:content=\"http://purl.org/rss/1.0/modules/content/\"; 
> xmlns:dc=\"http://purl.org/dc/terms/\"; 
> xmlns:foaf=\"http://xmlns.com/foaf/0.1/\"; xmlns:og=\"http://ogp.me/ns#\"; 
> xmlns:rdfs=\"http://www.w3.org/2000/01/rdf-schema#\"; 
> xmlns:sioc=\"http://rdfs.org/sioc/ns#\"; 
> xmlns:sioct=\"http://rdfs.org/sioc/types#\"; 
> xmlns:skos=\"http://www.w3.org/2004/02/skos/core#\"; 
> xmlns:xsd=\"http://www.w3.org/2001/XMLSchema#\";>Data Schema</h2>\n<p 
> xmlns=\"http://www.w3.org/1999/xhtml\"; 
> xmlns:content=\"http://purl.org/rss/1.0/modules/content/\"; 
> xmlns:dc=\"http://purl.org/dc/terms/\"; 
> xmlns:foaf=\"http://xmlns.com/foaf/0.1/\"; xmlns:og=\"http://ogp.me/ns#\"; 
> xmlns:rdfs=\"http://www.w3.org/2000/01/rdf-schema#\"; 
> xmlns:sioc=\"http://rdfs.org/sioc/ns#\"; 
> xmlns:sioct=\"http://rdfs.org/sioc/types#\" xmlns:skos=\"http://www.w3.org/2004/02/skos/core#\"; 
> xmlns:xsd=\"http://www.w3.org/2001/XMLSchema#\";></img></p>
>     \n<h2 xmlns=\"http://www.w3.org/1999/xhtml\"; 
> xmlns:content=\"http://purl.org/rss/1.0/modules/content/\"; 
> xmlns:dc=\"http://purl.org/dc/terms/\"; 
> xmlns:foaf=\"http://xmlns.com/foaf/0.1/\"; xmlns:og=\"http://ogp.me/ns#\"; 
> xmlns:rdfs=\"http://www.w3.org/2000/01/rdf-schema#\"; 
> xmlns:sioc=\"http://rdfs.org/sioc/ns#\"; 
> xmlns:sioct=\"http://rdfs.org/sioc/types#\"; 
> xmlns:skos=\"http://www.w3.org/2004/02/skos/core#\"; 
> xmlns:xsd=\"http://www.w3.org/2001/XMLSchema#\";>Data License</h2>\n<p 
> xmlns=\"http://www.w3.org/1999/xhtml\"; 
> xmlns:content=\"http://purl.org/rss/1.0/modules/content/\"; 
> xmlns:dc=\"http://purl.org/dc/terms/\"; 
> xmlns:foaf=\"http://xmlns.com/foaf/0.1/\"; xmlns:og=\"http://ogp.me/ns#\"; 
> xmlns:rdfs=\"http://www.w3.org/2000/01/rdf-schema#\"; 
> xmlns:sioc=\"http://rdfs.org/sioc/ns#\"; 
> xmlns:sioct=\"http://rdfs.org/sioc/types#\"; 
> xmlns:skos=\"http://www.w3.org/2004/02/skos/core#\"; 
> xmlns:xsd=\"http://www.w3.org/2001/XMLSchema#\";>All data from CORE (unless 
> otherwise specified) are available under the a Creative Commons Attribution 3.0 Unported License. 
> </p>\n"^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral> .
> 
> I also tried loading with different error-handling flag bits, with the same result:
> DB.DBA.TTLP_MT (file_to_string_output 
> ('/workingDir/btc2014_unzipped/01/data.nq-9'), '', 'http://fake.org', 512)
> 
> Why does Virtuoso check HTML/XML tag consistency inside literals?! 
> Is it possible to turn it off? I have too many errors in the dataset; it is a 
> waste of time trying to find all the lines with errors and remove them by hand. 
> I can't find anything related to this in the documentation.


I have reproduced the problem in-house and I am currently talking to 
development about a solution. I will advise as soon as a patch is 
available.
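
Until then, one possible stopgap is to pre-filter the files rather than 
removing the offending lines by hand. The following is only a minimal sketch 
and has not been run against btc-2014 or confirmed by development. It assumes 
Python's standard xml.etree.ElementTree parser is an acceptable stand-in for 
Virtuoso's own XML check (the two may disagree on edge cases), and the helper 
names (unescape, is_well_formed_fragment, split_lines) are made up for 
illustration:

```python
import re
import xml.etree.ElementTree as ET

XML_LITERAL = "^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral>"

# A double-quoted literal (backslash escapes allowed) typed as rdf:XMLLiteral.
LITERAL_RE = re.compile(r'"((?:[^"\\]|\\.)*)"' + re.escape(XML_LITERAL))

# Common N-Quads string escapes; \uXXXX sequences are left untouched.
_ESCAPES = {"n": "\n", "r": "\r", "t": "\t", '"': '"', "\\": "\\"}


def unescape(text):
    """Undo the common backslash escapes of an N-Quads string literal."""
    return re.sub(r"\\(.)", lambda m: _ESCAPES.get(m.group(1), m.group(0)), text)


def is_well_formed_fragment(fragment):
    """Return True if the fragment parses as XML inside a dummy root element."""
    try:
        ET.fromstring("<root>" + fragment + "</root>")
        return True
    except ET.ParseError:
        return False


def split_lines(lines):
    """Separate lines whose XMLLiteral is not well-formed from all other lines."""
    good, bad = [], []
    for line in lines:
        match = LITERAL_RE.search(line)
        if match and not is_well_formed_fragment(unescape(match.group(1))):
            bad.append(line)
        else:
            good.append(line)
    return good, bad
```

Writing the "good" lines to a new file and pointing ld_dir at that file would 
skip the statements the loader rejects, while the "bad" list keeps them for 
later inspection.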


Note that this is NOT the DBpedia dataset itself you are trying to load, but 
part of the Billion Triple Challenge 2014 (btc-2014) which is in a different 
format.

If you really meant to load the DBpedia datasets, check out this page:


        http://wiki.dbpedia.org/Downloads2015-04


Patrick
---
Patrick van Kleef
Program Manager
OpenLink Software

http://www.openlinksw.com/
http://twitter.com/openlink/


_______________________________________________
Virtuoso-users mailing list
Virtuoso-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/virtuoso-users
