[Virtuoso-users] Virtuoso DBpedia load - parsing errors

2015-09-17 Thread Roman Sokolov
Hello.
I get a lot of errors when I try to load the DBpedia dataset using isql with
the command:
ld_dir('/workingDir/btc2014_unzipped/01', 'data.nq-*', 'http://fake.org');

Example error:

 22007 XM003: XML parser detected an error: ERROR  : Tag nesting
 error: name 'img' of end tag does not match the name 'p' of start tag
 at line 4 column 432 at line 4 column 438 of source text
 04/02/skos/core#" xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
 --^

OK, let's find the line where the error occurred (I put in a line break, so it
is easier to see):

 <
http://purl.org/rss/1.0/modules/content/encoded> "http://www.w3.org/1999/xhtml\"; xmlns:content=\"
http://purl.org/rss/1.0/modules/content/\"; xmlns:dc=\"
http://purl.org/dc/terms/\"; xmlns:foaf=\"http://xmlns.com/foaf/0.1/\";
xmlns:og=\"http://ogp.me/ns#\"; xmlns:rdfs=\"
http://www.w3.org/2000/01/rdf-schema#\"; xmlns:sioc=\"
http://rdfs.org/sioc/ns#\"; xmlns:sioct=\"http://rdfs.org/sioc/types#\";
xmlns:skos=\"http://www.w3.org/2004/02/skos/core#\"; xmlns:xsd=\"
http://www.w3.org/2001/XMLSchema#\";>What data are exposed\nhttp://www.w3.org/1999/xhtml\"; xmlns:content=\"
http://purl.org/rss/1.0/modules/content/\"; xmlns:dc=\"
http://purl.org/dc/terms/\"; xmlns:foaf=\"http://xmlns.com/foaf/0.1/\";
xmlns:og=\"http://ogp.me/ns#\"; xmlns:rdfs=\"
http://www.w3.org/2000/01/rdf-schema#\"; xmlns:sioc=\"
http://rdfs.org/sioc/ns#\"; xmlns:sioct=\"http://rdfs.org/sioc/types#\";
xmlns:skos=\"http://www.w3.org/2004/02/skos/core#\"; xmlns:xsd=\"
http://www.w3.org/2001/XMLSchema#\";>The CORE project exposes data about the
aggregated content. The following schema shows the kind of metadata CORE
holds about each resource. \nhttp://www.w3.org/1999/xhtml\";
xmlns:content=\"http://purl.org/rss/1.0/modules/content/\"; xmlns:dc=\"
http://purl.org/dc/terms/\"; xmlns:foaf=\"http://xmlns.com/foaf/0.1/\";
xmlns:og=\"http://ogp.me/ns#\"; xmlns:rdfs=\"
http://www.w3.org/2000/01/rdf-schema#\"; xmlns:sioc=\"
http://rdfs.org/sioc/ns#\"; xmlns:sioct=\"http://rdfs.org/sioc/types#\";
xmlns:skos=\"http://www.w3.org/2004/02/skos/core#\"; xmlns:xsd=\"
http://www.w3.org/2001/XMLSchema#\";>Data Schema\nhttp://www.w3.org/1999/xhtml\"; xmlns:content=\"
http://purl.org/rss/1.0/modules/content/\"; xmlns:dc=\"
http://purl.org/dc/terms/\"; xmlns:foaf=\"http://xmlns.com/foaf/0.1/\";
xmlns:og=\"http://ogp.me/ns#\"; xmlns:rdfs=\"
http://www.w3.org/2000/01/rdf-schema#\"; xmlns:sioc=\"
http://rdfs.org/sioc/ns#\"; xmlns:sioct=\"http://rdfs.org/sioc/types#\";
xmlns:skos=\"http://www.w3.org/2004/02/skos/core#\"; xmlns:xsd=\"
http://www.w3.org/2001/XMLSchema#\";>
\nhttp://www.w3.org/1999/xhtml\"; xmlns:content=\"
http://purl.org/rss/1.0/modules/content/\"; xmlns:dc=\"
http://purl.org/dc/terms/\"; xmlns:foaf=\"http://xmlns.com/foaf/0.1/\";
xmlns:og=\"http://ogp.me/ns#\"; xmlns:rdfs=\"
http://www.w3.org/2000/01/rdf-schema#\"; xmlns:sioc=\"
http://rdfs.org/sioc/ns#\"; xmlns:sioct=\"http://rdfs.org/sioc/types#\";
xmlns:skos=\"http://www.w3.org/2004/02/skos/core#\"; xmlns:xsd=\"
http://www.w3.org/2001/XMLSchema#\";>Data License\nhttp://www.w3.org/1999/xhtml\"; xmlns:content=\"
http://purl.org/rss/1.0/modules/content/\"; xmlns:dc=\"
http://purl.org/dc/terms/\"; xmlns:foaf=\"http://xmlns.com/foaf/0.1/\";
xmlns:og=\"http://ogp.me/ns#\"; xmlns:rdfs=\"
http://www.w3.org/2000/01/rdf-schema#\"; xmlns:sioc=\"
http://rdfs.org/sioc/ns#\"; xmlns:sioct=\"http://rdfs.org/sioc/types#\";
xmlns:skos=\"http://www.w3.org/2004/02/skos/core#\"; xmlns:xsd=\"
http://www.w3.org/2001/XMLSchema#\";>All data from CORE (unless otherwise
specified) are available under the a Creative Commons Attribution 3.0
Unported License. \n"^^<
http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral> .

I also tried loading with different error flag bits, with the same result:
DB.DBA.TTLP_MT (file_to_string_output
('/workingDir/btc2014_unzipped/01/data.nq-9'), '', 'http://fake.org', 512)

Why does Virtuoso try to check HTML/XML tag consistency inside the literals?
Is it possible to turn it off? There are too many errors in the dataset, and it
is a waste of time trying to find all the lines with errors and remove them by
hand. I can't find anything related to this in the documentation.

-- 
Best regards, Roman
Virtuoso-users mailing list
Virtuoso-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/virtuoso-users


Re: [Virtuoso-users] Virtuoso DBpedia load - parsing errors

2015-09-17 Thread Patrick van Kleef
Hi Roman,

I have reproduced the problem in-house and I am currently talking to 
development to provide a solution. I will advise as soon as a 
patch is available.


Note that this is NOT the DBpedia dataset itself you are trying to load, but 
part of the Billion Triple Challenge 2014 (btc-2014) which is in a different 
format.

If you really meant to load the DBpedia datasets, check out this page:


http://wiki.dbpedia.org/Downloads2015-04


Patrick
---
Patrick van Kleef
Program Manager
OpenLink Softwa

Re: [Virtuoso-users] Virtuoso DBpedia load - parsing errors

2015-09-18 Thread Patrick van Kleef
Hi Roman,

I thought I spotted a parsing error on our end, but on closer examination this 
was not the case.

The issue here is that this value is tagged as an 
<http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral>, which triggers Virtuoso 
to actually parse the XML inside the object. 

Unfortunately it appears there was either a problem with a lot of the pages they 
crawled for this BTC 2014 dataset, or they cut out part of the page. In any 
case I examined a number of lines that failed and all had issues with artifacts 

Re: [Virtuoso-users] Virtuoso DBpedia load - parsing errors

2015-09-23 Thread Roman Sokolov
Thanks a lot for your help, Patrick!
Yes, my mistake, it is the BTC dataset, not DBpedia.
I changed the literal types from XML to Plain and the errors disappeared.
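
The message does not say how the literal types were changed. One possible way, shown here only as a minimal sketch, is to drop the `^^<...#XMLLiteral>` datatype suffix from each N-Quads statement before loading, so the value is treated as a plain string and is not passed to the XML parser. The function names and file handling below are illustrative assumptions, not something taken from this thread or from Virtuoso. Note that this changes the data: the literals are no longer typed as XML.

```python
import re

XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"

# Matches the closing quote of the object literal, the XMLLiteral datatype,
# an optional graph term, and the final dot at the end of the statement.
# Anchoring at the end of the line avoids touching text inside the literal.
_TAIL = re.compile(
    r'"\^\^<' + re.escape(XML_LITERAL) + r'>'
    r'(?P<rest>(?:\s+(?:<[^>]*>|_:\S+))?\s*\.\s*)$'
)


def to_plain_literal(line: str) -> str:
    """Return the statement with an rdf:XMLLiteral object turned into a plain literal."""
    return _TAIL.sub(lambda m: '"' + m.group("rest"), line, count=1)


def rewrite_file(src: str, dst: str) -> None:
    """Copy src to dst, rewriting every statement with to_plain_literal."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(to_plain_literal(line))
```

Statements with any other datatype are copied through unchanged, so the rewritten file can be loaded with the same `ld_dir` call as before.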

But now I get a new error:
/btc2014_unzipped/01/data.nq-10
http://fake-latest.org
   2   2015.9.22 23:10.20 322216000  2015.9.22 23:10.38 888367000  0   NULL
   42000 RDFGE: RDF box with a geometry RDF type and a non-geometry content

This error is quite frequent in the dataset, and I guess it is related to
geo-data. But the problem is that, in contrast to the previous error, I cannot
see the details or the line where the error occurred, so I cannot check in
the dataset which line caused it. Strange that there are no details...
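
Since the error message gives no line number, one way to narrow things down is to scan the file for statements whose object literal carries a geometry datatype and inspect those by hand. The sketch below assumes that the error is raised for literals typed with Virtuoso's `virtrdf:Geometry` datatype; that assumption is not confirmed anywhere in this thread, so the datatype list is a parameter that can be adjusted. The function name is hypothetical.

```python
import re

# Assumption: the datatype IRI(s) that make the loader expect geometry content.
GEOMETRY_DATATYPES = (
    "http://www.openlinksw.com/schemas/virtrdf#Geometry",
)

# A typed object literal at the end of an N-Quads statement:
# "lexical form"^^<datatype> [graph] .
_TYPED_LITERAL = re.compile(
    r'"(?P<lex>(?:[^"\\]|\\.)*)"\^\^<(?P<dt>[^>]*)>'
    r'(?:\s+(?:<[^>]*>|_:\S+))?\s*\.\s*$'
)


def find_geometry_literals(lines, datatypes=GEOMETRY_DATATYPES):
    """Return (line number, lexical form) for every literal with a listed datatype."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        match = _TYPED_LITERAL.search(line)
        if match and match.group("dt") in datatypes:
            hits.append((lineno, match.group("lex")))
    return hits
```

Passing an open file object as `lines` yields the line numbers of the candidate statements, which can then be checked against the dataset.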

Thank you.


Re: [Virtuoso-users] Virtuoso DBpedia load - parsing errors

2015-09-30 Thread Roman Sokolov
So could somebody help me understand how to deal with this error while
importing the data?
/btc2014_unzipped/01/data.nq-10
http://fake-latest.org
   2   2015.9.22 23:10.20 322216000  2015.9.22 23:10.38 888367000  0   NULL
   42000 RDFGE: RDF box with a geometry RDF type and a non-geometry content

There is no clue as to which particular lines cause the error, so I am stuck
and cannot remove or change them.
Or how can I load the data while skipping the lines that contain errors?
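
The loader output above reports the error per file, so one workaround (a sketch only, not an answer given in this thread) is to split a failing file into much smaller chunks and load those instead. A failing chunk then points to a small range of lines, and the remaining chunks can still be loaded. The function name, the chunk naming scheme, and the default chunk size are arbitrary assumptions.

```python
from itertools import islice
from pathlib import Path


def split_nquads(src, out_dir, lines_per_chunk=100_000):
    """Split a line-based N-Quads file into chunks of at most lines_per_chunk lines.

    Returns the list of chunk paths in order. N-Quads has one statement per
    line, so splitting on line boundaries keeps every statement intact.
    """
    src, out_dir = Path(src), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    with src.open(encoding="utf-8") as fin:
        while True:
            block = list(islice(fin, lines_per_chunk))
            if not block:
                break
            # Hypothetical naming scheme: <original name>.part00000, .part00001, ...
            path = out_dir / f"{src.name}.part{len(chunks):05d}"
            path.write_text("".join(block), encoding="utf-8")
            chunks.append(path)
    return chunks
```

The chunk directory can then be registered with the same `ld_dir` call used earlier, with the file mask adjusted to match the chunk names.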

Thank you.

