On 27/12/12 19:17, Abhishek Shivkumar wrote:
Thanks Andy. It runs for a few lines and then throws the error below,
saying that the triples are not terminated by a DOT. I assume the file
doesn't end every triple with a ".". Is there a workaround for this?

I read somewhere on the mailing list that this has been fixed in SVN.
Anyway, I downloaded Jena from http://www.apache.org/dist/jena/

1/ The warning for es-419 is fixed.

2/
[[
The Bad IRI: <http://lv.wikipedia.org/wiki/Riode₧aneiro_"Fluminense"> Code: 4/UNWISE_CHARACTER in PATH: The character matches no grammar rules of URIs/IRIs. These characters are permitted in RDF URI References, XML system identifiers, and XML Schema anyURIs.
]]

Can't have " in URIs.  The ₧ looks like an ISO-8859-1/UTF-8 encoding error.
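
To find other IRIs with the same problem before loading, a rough scan along these lines may help. It is only a sketch: the function name is made up for illustration, the regular expression is a naive approximation of Turtle's IRI token (it can be fooled by angle brackets inside literals), and the set of flagged characters is a subset of what the grammar forbids.

```python
import re

# Naive candidate for an IRI token: anything between < and > without spaces.
# This is an approximation, not a real Turtle tokenizer.
CANDIDATE_IRI = re.compile(r'<([^<>\s]*)>')

# A few characters that Turtle does not allow inside <...>.
FORBIDDEN_IN_IRI = re.compile(r'["{}|^`]')


def suspicious_iri_tokens(line):
    """Return the IRI-like tokens on a line that contain a forbidden character."""
    return [
        match.group(1)
        for match in CANDIDATE_IRI.finditer(line)
        if FORBIDDEN_IN_IRI.search(match.group(1))
    ]
```

Running this over each line of the file and printing the line number with any non-empty result gives a list of places to inspect by hand.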

3/ The file has bad syntax - you need to look at line 270608 or so and fix it up. It's a bug in the Freebase data.

Shame they haven't fixed it - it was wrong previously as well. It may be something like unmatched quotes, or an encoding error, so it may look right but isn't.

What are the lines around 270608?
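
With a 55 GB file, opening it in an editor is not practical, so a small streaming script is one way to pull out those lines. This is a minimal sketch: the function name and the `context` parameter are invented for illustration, UTF-8 is assumed, and Python's line counting may not agree exactly with the parser's if the file has unusual line endings.

```python
from itertools import islice


def lines_around(path, line_no, context=3, encoding="utf-8"):
    """Return (line number, text) pairs within `context` lines of `line_no` (1-based)."""
    first = max(1, line_no - context)
    last = line_no + context
    # errors="replace" makes undecodable bytes show up as U+FFFD instead of
    # raising, which helps when hunting for encoding damage.
    with open(path, "r", encoding=encoding, errors="replace") as handle:
        window = islice(handle, first - 1, last)
        return [(first + i, text.rstrip("\n")) for i, text in enumerate(window)]
```

For example, `lines_around("freebase-rdf-2012-12-09-00-00", 270608)` would read the file sequentially up to that point and return the seven lines centred on the reported error.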


(Another good reason for parsing data before loading!)

        Andy



Thanks much!

00:39:13 INFO  loader               :: Add: 150,000 triples (Batch: 40,950 / Avg: 23,648)
00:39:14 INFO  loader               :: Add: 200,000 triples (Batch: 43,898 / Avg: 26,730)
00:39:14 WARN  riot                 :: [line: 209572, col: 54] Bad IRI: <http://lv.wikipedia.org/wiki/Riode₧aneiro_"Fluminense"> Code: 4/UNWISE_CHARACTER in PATH: The character matches no grammar rules of URIs/IRIs. These characters are permitted in RDF URI References, XML system identifiers, and XML Schema anyURIs.
00:39:14 WARN  riot                 :: [line: 219452, col: 33] Language not valid: es-419
00:39:14 WARN  riot                 :: [line: 219560, col: 24] Language not valid: es-419
00:39:15 INFO  loader               :: Add: 250,000 triples (Batch: 43,975 / Avg: 29,005)
00:39:15 ERROR riot                 :: [line: 270608, col: 1 ] Triples not terminated by DOT
Exception in thread "main" org.openjena.riot.RiotException: [line: 270608, col: 1 ] Triples not terminated by DOT
        at org.openjena.riot.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:130)
        at org.openjena.riot.lang.LangEngine.raiseException(LangEngine.java:169)
        at org.openjena.riot.lang.LangEngine.exceptionDirect(LangEngine.java:162)
        at org.openjena.riot.lang.LangEngine.exception(LangEngine.java:155)
        at org.openjena.riot.lang.LangEngine.expect(LangEngine.java:147)
        at org.openjena.riot.lang.LangEngine.expectOrEOF(LangEngine.java:138)
        at org.openjena.riot.lang.LangTurtle.expectEndOfTriples(LangTurtle.java:57)
        at org.openjena.riot.lang.LangTurtleBase.triples(LangTurtleBase.java:285)
        at org.openjena.riot.lang.LangTurtleBase.triplesSameSubject(LangTurtleBase.java:223)
        at org.openjena.riot.lang.LangTurtle.oneTopLevelElement(LangTurtle.java:46)
        at org.openjena.riot.lang.LangTurtleBase.runParser(LangTurtleBase.java:144)
        at org.openjena.riot.lang.LangBase.parse(LangBase.java:43)
        at org.openjena.riot.RiotReader.parseTriples(RiotReader.java:97)
        at org.openjena.riot.RiotReader.parseTriples(RiotReader.java:83)
        at org.openjena.riot.RiotReader.parseTriples(RiotReader.java:56)
        at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadTriples$(BulkLoader.java:139)
        at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadDefaultGraph(BulkLoader.java:87)
        at com.hp.hpl.jena.tdb.TDBLoader.loadDefaultGraph$(TDBLoader.java:261)
        at com.hp.hpl.jena.tdb.TDBLoader.loadGraph$(TDBLoader.java:244)
        at com.hp.hpl.jena.tdb.TDBLoader.loadGraph(TDBLoader.java:177)
        at com.hp.hpl.jena.tdb.TDBLoader.load(TDBLoader.java:112)
        at tdb.tdbloader.loadDefaultGraph(tdbloader.java:150)
        at tdb.tdbloader.exec(tdbloader.java:116)
        at arq.cmdline.CmdMain.mainMethod(CmdMain.java:101)
        at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
        at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
        at tdb.tdbloader.main(tdbloader.java:53)

Thank you!

With Regards,
Abhishek S


On Fri, Dec 28, 2012 at 12:35 AM, Andy Seaborne <[email protected]> wrote:

On 27/12/12 18:42, Abhishek Shivkumar wrote:

Hi,

    I am trying to load a large (55 GB!) RDF file into Jena TDB for SPARQL
querying later. A snapshot of the file is at the end of this email.

When I run tdbloader from the command line with the following command:

c:\JENA\apache-jena-2.7.4\apache-jena-2.7.4\bat>tdbloader.bat -loc test "C:\freebase-rdf-2012-12-09-00-00"


The file has no extension, so the TDB loader has no clue as to the
syntax.  The default is N-Quads/N-Triples.

But it's Turtle, hence a syntax error.
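
The behaviour described above can be pictured roughly like this. It is an illustration only, not Jena's actual code: the table, the function name and the choice of "N-Quads" as the fallback label are assumptions made for the sketch.

```python
import os

# Common RDF file extensions and the syntax each usually denotes.
EXTENSION_SYNTAX = {
    ".ttl": "Turtle",
    ".nt": "N-Triples",
    ".nq": "N-Quads",
    ".rdf": "RDF/XML",
}


def guess_syntax(filename, default="N-Quads"):
    """Pick a syntax from the file extension, falling back to a default."""
    _, ext = os.path.splitext(filename)
    return EXTENSION_SYNTAX.get(ext.lower(), default)
```

Here `guess_syntax("freebase-rdf-2012-12-09-00-00")` falls back to the default because the name has no extension, while `guess_syntax("something.ttl")` yields "Turtle".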

So either:

1/ Run "riotcmd.turtle FILE > data.nt"

This is preferred because:
   A/ It checks that the file is valid before loading.
   B/ The NT loads faster.

2/ Rename the file to "something.ttl"

         Andy


I get this error:

23:40:30 INFO  loader               :: -- Start triples data phase
23:40:30 INFO  loader               :: ** Load empty triples table
23:40:30 INFO  loader               :: -- Start quads data phase
23:40:30 INFO  loader               :: ** Load empty quads table
23:40:30 INFO  loader               :: Load: C:\Users\IBM_ADMIN\My Documents\down\freebase-rdf-2012-12-09-00-00\freebase-rdf-2012-12-09-00-00 -- 2012/12/27 23:40:30 IST
23:40:30 ERROR riot                 :: [line: 1, col: 1 ] Expected BNode or IRI: Got: [DIRECTIVE:prefix]
Exception in thread "main" org.openjena.riot.RiotException: [line: 1, col: 1 ] Expected BNode or IRI: Got: [DIRECTIVE:prefix]
        at org.openjena.riot.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:130)
        at org.openjena.riot.lang.LangEngine.raiseException(LangEngine.java:169)
        at org.openjena.riot.lang.LangEngine.exceptionDirect(LangEngine.java:162)
        at org.openjena.riot.lang.LangEngine.exception(LangEngine.java:155)
        at org.openjena.riot.lang.LangNTuple.checkIRIOrBNode(LangNTuple.java:107)
        at org.openjena.riot.lang.LangNQuads.parseOne(LangNQuads.java:84)
        at org.openjena.riot.lang.LangNQuads.parseOne(LangNQuads.java:34)
        at org.openjena.riot.lang.LangNTuple.runParser(LangNTuple.java:69)
        at org.openjena.riot.lang.LangBase.parse(LangBase.java:43)
        at org.openjena.riot.RiotReader.parseQuads(RiotReader.java:134)
        at org.openjena.riot.RiotReader.parseQuads(RiotReader.java:121)
        at org.openjena.riot.RiotReader.parseQuads(RiotReader.java:107)
        at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadQuads$(BulkLoader.java:160)
        at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadDataset(BulkLoader.java:121)
        at com.hp.hpl.jena.tdb.TDBLoader.loadDataset$(TDBLoader.java:283)
        at com.hp.hpl.jena.tdb.TDBLoader.loadDataset(TDBLoader.java:196)
        at com.hp.hpl.jena.tdb.TDBLoader.load(TDBLoader.java:75)
        at tdb.tdbloader.loadQuads(tdbloader.java:163)
        at tdb.tdbloader.exec(tdbloader.java:122)
        at arq.cmdline.CmdMain.mainMethod(CmdMain.java:101)
        at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
        at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
        at tdb.tdbloader.main(tdbloader.java:53)


I need help in understanding this error and how to solve it. Is there a
problem with the input file?


@prefix ns: <http://rdf.freebase.com/ns/>.
@prefix key: <http://rdf.freebase.com/key/>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.

ns:m.012rkqx    ns:type.object.type     ns:common.topic.
ns:m.012rkqx    ns:type.object.name     "High Fidelity"@en.
ns:m.012rkqx    ns:type.object.type     ns:music.single.
ns:m.012rkqx    ns:type.object.key      ns:authority.musicbrainz.name.TRACK3987054.
ns:m.012rkqx    ns:type.object.type     ns:music.recording.
ns:m.012rkqx    key:authority.musicbrainz       "258c45bd-4437-4580-8988-b3f3be975f9c".
ns:m.012rkqx    key:authority.musicbrainz.name  "TRACK3987054".
ns:m.012rkqx    rdf:label       "High Fidelity"@en.
ns:m.012rkqx    rdf:type        ns:common.topic.
ns:m.012rkqx    rdf:type        ns:music.single.
ns:m.012rkqx    rdf:type        ns:music.recording.




