On 27/12/12 19:17, Abhishek Shivkumar wrote:
Thanks Andy. It runs for a few lines and then throws the error below,
saying that the triples are not terminated by a DOT. I assume the file
doesn't end every triple with a "."; is there a workaround for this?
I read somewhere on the mailing list that this has been fixed in SVN.
Anyway, I downloaded Jena from http://www.apache.org/dist/jena/
1/ the warning for es-419 is fixed.
2/
[[
The Bad IRI: <http://lv.wikipedia.org/wiki/Riode₧aneiro_"Fluminense">
Code: 4/UNWISE_CHARACTER in PATH: The character matches no grammar rules
of URIs/IRIs. These characters are permitted in RDF URI References, XML
system identifiers, and XML Schema anyURIs.
]]
Can't have " in URIs. The ₧ looks like an ISO-8859-1/UTF-8 encoding error.
3/ The file has bad syntax - you need to look at line 270608 or so and
fix it up. It's a bug in the freebase data.
Shame they haven't fixed it - it was wrong previously as well. It may
be something like unmatched quotes, or an encoding error, so it may look
right but isn't.
What are the lines around 270608?
(Another good reason for parsing data before loading!)
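To see the lines around a reported line number without opening a 55 GB file in an editor, something like the following may help. It is only a sketch: the function name `lines_around` and its `radius` parameter are made up for illustration, and it assumes the file is meant to be UTF-8. Undecodable bytes are shown as U+FFFD rather than raising, so encoding damage should be visible in the output.

```python
from itertools import islice


def lines_around(path, target, radius=3):
    """Return (line_number, text) pairs for lines within `radius` of `target`.

    Line numbers are 1-based. The file is streamed, so memory use stays
    small even for very large inputs. (Hypothetical helper, not part of Jena.)
    """
    first = max(1, target - radius)
    last = target + radius
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
    # newline="\n" splits on LF only, so a stray CR inside a line does not
    # shift the line count.
    with open(path, encoding="utf-8", errors="replace", newline="\n") as f:
        window = islice(f, first - 1, last)
        return [(n, line.rstrip("\r\n")) for n, line in enumerate(window, start=first)]


# Example use (the file name is a placeholder):
#   for n, text in lines_around("freebase-rdf-2012-12-09-00-00", 270608):
#       print(n, text)
```

The parser's line count may not agree exactly with this one if the file contains unusual line terminators, so treat the window as approximate and widen `radius` if nothing looks wrong.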
Andy
Thanks much!
00:39:13 INFO loader :: Add: 150,000 triples (Batch: 40,950 / Avg: 23,648)
00:39:14 INFO loader :: Add: 200,000 triples (Batch: 43,898 / Avg: 26,730)
00:39:14 WARN riot :: [line: 209572, col: 54] Bad IRI: <http://lv.wikipedia.org/wiki/Riode₧aneiro_"Fluminense"> Code: 4/UNWISE_CHARACTER in PATH: The character matches no grammar rules of URIs/IRIs. These characters are permitted in RDF URI References, XML system identifiers, and XML Schema anyURIs.
00:39:14 WARN riot :: [line: 219452, col: 33] Language not valid: es-419
00:39:14 WARN riot :: [line: 219560, col: 24] Language not valid: es-419
00:39:15 INFO loader :: Add: 250,000 triples (Batch: 43,975 / Avg: 29,005)
00:39:15 ERROR riot :: [line: 270608, col: 1 ] Triples not terminated by DOT
Exception in thread "main" org.openjena.riot.RiotException: [line: 270608, col: 1 ] Triples not terminated by DOT
    at org.openjena.riot.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:130)
    at org.openjena.riot.lang.LangEngine.raiseException(LangEngine.java:169)
    at org.openjena.riot.lang.LangEngine.exceptionDirect(LangEngine.java:162)
    at org.openjena.riot.lang.LangEngine.exception(LangEngine.java:155)
    at org.openjena.riot.lang.LangEngine.expect(LangEngine.java:147)
    at org.openjena.riot.lang.LangEngine.expectOrEOF(LangEngine.java:138)
    at org.openjena.riot.lang.LangTurtle.expectEndOfTriples(LangTurtle.java:57)
    at org.openjena.riot.lang.LangTurtleBase.triples(LangTurtleBase.java:285)
    at org.openjena.riot.lang.LangTurtleBase.triplesSameSubject(LangTurtleBase.java:223)
    at org.openjena.riot.lang.LangTurtle.oneTopLevelElement(LangTurtle.java:46)
    at org.openjena.riot.lang.LangTurtleBase.runParser(LangTurtleBase.java:144)
    at org.openjena.riot.lang.LangBase.parse(LangBase.java:43)
    at org.openjena.riot.RiotReader.parseTriples(RiotReader.java:97)
    at org.openjena.riot.RiotReader.parseTriples(RiotReader.java:83)
    at org.openjena.riot.RiotReader.parseTriples(RiotReader.java:56)
    at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadTriples$(BulkLoader.java:139)
    at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadDefaultGraph(BulkLoader.java:87)
    at com.hp.hpl.jena.tdb.TDBLoader.loadDefaultGraph$(TDBLoader.java:261)
    at com.hp.hpl.jena.tdb.TDBLoader.loadGraph$(TDBLoader.java:244)
    at com.hp.hpl.jena.tdb.TDBLoader.loadGraph(TDBLoader.java:177)
    at com.hp.hpl.jena.tdb.TDBLoader.load(TDBLoader.java:112)
    at tdb.tdbloader.loadDefaultGraph(tdbloader.java:150)
    at tdb.tdbloader.exec(tdbloader.java:116)
    at arq.cmdline.CmdMain.mainMethod(CmdMain.java:101)
    at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
    at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
    at tdb.tdbloader.main(tdbloader.java:53)
Thank you!
With Regards,
Abhishek S
On Fri, Dec 28, 2012 at 12:35 AM, Andy Seaborne <[email protected]> wrote:
On 27/12/12 18:42, Abhishek Shivkumar wrote:
Hi,
I am trying to load a large (55 GB!) RDF file into Jena TDB for SPARQL
querying later. A snapshot of the file is at the end of this email.
When I run tdbloader from the command line with the following command:
c:\JENA\apache-jena-2.7.4\apache-jena-2.7.4\bat>tdbloader.bat -loc test "C:\freebase-rdf-2012-12-09-00-00"
The file name has no extension, so the TDB loader has no clue as to the
syntax. The default is N-Quads/N-Triples.
But the file is Turtle, hence the syntax error.
So either:
1/ Run "riotcmd.turtle FILE > data.nt"
This is preferred because:
A/ It checks that the file is valid before loading.
B/ N-Triples loads faster.
2/ Rename the file to "something.ttl"
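If it is unclear which syntax an extensionless file uses, peeking at its first statement can give a hint. The sketch below is a heuristic only, with made-up function names: it reports whether the file opens with an `@prefix` or `@base` directive, which Turtle has and N-Triples/N-Quads do not. A Turtle file does not have to start with a directive, so a `False` result proves nothing.

```python
def first_statement(path):
    """Return the first line that is neither blank nor a '#' comment, or None."""
    # utf-8-sig drops a leading byte-order mark if one is present.
    with open(path, encoding="utf-8-sig", errors="replace") as f:
        for line in f:
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                return stripped
    return None


def starts_with_directive(path):
    """True if the file opens with @prefix or @base (a sign of Turtle).

    Hypothetical helper for illustration; it only inspects one line and
    does not validate anything.
    """
    line = first_statement(path)
    return line is not None and line.startswith(("@prefix", "@base"))
```

For the file in this thread the first statement is an `@prefix` line, which matches the `Got: [DIRECTIVE:prefix]` error reported at line 1.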
Andy
I get this error:
23:40:30 INFO loader :: -- Start triples data phase
23:40:30 INFO loader :: ** Load empty triples table
23:40:30 INFO loader :: -- Start quads data phase
23:40:30 INFO loader :: ** Load empty quads table
23:40:30 INFO loader :: Load: C:\Users\IBM_ADMIN\My Documents\down\freebase-rdf-2012-12-09-00-00\freebase-rdf-2012-12-09-00-00 -- 2012/12/27 23:40:30 IST
23:40:30 ERROR riot :: [line: 1, col: 1 ] Expected BNode or IRI: Got: [DIRECTIVE:prefix]
Exception in thread "main" org.openjena.riot.RiotException: [line: 1, col: 1 ] Expected BNode or IRI: Got: [DIRECTIVE:prefix]
    at org.openjena.riot.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:130)
    at org.openjena.riot.lang.LangEngine.raiseException(LangEngine.java:169)
    at org.openjena.riot.lang.LangEngine.exceptionDirect(LangEngine.java:162)
    at org.openjena.riot.lang.LangEngine.exception(LangEngine.java:155)
    at org.openjena.riot.lang.LangNTuple.checkIRIOrBNode(LangNTuple.java:107)
    at org.openjena.riot.lang.LangNQuads.parseOne(LangNQuads.java:84)
    at org.openjena.riot.lang.LangNQuads.parseOne(LangNQuads.java:34)
    at org.openjena.riot.lang.LangNTuple.runParser(LangNTuple.java:69)
    at org.openjena.riot.lang.LangBase.parse(LangBase.java:43)
    at org.openjena.riot.RiotReader.parseQuads(RiotReader.java:134)
    at org.openjena.riot.RiotReader.parseQuads(RiotReader.java:121)
    at org.openjena.riot.RiotReader.parseQuads(RiotReader.java:107)
    at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadQuads$(BulkLoader.java:160)
    at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadDataset(BulkLoader.java:121)
    at com.hp.hpl.jena.tdb.TDBLoader.loadDataset$(TDBLoader.java:283)
    at com.hp.hpl.jena.tdb.TDBLoader.loadDataset(TDBLoader.java:196)
    at com.hp.hpl.jena.tdb.TDBLoader.load(TDBLoader.java:75)
    at tdb.tdbloader.loadQuads(tdbloader.java:163)
    at tdb.tdbloader.exec(tdbloader.java:122)
    at arq.cmdline.CmdMain.mainMethod(CmdMain.java:101)
    at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
    at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
    at tdb.tdbloader.main(tdbloader.java:53)
I need help in understanding this error and how to solve it. Is there a
problem with the input file?
@prefix ns: <http://rdf.freebase.com/ns/>.
@prefix key: <http://rdf.freebase.com/key/>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
ns:m.012rkqx ns:type.object.type ns:common.topic.
ns:m.012rkqx ns:type.object.name "High Fidelity"@en.
ns:m.012rkqx ns:type.object.type ns:music.single.
ns:m.012rkqx ns:type.object.key ns:authority.musicbrainz.name.TRACK3987054.
ns:m.012rkqx ns:type.object.type ns:music.recording.
ns:m.012rkqx key:authority.musicbrainz "258c45bd-4437-4580-8988-b3f3be975f9c".
ns:m.012rkqx key:authority.musicbrainz.name "TRACK3987054".
ns:m.012rkqx rdf:label "High Fidelity"@en.
ns:m.012rkqx rdf:type ns:common.topic.
ns:m.012rkqx rdf:type ns:music.single.
ns:m.012rkqx rdf:type ns:music.recording.