Hi Andy,

  Here are the triples from the neighborhood of line 270608. I tried
to find the error but couldn't. Do you see one, by any chance?
I also printed the line number on the left, just in case. Ex:
"line num: 270591-"
------------------------------
line num: 270591- ns:m.01gqn1 ns:type.object.type
ns:organization.organization.
line num: 270592- ns:m.01gqn1 ns:type.object.key
"/wikipedia/en/Worker$0027s_Party_of_Brazil".
line num: 270593- ns:m.01gqn1 ns:type.object.name "Workers' Party"@en.
line num: 270594- ns:m.01gqn1 ns:type.object.name "Partido dos
Trabalhadores"@ca.
line num: 270595- ns:m.01gqn1 ns:type.object.key
"/wikipedia/en/Partido_dos_Trabalhadores".
line num: 270596- ns:m.01gqn1 ns:type.object.name "Partido dos
Trabalhadores"@de.
line num: 270597- ns:m.01gqn1 ns:type.object.key
"/wikipedia/pt_title/Partido_dos_Trabalhadores".
line num: 270598- ns:m.01gqn1 ns:type.object.key
"/wikipedia/it_title/Partito_dei_Lavoratori_$0028Brasile$0029".
line num: 270599- ns:m.01gqn1 ns:type.object.key
"/wikipedia/ja_title/$52B4$50CD$8005$515A_$0028$30D6$30E9$30B8$30EB$0029".
line num: 270600- ns:m.01gqn1
ns:base.braziliangovt.brazilian_political_party.president ns:m.02ql58w.
line num: 270601- ns:m.01gqn1 key:wikipedia.fr_id "742582".
line num: 270602- ns:m.01gqn1 key:wikipedia.ja_id "1747452".
line num: 270603- ns:m.01gqn1 ns:type.object.key "/wikipedia/es_id/281246".
line num: 270604- ns:m.01gqn1 ns:type.object.key
"/wikipedia/en/Brazilian_Worker$0027s_Party".
line num: 270605- ns:m.01gqn1 ns:common.topic.topical_webpage <
http://tse.gov.br/internet/partidos/partidos_politicos/pt.htm>.
line num: 270606- ns:m.01gqn1
ns:base.braziliangovt.brazilian_political_party.number 13.
line num: 270607- ns:m.01gqn1 ns:type.object.key "/en/workers_party".
line num: 270608- ns:m.01gqn1 ns:type.object.key
"/wikipedia/en/Brazilian_Workers_Party".
line num: 270609- ns:m.01gqn1 ns:type.object.type
ns:government.political_party.
line num: 270610- ns:m.01gqn1 ns:common.topic.image ns:m.03s8rh8.
line num: 270611- ns:m.01gqn1 ns:common.topic.official_website <
http://www.pt.org.br/>.
line num: 270612- ns:m.01gqn1 ns:type.object.type ns:business.employer.
line num: 270613- ns:m.01gqn1 ns:type.object.key "/wikipedia/ru_id/1230551".
line num: 270614- ns:m.01gqn1 ns:type.object.key "/wikipedia/en_id/224622".
line num: 270615- ns:m.01gqn1 ns:type.object.type
ns:base.braziliangovt.brazilian_political_party.
line num: 270616- ns:m.01gqn1 ns:type.object.name "Partido de los
Trabajadores"@es.
line num: 270617- ns:m.01gqn1 ns:type.object.key
"/wikipedia/ja/$52B4$50CD$8005$515A_$0028$30D6$30E9$30B8$30EB$0029".
line num: 270618- ns:m.01gqn1 ns:common.topic.article ns:m.01gqn9.
line num: 270619- ns:m.01gqn1 ns:common.topic.webpage ns:m.04yvgzd.
line num: 270620- ns:m.01gqn1 ns:common.topic.topic_equivalent_webpage <
http://de.wikipedia.org/wiki/Partido_dos_Trabalhadores>.
line num: 270621- ns:m.01gqn1 ns:common.topic.topic_equivalent_webpage <
http://fr.wikipedia.org/wiki/Parti_des_travailleurs_(Brésil)>.
line num: 270622- ns:m.01gqn1 ns:common.topic.topic_equivalent_webpage <
http://it.wikipedia.org/wiki/index.html?curid=616917>.
line num: 270623- ns:m.01gqn1 ns:common.topic.topic_equivalent_webpage <
http://ru.wikipedia.org/wiki/index.html?curid=1230551>.
line num: 270624- ns:m.01gqn1 ns:common.topic.topic_equivalent_webpage <
http://ja.wikipedia.org/wiki/index.html?curid=1747452>.
line num: 270625- ns:m.01gqn1 ns:common.topic.topic_equivalent_webpage <
http://de.wikipedia.org/wiki/index.html?curid=408018>.
line num: 270626- ns:m.01gqn1 ns:common.topic.topic_equivalent_webpage <
http://ru.wikipedia.org/wiki/Партия_трудящихся_(Бразилия)>.
line num: 270627- ns:m.01gqn1 ns:common.topic.topic_equivalent_webpage <
http://es.wikipedia.org/wiki/Partido_de_los_Trabajadores_(Brasil)>.
line num: 270628- ns:m.01gqn1 ns:common.topic.topic_equivalent_webpage <
http://es.wikipedia.org/wiki/index.html?curid=281246>.
line num: 270629- ns:m.01gqn1 ns:common.topic.topic_equivalent_webpage <
http://it.wikipedia.org/wiki/Partito_dei_Lavoratori_(Brasile)>.

----------------------------
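
A minimal sketch in Python (not the script used to produce the listing
above) of one way to re-print such a neighborhood so that hidden problems
become visible. It reads the file as bytes, shows each line with repr(), and
flags three things: invalid UTF-8, an odd number of unescaped double quotes,
and a line that does not end with ".". The function name, the radius
parameter, and the heuristics are assumptions for illustration; the quote
count is a rough check only and says nothing about what the RIOT parser
accepts.

```python
import re


def inspect_window(path, center, radius=3):
    """Return (line_number, repr_text, notes) for lines near `center` (1-based)."""
    findings = []
    # Read bytes so that a bad encoding is reported instead of raising early.
    with open(path, "rb") as fh:
        for number, raw in enumerate(fh, start=1):
            if number < center - radius:
                continue
            if number > center + radius:
                break
            notes = []
            try:
                text = raw.decode("utf-8")
            except UnicodeDecodeError:
                # Fall back to a lossless byte-to-char mapping for display only.
                text = raw.decode("latin-1")
                notes.append("not valid UTF-8")
            stripped = text.rstrip("\r\n")
            # Heuristic: count double quotes not preceded by a backslash.
            if len(re.findall(r'(?<!\\)"', stripped)) % 2:
                notes.append("odd number of double quotes")
            # Heuristic: a one-triple-per-line dump should end each line with '.'.
            if stripped.strip() and not stripped.rstrip().endswith("."):
                notes.append("does not end with '.'")
            findings.append((number, repr(stripped), notes))
    return findings


# Hypothetical usage against the dump discussed in this thread:
# for number, shown, notes in inspect_window("freebase-rdf-2012-12-09-00-00", 270608):
#     print(number, shown, "; ".join(notes))
```

repr() makes control characters and stray bytes show up as escapes, which
helps when a line "looks right but isn't".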

Thank you!

With Regards,
Abhishek S


On Fri, Dec 28, 2012 at 1:15 AM, Andy Seaborne <[email protected]> wrote:

> On 27/12/12 19:17, Abhishek Shivkumar wrote:
>
>> Thanks Andy. It runs for a few lines and then throws the error below,
>> saying that the triples are not terminated by a DOT. I am assuming the file
>> doesn't end every triple with a ".". Is there a workaround for this?
>>
>> I read somewhere on the mailing list that this has been fixed in SVN.
>> Anyway, I downloaded Jena from http://www.apache.org/dist/jena/
>>
>
> 1/ the warning for es-419 is fixed.
>
> 2/
> [[
> The Bad IRI:
> <http://lv.wikipedia.org/wiki/Riode₧aneiro_"Fluminense"> Code:
> 4/UNWISE_CHARACTER in PATH: The character matches no grammar rules of
> URIs/IRIs. These characters are permitted in RDF URI References, XML
> system identifiers, and XML Schema anyURIs.
> ]]
>
> Can't have " in URIs.  The ₧ looks like an ISO-8859-1/UTF-8 encoding error.
>
> 3/ The file has bad syntax - you need to look at line 270608 or so and fix
> it up.  It's a bug in the freebase data.
>
> Shame they haven't fixed it - it was wrong previously as well.  It may be
> something like unmatched quotes, or an encoding error, so it may look right
> but isn't.
>
> What are the lines around 270608?
>
>
> (Another good reason for parsing data before loading!)
>
>         Andy
>
>
>
>> Thanks much!
>>
>> 00:39:13 INFO  loader               :: Add: 150,000 triples (Batch: 40,950 / Avg: 23,648)
>> 00:39:14 INFO  loader               :: Add: 200,000 triples (Batch: 43,898 / Avg: 26,730)
>> 00:39:14 WARN  riot                 :: [line: 209572, col: 54] Bad IRI: <http://lv.wikipedia.org/wiki/Riode₧aneiro_"Fluminense"> Code: 4/UNWISE_CHARACTER in PATH: The character matches no grammar rules of URIs/IRIs. These characters are permitted in RDF URI References, XML system identifiers, and XML Schema anyURIs.
>> 00:39:14 WARN  riot                 :: [line: 219452, col: 33] Language not valid: es-419
>> 00:39:14 WARN  riot                 :: [line: 219560, col: 24] Language not valid: es-419
>> 00:39:15 INFO  loader               :: Add: 250,000 triples (Batch: 43,975 / Avg: 29,005)
>> 00:39:15 ERROR riot                 :: [line: 270608, col: 1 ] Triples not terminated by DOT
>> Exception in thread "main" org.openjena.riot.RiotException: [line: 270608, col: 1 ] Triples not terminated by DOT
>>         at org.openjena.riot.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:130)
>>         at org.openjena.riot.lang.LangEngine.raiseException(LangEngine.java:169)
>>         at org.openjena.riot.lang.LangEngine.exceptionDirect(LangEngine.java:162)
>>         at org.openjena.riot.lang.LangEngine.exception(LangEngine.java:155)
>>         at org.openjena.riot.lang.LangEngine.expect(LangEngine.java:147)
>>         at org.openjena.riot.lang.LangEngine.expectOrEOF(LangEngine.java:138)
>>         at org.openjena.riot.lang.LangTurtle.expectEndOfTriples(LangTurtle.java:57)
>>         at org.openjena.riot.lang.LangTurtleBase.triples(LangTurtleBase.java:285)
>>         at org.openjena.riot.lang.LangTurtleBase.triplesSameSubject(LangTurtleBase.java:223)
>>         at org.openjena.riot.lang.LangTurtle.oneTopLevelElement(LangTurtle.java:46)
>>         at org.openjena.riot.lang.LangTurtleBase.runParser(LangTurtleBase.java:144)
>>         at org.openjena.riot.lang.LangBase.parse(LangBase.java:43)
>>         at org.openjena.riot.RiotReader.parseTriples(RiotReader.java:97)
>>         at org.openjena.riot.RiotReader.parseTriples(RiotReader.java:83)
>>         at org.openjena.riot.RiotReader.parseTriples(RiotReader.java:56)
>>         at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadTriples$(BulkLoader.java:139)
>>         at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadDefaultGraph(BulkLoader.java:87)
>>         at com.hp.hpl.jena.tdb.TDBLoader.loadDefaultGraph$(TDBLoader.java:261)
>>         at com.hp.hpl.jena.tdb.TDBLoader.loadGraph$(TDBLoader.java:244)
>>         at com.hp.hpl.jena.tdb.TDBLoader.loadGraph(TDBLoader.java:177)
>>         at com.hp.hpl.jena.tdb.TDBLoader.load(TDBLoader.java:112)
>>         at tdb.tdbloader.loadDefaultGraph(tdbloader.java:150)
>>         at tdb.tdbloader.exec(tdbloader.java:116)
>>         at arq.cmdline.CmdMain.mainMethod(CmdMain.java:101)
>>         at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
>>         at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
>>         at tdb.tdbloader.main(tdbloader.java:53)
>>
>> Thank you!
>>
>> With Regards,
>> Abhishek S
>>
>>
>> On Fri, Dec 28, 2012 at 12:35 AM, Andy Seaborne <[email protected]> wrote:
>>
>>  On 27/12/12 18:42, Abhishek Shivkumar wrote:
>>>
>>>  Hi,
>>>>
>>>>     I am trying to load a large (55 GB!) RDF file into Jena TDB for
>>>> SPARQL querying later. A snapshot of the file is at the end of this email.
>>>>
>>>> When I use TDBLoader from the command line with the following command:
>>>>
>>>> c:\JENA\apache-jena-2.7.4\apache-jena-2.7.4\bat>tdbloader.bat -loc test "C:\freebase-rdf-2012-12-09-00-00"
>>>>
>>>>
>>> The TDB loader has no clue, via file extension, as to the syntax.  The
>>> default is n-quads/n-triples.
>>>
>>> But it's turtle, hence a syntax error.
>>>
>>> So either:
>>>
>>> 1/ Run "riotcmd.turtle FILE > data.nt"
>>>
>>> This is preferred because:
>>>    A/ It checks that the file is valid before loading.
>>>    B/ The NT loads faster.
>>>
>>> 2/ Rename the file to "something.ttl"
>>>
>>>          Andy
>>>
>>>
>>>  I get this error:
>>>>
>>>> 23:40:30 INFO  loader               :: -- Start triples data phase
>>>> 23:40:30 INFO  loader               :: ** Load empty triples table
>>>> 23:40:30 INFO  loader               :: -- Start quads data phase
>>>> 23:40:30 INFO  loader               :: ** Load empty quads table
>>>> 23:40:30 INFO  loader               :: Load: C:\Users\IBM_ADMIN\My Documents\down\freebase-rdf-2012-12-09-00-00\freebase-rdf-2012-12-09-00-00 -- 2012/12/27 23:40:30 IST
>>>> 23:40:30 ERROR riot                 :: [line: 1, col: 1 ] Expected BNode or IRI: Got: [DIRECTIVE:prefix]
>>>> Exception in thread "main" org.openjena.riot.RiotException: [line: 1, col: 1 ] Expected BNode or IRI: Got: [DIRECTIVE:prefix]
>>>>         at org.openjena.riot.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:130)
>>>>         at org.openjena.riot.lang.LangEngine.raiseException(LangEngine.java:169)
>>>>         at org.openjena.riot.lang.LangEngine.exceptionDirect(LangEngine.java:162)
>>>>         at org.openjena.riot.lang.LangEngine.exception(LangEngine.java:155)
>>>>         at org.openjena.riot.lang.LangNTuple.checkIRIOrBNode(LangNTuple.java:107)
>>>>         at org.openjena.riot.lang.LangNQuads.parseOne(LangNQuads.java:84)
>>>>         at org.openjena.riot.lang.LangNQuads.parseOne(LangNQuads.java:34)
>>>>         at org.openjena.riot.lang.LangNTuple.runParser(LangNTuple.java:69)
>>>>         at org.openjena.riot.lang.LangBase.parse(LangBase.java:43)
>>>>         at org.openjena.riot.RiotReader.parseQuads(RiotReader.java:134)
>>>>         at org.openjena.riot.RiotReader.parseQuads(RiotReader.java:121)
>>>>         at org.openjena.riot.RiotReader.parseQuads(RiotReader.java:107)
>>>>         at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadQuads$(BulkLoader.java:160)
>>>>         at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadDataset(BulkLoader.java:121)
>>>>         at com.hp.hpl.jena.tdb.TDBLoader.loadDataset$(TDBLoader.java:283)
>>>>         at com.hp.hpl.jena.tdb.TDBLoader.loadDataset(TDBLoader.java:196)
>>>>         at com.hp.hpl.jena.tdb.TDBLoader.load(TDBLoader.java:75)
>>>>         at tdb.tdbloader.loadQuads(tdbloader.java:163)
>>>>         at tdb.tdbloader.exec(tdbloader.java:122)
>>>>         at arq.cmdline.CmdMain.mainMethod(CmdMain.java:101)
>>>>         at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
>>>>         at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
>>>>         at tdb.tdbloader.main(tdbloader.java:53)
>>>>
>>>>
>>>>
>>>> I need help in understanding this error and how to solve it. Is there a
>>>> problem with the input file?
>>>>
>>>>
>>>> @prefix ns: <http://rdf.freebase.com/ns/>.
>>>> @prefix key: <http://rdf.freebase.com/key/>.
>>>> @prefix owl: <http://www.w3.org/2002/07/owl#>.
>>>> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
>>>> @prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
>>>>
>>>> ns:m.012rkqx    ns:type.object.type     ns:common.topic.
>>>> ns:m.012rkqx    ns:type.object.name     "High Fidelity"@en.
>>>> ns:m.012rkqx    ns:type.object.type     ns:music.single.
>>>> ns:m.012rkqx    ns:type.object.key      ns:authority.musicbrainz.name.TRACK3987054.
>>>> ns:m.012rkqx    ns:type.object.type     ns:music.recording.
>>>> ns:m.012rkqx    key:authority.musicbrainz       "258c45bd-4437-4580-8988-b3f3be975f9c".
>>>> ns:m.012rkqx    key:authority.musicbrainz.name  "TRACK3987054".
>>>> ns:m.012rkqx    rdf:label       "High Fidelity"@en.
>>>> ns:m.012rkqx    rdf:type        ns:common.topic.
>>>> ns:m.012rkqx    rdf:type        ns:music.single.
>>>> ns:m.012rkqx    rdf:type        ns:music.recording.
>>>>
>>>>
>>>>
>>>
>>
>
