On 22/10/13 11:50, Philippe Genoud wrote:
I'm having some trouble reading RDF data from Freebase.
The following program
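package test;

import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;

public class ExampleFreeBase {
    public static void main(String... argv) {
        try {
            Model fbModel = ModelFactory.createDefaultModel();
            // Fetch the Freebase RDF for this topic and parse it as Turtle.
            fbModel.read("http://rdf.freebase.com/rdf/en/en/phillip_glasser", "TURTLE");
            fbModel.write(System.out, "TURTLE");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}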
fails with the following exception:
org.apache.jena.riot.RiotException: [line: 12, col: 179] illegal escape sequence value: x (0x78)
    at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:136)
    at org.apache.jena.riot.lang.LangEngine.raiseException(LangEngine.java:163)
    at org.apache.jena.riot.lang.LangEngine.nextToken(LangEngine.java:106)
    ....
    at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:113)
    at org.apache.jena.riot.adapters.RDFReaderRIOT.read(RDFReaderRIOT.java:77)
    at com.hp.hpl.jena.rdf.model.impl.ModelCom.read(ModelCom.java:247)
    at test.ExampleFreeBase.main(ExampleFreeBase.java:11)
Line 12 of the RDF data is:
ns:common.topic.description "Phillip Glasser \u2013 ameryka\u0144ski .... film\xf3w animowany ....""@pl;
So while there is no problem with the Unicode-escaped characters (for
example \u2013 here), RIOT does not seem to support characters escaped
in hexadecimal (here \xf3w).
\xf3w is not a legal Turtle escape sequence. The Unicode escapes are \u
(followed by 4 hex digits) and \U (followed by 8); the only other escapes
are a handful of single characters: \t \b \n \r \f \" \' and \\.
RIOT does not accept it.
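To make that rule concrete, here is a minimal Java sketch that scans the body of a Turtle string literal (the raw text between the quotes) and reports where the first escape outside the permitted set appears. TurtleEscapeCheck and findIllegalEscape are hypothetical names, not part of Jena or RIOT, and the sketch only checks escapes, not the rest of the literal grammar.

```java
// Minimal sketch: find the first backslash escape that Turtle does not allow.
// Assumption: the input is the raw literal body exactly as it appears in the file.
public final class TurtleEscapeCheck {

    // Single-character escapes (ECHAR) permitted by the Turtle grammar.
    private static final String ECHAR = "tbnrf\"'\\";
    private static final String HEX = "0123456789abcdefABCDEF";

    /** Returns the index of the first illegal backslash escape, or -1 if all are legal. */
    public static int findIllegalEscape(String body) {
        int i = 0;
        while (i < body.length()) {
            if (body.charAt(i) != '\\') {
                i++;
                continue;
            }
            if (i + 1 >= body.length()) {
                return i; // trailing backslash with nothing after it
            }
            char kind = body.charAt(i + 1);
            if (ECHAR.indexOf(kind) >= 0) {
                i += 2;
            } else if (kind == 'u' || kind == 'U') {
                int digits = (kind == 'u') ? 4 : 8;
                if (!hexRun(body, i + 2, digits)) {
                    return i; // too few or non-hex digits after the escape letter
                }
                i += 2 + digits;
            } else {
                return i; // e.g. \x, which RIOT rejects as an illegal escape
            }
        }
        return -1;
    }

    // True if exactly `count` ASCII hex digits start at position `from`.
    private static boolean hexRun(String s, int from, int count) {
        if (from + count > s.length()) {
            return false;
        }
        for (int k = from; k < from + count; k++) {
            if (HEX.indexOf(s.charAt(k)) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String body = "Phillip Glasser \\u2013 ameryka\\u0144ski film\\xf3w animowany";
        System.out.println("first illegal escape at index " + findIllegalEscape(body));
    }
}
```

Running main on an abridged version of the Freebase text points at the \xf3 sequence, while the two Unicode escapes before it pass.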
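Not sure what \xf3w is trying to be. Read as the byte 0xF3, it is 'ó' in
ISO-8859-1, which would make the word "filmów", so it may be a \xHH escape
from some other syntax leaking through. The legal Turtle form of that
character is \u00F3.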
Any suggestions for fixing this problem?
You need to fix the data. I have found it is usually necessary to fix up
Freebase data to make it legal Turtle; I used Perl to fix the text.
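For anyone who would rather do the fix-up in Java than Perl, a rough equivalent is sketched here. It is not the script used in this thread; FreebaseEscapeFixer and fixHexEscapes are hypothetical names, and it assumes every \xHH in the dump stands for the Latin-1 code point U+00HH, rewriting it as the legal Turtle escape \u00HH.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical clean-up pass (not the Perl script mentioned in the thread).
// Assumption: each backslash-x-HH sequence in the dump encodes the Latin-1
// code point U+00HH, so it is rewritten as a Turtle backslash-u escape.
// Limitation: an escaped backslash followed by 'x' would also be rewritten.
public final class FreebaseEscapeFixer {

    // A literal backslash, 'x', then exactly two hex digits (captured as group 1).
    private static final Pattern HEX_ESCAPE = Pattern.compile("\\\\x([0-9A-Fa-f]{2})");

    public static String fixHexEscapes(String turtle) {
        Matcher m = HEX_ESCAPE.matcher(turtle);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String uEscape = "\\u00" + m.group(1).toUpperCase();
            // quoteReplacement keeps the backslash literal instead of treating it
            // as an escape character in the replacement string.
            m.appendReplacement(out, Matcher.quoteReplacement(uEscape));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "ns:common.topic.description \"film\\xf3w animowany\"@pl ;";
        System.out.println(fixHexEscapes(line));
    }
}
```

After fetching the document yourself and cleaning it this way, the text should be readable with something like fbModel.read(new StringReader(fixed), "http://rdf.freebase.com/rdf/en/en/phillip_glasser", "TURTLE"). Check the data for literal backslashes first, since the simple pattern cannot tell an escaped backslash followed by x from a real \xHH escape.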
The Freebase releases don't seem to check the Turtle output for strict
legality, maybe because they rely on their conversion process. They
appreciate feedback and will make corrections in their next revision.
Andy
Thanks
Philippe