Hello,
I have large N-Triples files that contain corrupt triples. They should
be imported into a TDB database, but the importer always aborts because
of an invalid IRI.
I suspect the best way to handle this would be a "pre-validation" step
that excludes the invalid triples. Is there a script that can do this,
or maybe a simple mechanism in the Jena API?
The invalid triples, which should be excluded, look like this:
<http://res_id.de> <http://prop.de> <http://a.de <c,d>>.
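To illustrate the kind of pre-validation meant here, the following is a
rough sketch in Python. It is not part of Jena. The names
is_plausible_triple and partition are made up for this example, and the
regular expressions only approximate the N-Triples grammar (blank node
labels in particular are simplified), so it may accept or reject some
edge cases differently than the Jena parser does.

```python
import re

# IRIREF roughly as in the N-Triples grammar: no control characters or
# spaces, none of <>"{}|^`\ and optional \uXXXX / \UXXXXXXXX escapes.
IRI = r'<(?:[^\x00-\x20<>"{}|^`\\]|\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8})*>'
# Simplified blank node label (assumption: any run of non-space characters).
BNODE = r'_:\S+'
# Quoted literal with an optional datatype IRI or language tag.
LITERAL = (r'"(?:[^"\\\n\r]|\\.)*"'
           r'(?:\^\^' + IRI + r'|@[A-Za-z]+(?:-[A-Za-z0-9]+)*)?')

TRIPLE = re.compile(
    r'^(?:' + IRI + r'|' + BNODE + r')\s+'
    + IRI + r'\s+'
    + r'(?:' + IRI + r'|' + BNODE + r'|' + LITERAL + r')\s*\.\s*(?:#.*)?$'
)


def is_plausible_triple(line):
    """Return True if the line looks like a well-formed N-Triples line.

    Blank lines and comment lines are passed through as acceptable.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith('#'):
        return True
    return TRIPLE.match(stripped) is not None


def partition(lines):
    """Yield (line_number, line, ok) for each input line, one at a time,
    so that large files never have to be held in memory."""
    for number, line in enumerate(lines, start=1):
        yield number, line, is_plausible_triple(line)
```

A small driver could loop over partition(sys.stdin), write the lines
with ok set to True to the cleaned file, and log the others together
with their line numbers. The cleaned file would then be passed to the
loader.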
The exception that aborts the import is:
Exception in thread "main" org.openjena.riot.RiotException: [line: 8766228, col: 89] Broken IRI (bad character: '<'): http://www.kirchen.net/portal/pfarre.asp?Iid=%7BADBE0FCB-F59B-4388-BAA3-8E22D450AFB6%7DPfarre
at org.openjena.riot.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:130)
at org.openjena.riot.lang.LangEngine.raiseException(LangEngine.java:169)
at org.openjena.riot.lang.LangEngine.nextToken(LangEngine.java:116)
at org.openjena.riot.lang.LangNTriples.parseOne(LangNTriples.java:57)
at org.openjena.riot.lang.LangNTriples.parseOne(LangNTriples.java:33)
at org.openjena.riot.lang.LangNTuple.runParser(LangNTuple.java:69)
at org.openjena.riot.lang.LangBase.parse(LangBase.java:43)
at org.openjena.riot.RiotReader.parseTriples(RiotReader.java:97)
at org.openjena.riot.RiotReader.parseTriples(RiotReader.java:83)
at org.openjena.riot.RiotReader.parseTriples(RiotReader.java:56)
at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadTriples$(BulkLoader.java:139)
at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadNamedGraph(BulkLoader.java:107)
at com.hp.hpl.jena.tdb.TDBLoader.loadNamedGraph$(TDBLoader.java:271)
at com.hp.hpl.jena.tdb.TDBLoader.loadGraph$(TDBLoader.java:246)
at com.hp.hpl.jena.tdb.TDBLoader.loadGraph(TDBLoader.java:177)
at com.hp.hpl.jena.tdb.TDBLoader.load(TDBLoader.java:112)
at tdb.tdbloader.loadNamedGraph(tdbloader.java:157)
at tdb.tdbloader.exec(tdbloader.java:142)
at arq.cmdline.CmdMain.mainMethod(CmdMain.java:101)
at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
at tdb.tdbloader.main(tdbloader.java:53)
Thanks in advance
Stefan Scheffler