Re: validation of large ntriple files

Andy Seaborne Wed, 20 Feb 2013 05:39:06 -0800

Please don't reply to messages about other issues to start a new topic; it makes them hard to find.


On 20/02/13 07:16, Stefan Scheffler wrote:

> Hello,
>
> I face the problem that I have large N-Triples files which
> contain corrupt triples. They should be imported into a TDB
> database, but the importer always aborts because of an invalid IRI. I
> suspect the best way to handle this would be a "pre-validation"
> that excludes the invalid triples. Is there a script which can do this,
> or maybe a simple mechanism in the Jena API?


Yes, pre-validation is the way to go.
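For a quick pre-validation pass, something along these lines can split a file into lines that look acceptable and lines to set aside. It is a minimal Python sketch that assumes one triple per line and checks each line against a simplified version of the N-Triples grammar (IRIs may not contain spaces, control characters or any of <>"{}|^`\ unless \u-escaped; blank node labels are restricted to ASCII here). The function name `filter_ntriples` and the script name are hypothetical, and a line it passes can still be rejected by a real parser, so `riot --validate` remains the authoritative check.

```python
import re
import sys
from typing import TextIO

# Simplified, line-oriented approximation of the RDF 1.1 N-Triples grammar
# (an assumption for this sketch; a real parser such as riot is stricter).
# IRIREF: no control chars, no space, none of <>"{}|^`\ unless \u/\U escaped.
IRI = r'<(?:[^\x00-\x20<>"{}|^`\\]|\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8})*>'
# Blank node labels: restricted to ASCII here for simplicity.
BNODE = r'_:[A-Za-z0-9_][A-Za-z0-9_.-]*'
# Quoted literal with an optional language tag or ^^datatype IRI.
LITERAL = (r'"(?:[^"\\\n\r]|\\.)*"'
           r'(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^' + IRI + r')?')

TRIPLE = re.compile(
    r'^\s*(?:' + IRI + '|' + BNODE + r')\s*'             # subject
    + IRI + r'\s*'                                        # predicate
    + r'(?:' + IRI + '|' + BNODE + '|' + LITERAL + r')'  # object
    + r'\s*\.\s*(?:#.*)?$'                                # final dot, optional comment
)
BLANK_OR_COMMENT = re.compile(r'^\s*(?:#.*)?$')


def filter_ntriples(src: TextIO, good: TextIO, bad: TextIO) -> int:
    """Copy lines that look like valid triples to `good`, the rest to `bad`.

    Streams one line at a time, so memory use does not grow with file size.
    Returns the number of rejected lines.
    """
    rejected = 0
    for line in src:
        if BLANK_OR_COMMENT.match(line) or TRIPLE.match(line):
            good.write(line)
        else:
            bad.write(line)
            rejected += 1
    return rejected


if __name__ == "__main__" and len(sys.argv) == 4:
    # usage: python filter_nt.py IN.nt GOOD.nt REJECTED.nt
    with open(sys.argv[1], encoding="utf-8") as src, \
            open(sys.argv[2], "w", encoding="utf-8") as good, \
            open(sys.argv[3], "w", encoding="utf-8") as bad:
        count = filter_ntriples(src, good, bad)
    print(f"rejected {count} line(s)", file=sys.stderr)
```

The example triple from the question, `<http://res_id.de> <http://prop.de> <http://a.de <c,d>>.`, ends up in the rejected file because its object IRI contains a space and '<'. Streaming keeps memory flat on multi-gigabyte dumps, and the rejected lines can be inspected, fixed and fed back in separately.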


> The invalid triples look like this and should be excluded:
> <http://res_id.de> <http://prop.de> <http://a.de <c,d>>.

I use Perl to fix up files.

You need to decide what to do with the bad ones: %-encode, reject, or whatever.
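If %-encoding is the choice, the idea can be sketched in Python (an illustration of the approach, not the Perl script mentioned above). It assumes the case shown in the question, where subject and predicate are clean and only the object IRI is broken, and uses a hypothetical heuristic: the object runs from its opening '<' to the last '>' before the final '.'. Characters the N-Triples IRI production disallows are replaced by %XX of their UTF-8 bytes. This also encodes backslashes, so an IRI that relies on \u escapes would be altered; the function names are hypothetical.

```python
import re

# Characters the N-Triples IRIREF production does not allow unescaped:
# U+0000..U+0020 plus <>"{}|^`\
_DISALLOWED = set('<>"{}|^`\\') | {chr(c) for c in range(0x21)}

# Hypothetical heuristic for lines shaped "<s> <p> <object...> ." where only
# the object IRI is broken: greedy (.*) runs to the last '>' before the '.'.
_OBJECT_IRI_LINE = re.compile(r'^(\s*<[^>]*>\s*<[^>]*>\s*<)(.*)(>\s*\.\s*)$')


def percent_encode_iri(body: str) -> str:
    """%-encode disallowed characters as their UTF-8 bytes; keep the rest."""
    out = []
    for ch in body:
        if ch in _DISALLOWED:
            out.append(''.join('%{:02X}'.format(b) for b in ch.encode('utf-8')))
        else:
            out.append(ch)
    return ''.join(out)


def repair_object_iri(line: str) -> str:
    """Return the line with its object IRI %-encoded, or the line unchanged
    if it does not have the expected "<s> <p> <o> ." shape."""
    m = _OBJECT_IRI_LINE.match(line)
    if m is None:
        return line
    return m.group(1) + percent_encode_iri(m.group(2)) + m.group(3)
```

Applied to the example from the question, the object becomes `<http://a.de%20%3Cc,d%3E>`. Whether `<c,d>` really belongs inside the IRI is a guess: %-encoding only makes the line parse, it cannot recover what the data producer meant. Lines without the expected shape come back unchanged, so they still need validating (or rejecting) afterwards.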

> The exception which aborts the import algorithm is:
> Exception in thread "main" org.openjena.riot.RiotException: [line:
> 8766228, col: 89] Broken IRI (bad character: '<'):

You can run the parser separately (in checking mode) with

riot --validate NTFILE.nt
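riot reports the position of a problem as a line and column, e.g. `[line: 8766228, col: 89]` in the exception above. To look at that spot in a file too big for an editor, a small helper can stream to the line and print it with a little context. This is a convenience sketch, not part of Jena; `lines_around` and the script name are hypothetical.

```python
import itertools
import sys
from typing import List, Tuple


def lines_around(path: str, line_no: int, context: int = 1) -> List[Tuple[int, str]]:
    """Return (number, text) pairs for lines line_no-context .. line_no+context.

    Line numbers are 1-based, like the line numbers in riot's messages.
    The file is streamed, so only the requested lines are kept in memory.
    """
    first = max(1, line_no - context)
    last = line_no + context
    with open(path, encoding="utf-8", errors="replace") as f:
        window = itertools.islice(f, first - 1, last)
        return [(first + i, text.rstrip("\n")) for i, text in enumerate(window)]


if __name__ == "__main__" and len(sys.argv) >= 3:
    # usage: python show_line.py NTFILE.nt 8766228
    for n, text in lines_around(sys.argv[1], int(sys.argv[2])):
        print(f"{n}: {text}")
```

Reading with `errors="replace"` means a line with broken UTF-8 is still shown rather than raising an exception, which is useful when the encoding itself may be part of the corruption.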

        Andy
