Re: Implementing RDF reader

Martynas Jusevičius Mon, 11 May 2015 12:29:22 -0700

Thanks Andy.

I have a parser that works on String, but this time I want to do it
right and make it streaming and plug it into Jena at the low level.


It seems that I should be able to reuse some code from TokenizerText.

I understand StreamRDF is used to sink the triples, but what about
ParserProfile? I see LangTurtleBase uses it:

        org.apache.jena.iri.IRI iri = profile.makeIRI(iriStr, currLine, currCol) ;

How do I construct an instance of ParserProfile? Or is there an
alternative way to construct IRIs etc.?
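
For the IRI part alone, here is a plain-Java illustration of resolving relative references against a base, using java.net.URI rather than Jena. This is not ParserProfile or Jena's IRI API, and the class and helper names (IriResolution, resolveAgainstBase) are hypothetical. java.net.URI follows RFC 2396 (with some deviations) rather than the IRI specification. It does none of the checking and warning that Jena's IRI code does, and it has no notion of the line/column arguments that makeIRI takes (presumably for error reporting). Treat it only as a sketch of the resolution step.

```java
import java.net.URI;

// Plain-Java illustration only: java.net.URI, NOT Jena's ParserProfile/IRI API.
// IriResolution and resolveAgainstBase are hypothetical names for this sketch.
class IriResolution {

    // Resolves a possibly relative reference against a base, as java.net.URI
    // implements it (RFC 2396 style). Malformed input throws IllegalArgumentException.
    static String resolveAgainstBase(String base, String ref) {
        return URI.create(base).resolve(ref).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.org/data/doc";
        System.out.println(resolveAgainstBase(base, "other"));          // http://example.org/data/other
        System.out.println(resolveAgainstBase(base, "../x"));           // http://example.org/x
        System.out.println(resolveAgainstBase(base, "#frag"));          // http://example.org/data/doc#frag
        System.out.println(resolveAgainstBase(base, "http://x.org/y")); // http://x.org/y
    }
}
```

A Jena-based reader would still want Jena's IRI handling for validation. This only shows what "resolve against base" means mechanically.
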

Martynas

On Mon, May 11, 2015 at 2:44 PM, Andy Seaborne <a...@apache.org> wrote:
> On 10/05/15 21:48, Martynas Jusevičius wrote:
>>
>> Hey all,
>>
>> I want to refactor my RDF/POST parser into a Jena-compatible reader.
>> An example of the format can be found here:
>> http://www.lsrn.org/semweb/rdfpost.html#sec-examples
>>
>> The documentation suggests implementing ReaderRIOT interface:
>>
>> https://github.com/apache/jena/blob/master/jena-arq/src-examples/arq/examples/riot/ExRIOT_5.java
>>
>> However, if I look at what I think are existing readers, such as
>> Turtle, they do not seem to implement ReaderRIOT:
>>
>> https://github.com/apache/jena/blob/master/jena-arq/src/main/java/org/apache/jena/riot/lang/LangTurtleBase.java
>>
>> What is the explanation for that?
>
>
> Hi Martynas,
>
> It is historical: the Turtle-derived parsers emerged with the RiotReader
> interface, and some code is/was around that used that interface.
>
> ReaderRIOTLang is the cross-over code from the proper interface ReaderRIOT
> to RiotReader. RiotReader is a fixed set of parsers.
>
> This can be sorted out in Jena3.
>
>>
>> Do I need to tokenize the InputStream myself or is there some
>> machinery I can reuse?
>
>
> The Turtle-world tokenizer is TokenizerText. It is specific to Turtle terms.
>
> In my experience, tokenizing for a new language is often very sensitive to
> the language details.
>
> If you are used to JavaCC, and performance isn't critical at scale, it's a
> good tool.
>
> RIOT uses custom I/O for speed. Jena used to have a JavaCC parser for
> Turtle, but Turtle is simple enough that a hand-written parser is doable. A
> hand-written tokenizer is for speed at scale (on big files it is about 2x
> faster than basic JavaCC tokenizing), but you need large input to make it
> worthwhile. N-Triples dumps of databases make it worthwhile.
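
A minimal sketch of what a hand-written, streaming tokenizer for RDF/POST input could look like, assuming the body is application/x-www-form-urlencoded (the encoding RDF/POST is built on). It reads one character at a time from a java.io.Reader and pushes decoded key/value pairs into a callback, so only the current key or value is ever buffered. FormTokenizer and KeyValueSink are hypothetical names. KeyValueSink is loosely in the spirit of StreamRDF but is not a Jena type.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical streaming tokenizer for an application/x-www-form-urlencoded body.
// KeyValueSink is a made-up callback, loosely StreamRDF-like, not a Jena type.
class FormTokenizer {

    interface KeyValueSink {
        void pair(String key, String value);
    }

    // Reads one char at a time (wrap real input in a BufferedReader);
    // only the current key or value is buffered.
    static void tokenize(Reader in, KeyValueSink sink) throws IOException {
        StringBuilder key = new StringBuilder();
        StringBuilder value = new StringBuilder();
        boolean inValue = false;
        int c;
        while ((c = in.read()) != -1) {
            if (c == '&') {                    // end of one key=value field
                emit(key, value, sink);
                inValue = false;
            } else if (c == '=' && !inValue) { // first '=' separates key from value
                inValue = true;
            } else {
                (inValue ? value : key).append((char) c);
            }
        }
        emit(key, value, sink);                // last field has no trailing '&'
    }

    // Decodes and forwards one field, skipping empty ones (e.g. from "&&"),
    // then clears the buffers for the next field.
    private static void emit(StringBuilder key, StringBuilder value, KeyValueSink sink)
            throws IOException {
        if (key.length() > 0 || value.length() > 0) {
            sink.pair(decode(key), decode(value));
        }
        key.setLength(0);
        value.setLength(0);
    }

    // Percent-decoding plus '+' -> space, as the form encoding requires.
    private static String decode(CharSequence s) throws IOException {
        return URLDecoder.decode(s.toString(), StandardCharsets.UTF_8.name());
    }

    public static void main(String[] args) throws IOException {
        String body = "rdf=&su=http%3A%2F%2Fexample.org%2Fs&ol=hello+world";
        tokenize(new StringReader(body), (k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Interpreting the keys (subject, predicate, object, literal and so on) would be a second stage on top of these pairs. This sketch covers only the tokenizing.
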
>
> If you convert rdfpost -> Turtle (string manipulation), then you can parse
> the Turtle as normal. Downside: error messages may be confusing, as they
> refer to the Turtle, not the input string.
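
A rough sketch of the rdfpost -> Turtle route. ASSUMPTION: it handles only four keys, read here as su (subject URI), pu (predicate URI), ou (object URI) and ol (plain literal object), based on my reading of the RDF/POST spec linked at the start of the thread. Every other key (blank nodes, prefixes, language tags, datatypes, ...) is ignored. RdfPostToTurtle, toTurtle and kv are hypothetical names. The input is a list of already-decoded key/value pairs in document order.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical rdfpost -> Turtle converter (string manipulation only).
// ASSUMPTION: handles just su, pu, ou, ol; every other RDF/POST key is ignored.
class RdfPostToTurtle {

    static Map.Entry<String, String> kv(String key, String value) {
        return new SimpleEntry<>(key, value);
    }

    // pairs: already-decoded key/value fields, in document order.
    static String toTurtle(List<Map.Entry<String, String>> pairs) {
        StringBuilder ttl = new StringBuilder();
        String subject = null;
        String predicate = null;
        for (Map.Entry<String, String> e : pairs) {
            String v = e.getValue();
            switch (e.getKey()) {
                case "su": subject = v; break;
                case "pu": predicate = v; break;
                case "ou": triple(ttl, subject, predicate, "<" + v + ">"); break;
                case "ol": triple(ttl, subject, predicate, "\"" + escape(v) + "\""); break;
                default: break; // rdf, sb, ll, lt, prefixes, ...: not handled here
            }
        }
        return ttl.toString();
    }

    // Appends one N-Triples-style statement (also valid Turtle).
    private static void triple(StringBuilder ttl, String s, String p, String o) {
        if (s == null || p == null) {
            throw new IllegalStateException("object seen before subject and predicate");
        }
        ttl.append('<').append(s).append("> <").append(p).append("> ")
           .append(o).append(" .\n");
    }

    // Escapes characters that would end or break a "..." Turtle string.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"")
                .replace("\n", "\\n").replace("\r", "\\r");
    }

    public static void main(String[] args) {
        System.out.print(toTurtle(Arrays.asList(
            kv("rdf", ""),
            kv("su", "http://example.org/s"),
            kv("pu", "http://example.org/p"),
            kv("ol", "say \"hi\""),
            kv("ou", "http://example.org/o"))));
    }
}
```

IRIs are copied between < and > without any checking, so an IRI containing a space or '>' yields invalid Turtle. The Turtle parser then reports the problem against the generated text, which is the confusing-error downside of this route.
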
>
> Splitting up the query string, with all the HTTP escaping rules, can be done
> with library code (see FusekiLib.parseQueryString [no longer used, but it
> works without consuming the body, unlike the servlet operations, which
> combine form and query string processing]), and there are probably lots of
> better code examples on the web.
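
When the input is already available as a String, a library-based split is shorter. A minimal sketch using java.net.URLDecoder follows. It is not FusekiLib.parseQueryString, and QueryStringSplit and parse are hypothetical names. It returns a list of pairs rather than a Map, because RDF/POST meaning depends on the order of the fields and keys repeat. A Map would lose both.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: splits an already-read "k1=v1&k2=v2" string with
// java.net.URLDecoder. Not FusekiLib.parseQueryString.
class QueryStringSplit {

    // Keeps fields in document order and keeps repeated keys; a Map would lose both.
    static List<Map.Entry<String, String>> parse(String qs) throws UnsupportedEncodingException {
        String enc = StandardCharsets.UTF_8.name();
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (String field : qs.split("&")) {
            if (field.isEmpty()) {
                continue;                               // tolerate "&&" or an empty string
            }
            int eq = field.indexOf('=');                // first '=' splits key from value
            String key = eq < 0 ? field : field.substring(0, eq);
            String value = eq < 0 ? "" : field.substring(eq + 1);
            // URLDecoder handles %XX escapes and '+' -> space
            pairs.add(new SimpleEntry<>(URLDecoder.decode(key, enc), URLDecoder.decode(value, enc)));
        }
        return pairs;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        for (Map.Entry<String, String> e : parse("rdf=&su=http%3A%2F%2Fexample.org%2Fs&ol=a&ol=b")) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```
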
>
>         Andy
>>
>>
>> Martynas
>> graphityhq.com
>>
>
