Re: Performance Cost of Reification

Andy Seaborne Fri, 09 Oct 2015 07:21:19 -0700

It is nice that the Titan guys see RDF as something to compare to. Coincidentally, I was giving a talk about Property Graph / Linked Data just recently at the European ApacheCon BigData conference.

The Property Graph (PG) market is maybe twice the size of the RDF market, and both are small. The challenge is growing the graph market, not one form taking market share away from the other.

And the key difference between graph databases (either kind) and other data systems is the approach to data modelling. The differences between graph systems are not the key here.

About reification, they are somewhat off-track. Reification is a quite specialised feature for limited use. It is not RDF's equivalent to attributes on links in PG.

Let me make that concrete with an example simplified from "Graph Databases", chapter 3 (page 52 in my copy). The book is written by the Neo4j folks.


Email provenance.

    A sends_email_to B

Now, you could reify that statement (the act by A of sending the email to B).

Reification is way more powerful than just being able to add data to the triple. It says "claim: A sends_email_to B" - several different and competing claims can be made. But let's continue assuming reification and assertion of the triple ... [*]


<<A sends_email_to B>>
    cc C
    cc D
    sentOn Tuesday
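
For comparison, here is a rough sketch of what that reified form looks like when stored naively with the standard RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object). The `reify` helper, the blank node label `_:stmt1` and the use of plain Python tuples as triples are assumptions for illustration only, not the API of any particular store.

```python
def reify(stmt_id, s, p, o):
    """Return the four triples that describe the statement (s, p, o)."""
    return [
        (stmt_id, "rdf:type", "rdf:Statement"),
        (stmt_id, "rdf:subject", s),
        (stmt_id, "rdf:predicate", p),
        (stmt_id, "rdf:object", o),
    ]

graph = set()

# The asserted triple itself; reification alone does not assert it.
graph.add(("A", "sends_email_to", "B"))

# The description of the statement: four more triples.
graph.update(reify("_:stmt1", "A", "sends_email_to", "B"))

# The extra data hangs off the statement node, not off A or B.
graph.update({
    ("_:stmt1", "cc", "C"),
    ("_:stmt1", "cc", "D"),
    ("_:stmt1", "sentOn", "Tuesday"),
})

print(len(graph))
```

One asserted triple plus the reification quad plus three annotations, all to say something about a single email.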

In the same modelling way you could add attributes to a PG graph edge for sends_email_to.

Both PG and RDF modelling here are anti-patterns (as chapter 3 notes for PG).


The email sent is an important concept so model it explicitly:

A   sends       MSG
MSG receivedBy  B
MSG cc_to       C
MSG cc_to       D
MSG sentOn      "Tuesday"

By modelling the email message as a first-class concept, not implicit in the activity via reification/link attributes, you can better add information, e.g. which servers it was transferred by and stored on and when it was received (this is email - that might be twice), and better query it (who else accessed it on receipt). Modelling those on the act of sending is making life hard (how do you talk about a draft email?).


MSG contents        URL_to_content
MSG hasChecksum     0xABCDEF
MSG status          :sent

This is event-based modelling.
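
A minimal sketch of why this is easier to query, using plain Python tuples as triples. The `match` and `recipients` helpers and the `DRAFT` node are hypothetical names added for illustration; a real system would use SPARQL or a PG query language instead.

```python
triples = {
    ("A", "sends", "MSG"),
    ("MSG", "receivedBy", "B"),
    ("MSG", "cc_to", "C"),
    ("MSG", "cc_to", "D"),
    ("MSG", "sentOn", "Tuesday"),
    ("MSG", "status", "sent"),
    # A draft is just a message with no "sends" triple yet (hypothetical node).
    ("DRAFT", "status", "draft"),
}

def match(s=None, p=None, o=None):
    """Return all triples matching the pattern; None is a wildcard."""
    return sorted(
        t for t in triples
        if s in (None, t[0]) and p in (None, t[1]) and o in (None, t[2])
    )

def recipients(msg):
    """Everyone the message went to, whether directly or by cc."""
    return sorted(o for (_, p, o) in match(s=msg) if p in {"receivedBy", "cc_to"})

sent_by_a = [o for (_, _, o) in match(s="A", p="sends")]
drafts = [s for (s, _, _) in match(p="status", o="draft")]

print(sent_by_a, recipients("MSG"), drafts)
```

The draft can be described without any act of sending, which is awkward when everything hangs off the sends_email_to link.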

If you wanted a highly efficient reification-supporting RDF store, then build one. There is no need to blindly store reification as multiple triples (it's called compression!). You don't see such stores because reification is a minor feature of RDF. Event-based modelling and named graphs are often better.
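
To illustrate the compression point, here is one possible sketch: keep each reified statement once, keyed by an internal id, and only produce the four reification triples when someone asks for them. The `ReifiedStore` class and its methods are hypothetical and do not describe how any existing store works.

```python
class ReifiedStore:
    """Toy store that holds each reified statement as a single record."""

    def __init__(self):
        self._ids = {}          # (s, p, o) -> statement id
        self._annotations = {}  # statement id -> list of (predicate, object)

    def statement(self, s, p, o):
        """Return the id for (s, p, o), allocating one on first use."""
        key = (s, p, o)
        if key not in self._ids:
            self._ids[key] = f"_:stmt{len(self._ids)}"
        return self._ids[key]

    def annotate(self, s, p, o, ap, ao):
        """Attach (ap, ao) to the statement (s, p, o)."""
        sid = self.statement(s, p, o)
        self._annotations.setdefault(sid, []).append((ap, ao))

    def expand(self):
        """Yield the standard reification triples on demand."""
        for (s, p, o), sid in self._ids.items():
            yield (sid, "rdf:type", "rdf:Statement")
            yield (sid, "rdf:subject", s)
            yield (sid, "rdf:predicate", p)
            yield (sid, "rdf:object", o)
            for ap, ao in self._annotations.get(sid, []):
                yield (sid, ap, ao)

store = ReifiedStore()
store.annotate("A", "sends_email_to", "B", "cc", "C")
store.annotate("A", "sends_email_to", "B", "cc", "D")
store.annotate("A", "sends_email_to", "B", "sentOn", "Tuesday")

expanded = list(store.expand())
print(len(store._ids), len(expanded))
```

One stored record stands in for the four reification triples; the expanded view exists only for clients that want to see standard RDF.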


    Andy

[*]

<< >> is syntax that I proposed in early SPARQL drafts, pre-1.0, for reification support, but it didn't gain much support. It is still in the ARQ parser source but not active.
