It is nice that the Titan guys see RDF as something to compare to. Coincidentally, I gave a talk about Property Graph / Linked Data just recently at the European ApacheCon BigData conference.


The Property Graph (PG) market is maybe twice the size of the RDF market, and both are small. The challenge is growing the graph market, not one form taking market share away from the other.

And the key difference between graph databases (either kind) and other data systems is the approach to data modelling. The differences between graph systems are not the key here.

On reification, they are somewhat off-track. Reification is quite a specialised feature with limited use. It is not RDF's equivalent of attributes on links in PG.

Let me make that concrete with an example simplified from "Graph Databases", chapter 3 (page 52 in my copy). The book is written by the Neo4j folks.

Email provenance.

    A sends_email_to B

Now, you could reify that statement (the act by A of sending the email to B).

Reification is way more powerful than just being able to add data to the triple. It says "claim: A sends_email_to B", and several different and competing claims can be made. But let's continue, assuming both reification and assertion of the triple ... [*]

<<A sends_email_to B>>
    cc C
    cc D
    sentOn Tuesday
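
As a rough sketch of what "reification plus assertion" amounts to in plain triples, the following expands the statement using the standard RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object). Triples are modelled as Python tuples; the blank node label `_:stmt1` and the helper name `reify` are illustrative assumptions, not part of any library.

```python
# Sketch only: triples as plain tuples, no RDF library involved.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def reify(stmt, s, p, o):
    """Return the four standard reification triples describing (s, p, o)."""
    return [
        (stmt, RDF + "type", RDF + "Statement"),
        (stmt, RDF + "subject", s),
        (stmt, RDF + "predicate", p),
        (stmt, RDF + "object", o),
    ]

# The asserted triple itself.
triples = [("A", "sends_email_to", "B")]

# The reified description of that triple (hypothetical node label).
triples += reify("_:stmt1", "A", "sends_email_to", "B")

# Extra data hangs off the statement node, not off A or B.
triples += [
    ("_:stmt1", "cc", "C"),
    ("_:stmt1", "cc", "D"),
    ("_:stmt1", "sentOn", "Tuesday"),
]
```

Stored naively, one annotated statement costs eight triples here: one asserted, four for the reification, three for the annotations.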

In the same way, you could add attributes to the PG graph edge for sends_email_to.

Both PG and RDF modelling here are anti-patterns (as chapter 3 notes for PG).

The email sent is an important concept so model it explicitly:

A   sends       MSG
MSG receivedBy  B
MSG cc_to       C
MSG cc_to       D
MSG sentOn      "Tuesday"

By modelling the email message as a first-class concept, rather than leaving it implicit in the activity via reification or link attributes, you can more easily add information (e.g. which servers it was transferred by and stored on, or when it was received; this is email, so that might be twice) and query it better (who else accessed it on receipt). Modelling those on the act of sending makes life hard: how do you talk about a draft email?

MSG contents        URL_to_content
MSG hasChecksum     0xABCDEF
MSG status          :sent

This is event-based modelling.
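
To illustrate why the explicit message node is easier to work with, here is a minimal sketch that holds the triples above as Python tuples and answers a couple of questions with a wildcard pattern match. The helper names `match` and `recipients`, the `DRAFT` node and the `:draft` status value are assumptions made for the example.

```python
# Sketch only: the event-based model as a set of (s, p, o) tuples.
triples = {
    ("A", "sends", "MSG"),
    ("MSG", "receivedBy", "B"),
    ("MSG", "cc_to", "C"),
    ("MSG", "cc_to", "D"),
    ("MSG", "sentOn", "Tuesday"),
    ("MSG", "contents", "URL_to_content"),
    ("MSG", "hasChecksum", "0xABCDEF"),
    ("MSG", "status", ":sent"),
}

def match(s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

def recipients(msg):
    """Everyone the message went to, whether directly or by cc."""
    return sorted(o for _, p, o in match(s=msg) if p in ("receivedBy", "cc_to"))

# A draft is just a message node that nobody has sent yet
# (hypothetical node and status value).
triples.add(("DRAFT", "status", ":draft"))
```

With this shape, `recipients("MSG")` collects B, C and D, and the draft can be described even though there is no sending act to attach anything to.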


If you wanted a highly efficient reification-supporting RDF store, then build one. There is no need to blindly store each reified statement as multiple triples (it's called compression!). You don't see such stores because reification is a minor feature of RDF. Event-based modelling and named graphs are often better.
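
One way such a store could avoid the blow-up, sketched under heavy assumptions: keep each reified statement as a single record keyed by its statement node, and only generate the four standard reification triples when a plain-RDF view is requested. The class `ReifiedStore` and its methods are hypothetical and do not describe any existing store.

```python
# Sketch only: compact storage with on-demand expansion to plain triples.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

class ReifiedStore:
    def __init__(self):
        self.statements = {}   # statement node -> (s, p, o), one record each
        self.annotations = []  # (statement node, property, value)

    def reify(self, stmt, s, p, o):
        """Record a reified statement without materialising four triples."""
        self.statements[stmt] = (s, p, o)

    def annotate(self, stmt, prop, value):
        """Attach extra data to a statement node."""
        self.annotations.append((stmt, prop, value))

    def triples(self):
        """Yield the equivalent plain-RDF view, expanded lazily."""
        for stmt, (s, p, o) in self.statements.items():
            yield (stmt, RDF + "type", RDF + "Statement")
            yield (stmt, RDF + "subject", s)
            yield (stmt, RDF + "predicate", p)
            yield (stmt, RDF + "object", o)
        yield from self.annotations

store = ReifiedStore()
store.reify("_:stmt1", "A", "sends_email_to", "B")
store.annotate("_:stmt1", "sentOn", "Tuesday")
```

Because statements are keyed by node rather than by triple, several competing claims about the same triple can still coexist as separate records.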

    Andy

[*]
<< >> is syntax that I proposed in early SPARQL drafts (pre-1.0) for reification support, but it didn't gain much support. It is still in the ARQ parser source but not active.

