Re: [Neo4j] Modelling with neo4j

Niels Hoogeveen Sat, 24 Sep 2011 06:15:00 -0700

You raise interesting questions, most of them very much related to the work I 
did on Enhanced API.


Let me start with the distinction between Node and Relationship, which in my 
opinion too is a bit artificial. I understand when creating a graph database, 
it is helpful to have something like vertices and edges, but indeed see those 
more as modalities of the elements of the graph than as clearly separated 
types. This was one of the reasons to unify all elements of the graph with one 
underlying type.

At the time, I saw two options:

a) make the graph bipartite, so that all relationships and properties become 
nodes and use relationships only as a hidden linking feature
b) create shadow nodes for relationships and properties when needed and let the 
API handle that transparently

I chose option b for performance reasons. There are likely many 
applications where most of the relationships are simple, i.e. they link two 
nodes while possibly having some properties. Using a bipartite layout for such 
relationships adds nothing, but it takes twice as many links to traverse.

The shadow node solution only treats relationships and properties as special 
(having relationships to them) when that is needed. 
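
To make option b a bit more concrete, here is a minimal sketch in Python. It 
is not the Enhanced API code: the classes Graph, Node and Relationship, the 
method shadow_node and the relationship name "INSTANCE_OF" are all made up for 
illustration. The only point is that the shadow node is created lazily.

```python
import itertools


class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.properties = {}


class Relationship:
    def __init__(self, rel_id, start, rel_type, end):
        self.id = rel_id
        self.start = start
        self.type = rel_type
        self.end = end
        self.properties = {}
        self.shadow = None  # no shadow node until one is needed


class Graph:
    """Hypothetical in-memory graph, only meant to illustrate the idea."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.nodes = []
        self.relationships = []

    def create_node(self):
        node = Node(next(self._ids))
        self.nodes.append(node)
        return node

    def relate(self, start, rel_type, end):
        rel = Relationship(next(self._ids), start, rel_type, end)
        self.relationships.append(rel)
        return rel

    def shadow_node(self, rel):
        # Created on first request, then reused for the same relationship.
        if rel.shadow is None:
            rel.shadow = self.create_node()
        return rel.shadow


graph = Graph()
joe = graph.create_node()
school = graph.create_node()
janitor_type = graph.create_node()

works = graph.relate(joe, "JANITOR_AT", school)
# A simple relationship costs one hop and no extra node.
assert works.shadow is None

# Only when something has to point at the relationship is a shadow node made.
graph.relate(graph.shadow_node(works), "INSTANCE_OF", janitor_type)
```

In a bipartite layout every relationship would carry such an extra node from 
the start, whether or not anything ever links to it.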

Now to the typing issues. Neo4j has chosen not to add typing features to the 
database and I actually like that. It allows for optional type systems that can 
be used but are not enforced to be used. 

Type systems are nice beasts, especially when dealing with large and complex 
applications, but they impose a development overhead, mostly felt in small 
quick and dirty applications. This is true for programming languages, where 
many people prefer a dynamically typed language such as JavaScript, Python, 
Ruby or PHP over a statically typed language such as Java, Scala, C# or 
Haskell, and I think it is also true for databases. I think one of the reasons 
NoSQL became so popular is that the type system of an RDBMS adds overhead to 
simple applications. 

An RDBMS needs a type system because the storage layout requires that. Tables 
have a fixed number of columns, where each column has a designated type. While 
this is a great feature when processing massive amounts of similar data, it can 
also make the application brittle. The tight coupling between type system and 
storage layout makes rapid schema evolution difficult.

Neo4j doesn't impose a type system like an RDBMS does, because its storage 
layout doesn't require it. Something is either a node, a relationship or a 
property, but the combinations don't need explicit modelling for the sake of 
storage.

Because of this untyped nature of the database, it now becomes possible to add 
a type system that not only is optional, but can in fact be made as strong or 
as weak as the application demands.

Unfortunately Neo4j doesn't provide all the necessary hooks for a type system, 
another reason why I started Enhanced API. It was not my intention with that 
API to provide a full-fledged type system to Neo4j, but to provide the 
necessary hooks so a type system can be created.

Of course there is some type-creep in Neo4j. Properties and relationships have 
names, which in almost every application are used as types. Say we have several 
nodes we want to use to store information about people, where each of those 
nodes has a property "last_name". This property name effectively is used as a 
type. For all nodes the property name will denote the same fact: the last name 
of a person. 

This is not necessarily required by the Neo4j database. Different nodes may use 
the same property name to denote different things even with different 
datatypes. It is possible to have nodes with property name "last_name" that for 
some nodes is a String while it is an Integer for other nodes. While this is 
possible, I venture this is not all that common. The same property name will 
likely be used to denote the same fact and have the same datatype across the 
graph and therefore in most common cases be used like a type. 
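
As a rough illustration of this point, the following Python sketch scans a 
handful of nodes and reports which property names are used with more than one 
datatype. Nodes are represented as plain dictionaries here, and the function 
property_datatypes and the sample data are invented for the example; nothing 
like this is part of Neo4j itself.

```python
from collections import defaultdict


def property_datatypes(nodes):
    """Map each property name to the set of type names seen for it."""
    seen = defaultdict(set)
    for properties in nodes:
        for name, value in properties.items():
            seen[name].add(type(value).__name__)
    return dict(seen)


# Invented sample data: each dictionary stands for the properties of one node.
nodes = [
    {"last_name": "Doe"},
    {"last_name": "Smith", "age": 42},
    {"last_name": 12345},  # allowed by the database, but rarely intended
]

datatypes = property_datatypes(nodes)
inconsistent = {name for name, types in datatypes.items() if len(types) > 1}
print(inconsistent)  # {'last_name'}
```

In a graph where property names are used consistently, the set of 
inconsistent names is empty, and each name behaves like a type.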

The same applies to relationships, where the name will in general be used to 
denote the same type of relationship. It is unlikely an application will use 
the "FRIEND" relationship to sometimes denote a friendship between two people 
while at other times use that relationship name to denote the address of a 
building.

This is as far as typing goes in Neo4j, but it is there and means we have to 
incorporate it into the API somehow. 

This is the reason why I decided to add subtyping of relationship-types and 
property-types in the API, a feature that may be of interest to the model you 
describe in your email.

Joe is a janitor at the school.

Here we see three elements: "Joe", "is janitor at", and "the school", which can 
indeed be modeled with two nodes and a relationship.

There is however a more general statement here of the form: person works for 
organization. Suppose we want to store the fact:

Jane is principal of the school.

Again we can model this with two nodes and a relationship.

The standard API offers no features to ask the question "who works at the 
school?".

Subtyping of relationship-types helps for these cases. We can create a 
relationship-type "WORKS_FOR" and state it has the subtypes "JANITOR_AT" and 
"PRINCIPAL_OF".

Now we can ask for all nodes that have a "WORKS_FOR" relationship, and the 
result includes not only the relationships stored directly as "WORKS_FOR", but 
also those stored as "JANITOR_AT" and as "PRINCIPAL_OF".

Now to your original question: how to store general information about janitors. 
Enhanced API reifies RelationshipType as a Node. With that a RelationshipType 
is no longer just a name, but becomes something that can take additional 
information (both in the form of properties and in the form of relationships).

It is up to the application programmer how to use this relationshiptype-node, 
the API only gives you a node, and with that the hooks to use and abuse it to 
your heart's content.
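
The two ideas above, subtyping of relationship-types and relationship-types 
reified as nodes, can be sketched together in a few lines of Python. This is 
an illustration only and not the Enhanced API: the class TypeNode, its methods, 
the function who and the property "has_master_key" are all hypothetical, and 
relationships are kept as simple triples.

```python
class TypeNode:
    """Hypothetical stand-in for the node behind a relationship-type."""

    def __init__(self, name):
        self.name = name
        self.properties = {}  # general facts about this relationship-type
        self.subtypes = []    # in a graph these would be relationships

    def add_subtype(self, other):
        self.subtypes.append(other)
        return other

    def closure(self):
        """Names of this type and all of its transitive subtypes.

        Assumes the subtype hierarchy contains no cycles.
        """
        names = {self.name}
        for sub in self.subtypes:
            names |= sub.closure()
        return names


works_for = TypeNode("WORKS_FOR")
janitor_at = works_for.add_subtype(TypeNode("JANITOR_AT"))
principal_of = works_for.add_subtype(TypeNode("PRINCIPAL_OF"))

# General information about janitors lives on the type node, not on Joe.
janitor_at.properties["has_master_key"] = True

# Stored relationships as (start, relationship-type name, end) triples.
relationships = [
    ("Joe", "JANITOR_AT", "the school"),
    ("Jane", "PRINCIPAL_OF", "the school"),
    ("Joe", "FRIEND", "Jane"),
]


def who(rel_type, target):
    """Start nodes related to target by rel_type or any of its subtypes."""
    wanted = rel_type.closure()
    return [s for s, name, e in relationships if name in wanted and e == target]


print(who(works_for, "the school"))   # ['Joe', 'Jane']
print(who(janitor_at, "the school"))  # ['Joe']
```

Asking for "WORKS_FOR" finds both Joe and Jane although neither relationship 
is stored under that name, and whatever holds for janitors in general is 
attached once, to the type node.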

Finally, I see two of the layers you describe. For some applications it is 
necessary to have something akin to a type system, and Enhanced API offers you 
the hooks to create one. I don't immediately see what the purpose is of the 
interface layer. Could you expand on that feature a bit more, so I may help you 
figure out how to tackle that issue?

Niels



> Date: Fri, 23 Sep 2011 22:52:14 -0700
> From: lold...@gmail.com
> To: user@lists.neo4j.org
> Subject: [Neo4j] Modelling with neo4j
> 
> I'm trying to figure out how to model the world most flexibly (okay, so I'm
> sticking to modelling organisations for now, but still). My main problem
> seems to occur when I want to allow the model to naturally expand in
> complexity. Say we have the following relationship:
> 
> Joe is a janitor at the school.
> 
> This can easily be modelled with two entities and a relationship. Now say I
> have some common properties for janitors. I would have to make a link from
> the janitor-relation to some node denoting the type 'janitor' which could
> then hold information on these common things. Unfortunately, relationships
> don't support that.
> 
> Long story short: the problem is that sometimes I want my things to act as
> things, sometimes as types, sometimes as interfaces, and I cannot know in
> advance which of these modalities I'm going to need.
> 
> Therefore, I'm considering going with this model:
> 
> Imagine a graph in three layers. The lower layer represents things, the
> middle layer represents types and the upper layer represents interfaces.
> Initially I populate only the lowest layer, but as the need arises I go back
> and promote various things to also be types or interfaces. These then crop up
> in the second and third layer of the graph, respectively. When this happens, a
> vertical relationship is added between the element in the lower layer and
> its new type/interface in the higher layers.
> 
> Now the question is: how to model this scheme in neo4j? A number of
> challenges pop up:
> 
> * Neo4j relationships cannot be n-ary, so every relationship must be
> modelled with a hyperrelationship, thus allowing future relations to the
> second and third layers.
> 
> * In a modalities-are-a-changing-paradigm it doesn't really make sense to
> distinguish between relations and entities; at different points in time, one
> element may have to act in the roles of both. Neo4j however makes a
> fundamental distinction between the two things. I could choose to model all
> relationships as nodes, but will that not make graph traversals messy?
> 
> * Neo4j doesn't come with a strongly typed distinction between these three
> layers of modality.
> 
> --
> View this message in context: 
> http://neo4j-community-discussions.438527.n3.nabble.com/Modelling-with-neo4j-tp3363823p3363823.html
> Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
> _______________________________________________
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
                                          