Re: [Neo4j] Modelling with neo4j

Peter Neubauer Sat, 24 Sep 2011 06:52:31 -0700

Great thoughts guys!
I think it would be interesting to break out the "Enhanced API" from
graph-collections, rename it into something better (we can think of a
name together) and provide a more fully fledged example that we can
document and evolve.


WDYT?

Cheers,

/peter neubauer

GTalk:      neubauer.peter
Skype       peter.neubauer
Phone       +46 704 106975
LinkedIn   http://www.linkedin.com/in/neubauer
Twitter      http://twitter.com/peterneubauer

http://www.neo4j.org               - Your high performance graph database.
http://startupbootcamp.org/    - Öresund - Innovation happens HERE.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.



On Sat, Sep 24, 2011 at 3:37 PM, Rick Bullotta
<rick.bullo...@thingworx.com> wrote:
> That's a great summary, Niels.  Very similar to how we've applied Neo4J here 
> at ThingWorx, though we've done most of the type system work (nodes and 
> relationships are all typed/subtyped) in our application domain layer.  A few 
> other items that we leveraged in our implementation that you may wish to 
> consider:
>
> - A common pattern we encountered was a "collection" of typed entities (e.g. 
> a typed collection), and we implemented a specific model using supernodes for 
> this.  This also allowed us to rapidly and easily iterate/search collections 
> and also to organize nodes in a "human comprehensible way" that can be 
> readily viewed with something like Neoclipse for troubleshooting purposes.  
> Also, if the type was "truck", we stamped the node with the type "truck" as a 
> property (using enumerations with a custom int member) and used that same 
> enum as the relationship type between the node and the collection node.  In 
> our model, an entity has a single "type", but we implemented the concept of 
> supertyping/subtyping in our domain model.
>
> - We found quite a few examples where a "one-way relationship" was more than 
> adequate and, instead of incurring the overhead of a relationship 
> (particularly when millions of these relationships were attached to a single 
> supernode), we used a *property* on a node containing the node id of the node 
> it references.  Sounds like a hack, but it actually has substantial 
> performance advantages, particularly if you are frequently adding/removing 
> relationships to/from the supernode.
>
> - We overlaid our own REST API on our domain model, and wanted to come up 
> with a simple way to resolve the URI for any given node/entity.  For that, we 
> used a pattern for which each node can have an optional "parent" node type.  
> Example:  a blog comment is always attached to a blog entry or other blog 
> comment.  A blog entry is always attached to a blog.  A blog is always 
> attached to the blogs collection, and so on.  Each node has a name and/or an 
> ID.  Because those relationship "patterns" are well known, it is a trivial 
> matter to create the URI to any entity given only its node, e.g.:
>
> /Blogs/MyBlog/Entries/103/Comments/204
>
> Of course, it works the other way as well - easy to parse and traverse.
>
> - We often found that there were data structures in our application domain 
> for which it was OK to be "opaque" - e.g. although the structures were deep 
> and complex, they did not require searchability or traversability (e.g. they 
> were kind of like "object blobs"), so in our metamodel, they are not stored as 
> nodes, relationships, and properties, but rather, as a JSON blob, serialized 
> as a string to a node property.  That has worked out really well.  When we do 
> need to filter/manipulate those, we do so at the domain level.
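
A minimal sketch of the parent-pattern URI resolution described above, in plain Python rather than against the Neo4j API. The `Entity` class, its `segment`, `key` and `parent` fields, and the `uri_for` function are hypothetical names chosen for illustration. The original only says that each node has a name and/or an ID and an optional parent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entity:
    """Hypothetical stand-in for a node with an optional parent."""
    segment: str                      # collection name, e.g. "Blogs" or "Entries"
    key: str                          # node name or ID within that collection
    parent: Optional["Entity"] = None


def uri_for(entity: Entity) -> str:
    """Walk the parent chain up to the root and join the segments top-down."""
    parts = []
    node = entity
    while node is not None:
        parts.append(f"{node.segment}/{node.key}")
        node = node.parent
    return "/" + "/".join(reversed(parts))


blog = Entity("Blogs", "MyBlog")
entry = Entity("Entries", "103", parent=blog)
comment = Entity("Comments", "204", parent=entry)

print(uri_for(comment))  # /Blogs/MyBlog/Entries/103/Comments/204
```

Parsing works the same way in reverse: split the path into (segment, key) pairs and follow one known relationship per pair.
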
>
> Just wanted to share some more examples.
>
> Rick
>
> ________________________________________
> From: user-boun...@lists.neo4j.org [user-boun...@lists.neo4j.org] On Behalf 
> Of Niels Hoogeveen [pd_aficion...@hotmail.com]
> Sent: Saturday, September 24, 2011 9:14 AM
> To: user@lists.neo4j.org
> Subject: Re: [Neo4j] Modelling with neo4j
>
> You raise interesting questions, most of them very much related to the work I 
> did on Enhanced API.
>
> Let me start with the distinction between Node and Relationship, which in my 
> opinion too is a bit artificial. I understand when creating a graph database, 
> it is helpful to have something like vertices and edges, but indeed see those 
> more as modalities of the elements of the graph than as clearly separated 
> types. This was one of the reasons to unify all elements of the graph with 
> one underlying type.
>
> At the time, I saw two options:
>
> a) make the graph bipartite, so that all relationships and properties become 
> nodes and use relationships only as a hidden linking feature
> b) create shadow nodes for relationships and properties when needed and let 
> the API handle that transparently
>
> I chose option b for performance reasons. There are likely many 
> applications where most of the relationships are simple, i.e. link two nodes 
> while possibly having some properties. Using a bipartite layout for such 
> relationships adds nothing, but it takes twice as many links to traverse.
>
> The shadow node solution only treats relationships and properties as special 
> (having relationships to them) when that is needed.
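
A rough sketch of the "shadow node only when needed" idea, in plain Python and not using the Enhanced API itself. The `Node` and `Relationship` classes and the `shadow` and `has_shadow` properties are hypothetical names. The assumption is that a relationship carries no extra node until something needs to link to it.

```python
class Node:
    """Hypothetical minimal node with a list of outgoing relationships."""
    def __init__(self, name):
        self.name = name
        self.links = []


class Relationship:
    """A plain link that only gets a shadow node on first request."""
    def __init__(self, start, rel_type, end):
        self.start, self.rel_type, self.end = start, rel_type, end
        self._shadow = None          # nothing allocated for simple relationships
        start.links.append(self)

    @property
    def shadow(self):
        # Create the shadow node lazily and reuse it afterwards.
        if self._shadow is None:
            self._shadow = Node(f"shadow:{self.rel_type}")
        return self._shadow

    @property
    def has_shadow(self):
        return self._shadow is not None


joe, school, janitor = Node("Joe"), Node("the school"), Node("janitor")
works = Relationship(joe, "JANITOR_AT", school)
print(works.has_shadow)              # False: still a simple one-hop link

# Only now, when something must point at the relationship, is a shadow made.
Relationship(works.shadow, "HAS_ROLE", janitor)
print(works.has_shadow)              # True
```

Simple relationships stay a single hop from start to end, which is the performance argument for option b.
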
>
> Now to the typing issues. Neo4j has chosen not to add typing features to the 
> database and I actually like that. It allows for optional type systems that 
> can be used but are not enforced to be used.
>
> Type systems are nice beasts, especially when dealing with large and complex 
> applications, but they impose a development overhead, mostly felt in small 
> quick and dirty applications. This is true for programming languages, where 
> many people prefer to use a dynamically typed language such as JavaScript, Python, 
> Ruby or PHP over a statically typed language such as Java, Scala, C# or Haskell, and I 
> think it is also true for databases. I think one of the reasons NOSQL became 
> so popular is because the type system of an RDBMS adds overhead to simple 
> applications.
>
> An RDBMS needs a type system because the storage layout requires that. Tables 
> have a fixed number of columns, where each column has a designated type. 
> While this is a great feature when processing massive amounts of similar 
> data, it can also make the application brittle. The tight coupling between 
> type system and storage layout makes rapid schema evolution difficult.
>
> Neo4j doesn't impose a type system like an RDBMS does, because its storage 
> layout doesn't require it. Something is either a node, a relationship or a 
> property, but the combinations don't need explicit modelling for the sake 
> of storage.
>
> Because of this untyped nature of the database, it now becomes possible to 
> add a type system that not only is optional, but can in fact be made as 
> strong or as weak as the application demands.
>
> Unfortunately Neo4j doesn't provide all the necessary hooks for a type 
> system, another reason why I started Enhanced API. It was not my intention 
> with that API to provide a fully fledged type system to Neo4j, but to provide 
> the necessary hooks so a type system can be created.
>
> Of course there is some type-creep in Neo4j. Properties and relationships 
> have names, which in almost every application are used as types. Say we have 
> several nodes we like to use to store information about people, where each of 
> those nodes has a property "last_name". This property name effectively is 
> used as a type. For all nodes the property name will denote the same fact: 
> the last name of a person.
>
> This is not necessarily required by the Neo4j database. Different nodes may 
> use the same property name to denote different things even with different 
> datatypes. It is possible to have nodes with property name "last_name" that 
> for some nodes is a String while it is an Integer for other nodes. While this 
> is possible, I venture this is not all that common. The same property name 
> will likely be used to denote the same fact and have the same datatype across 
> the graph and therefore in most common cases be used like a type.
>
> The same applies to relationships, where the name will in general be used to 
> denote the same type of relationship. It is unlikely an application will use 
> the "FRIEND" relationship to sometimes denote a friendship between two people 
> while at other times use that relationship name to denote the address of a 
> building.
>
> This is as far as typing goes in Neo4j, but it is there and means we have to 
> incorporate it into the API somehow.
>
> This is the reason why I decided to add subtyping of relationship-types and 
> property-types in the API, a feature that may be of interest to the model you 
> describe in your email.
>
> Joe is a janitor at the school.
>
> Here we see three elements: "Joe", "is janitor at", and "the school", which 
> can indeed be modeled with two nodes and a relationship.
>
> There is however a more general statement here of the form: person works for 
> organization. Suppose we want to store the fact:
>
> Jane is principal of the school.
>
> Again we can model this with two nodes and a relationship.
>
> The standard API offers no features to ask the question "who works at the 
> school?".
>
> Subtyping of relationship-types helps for these cases. We can create a 
> relationship-type "WORKS_FOR" and state it has the subtypes "JANITOR_AT" and 
> "PRINCIPAL_OF".
>
> Now we can ask for all nodes that have a "WORKS_FOR" relationship, and the 
> result includes not only the relationships stored directly as "WORKS_FOR" 
> but also those stored as "JANITOR_AT" and as "PRINCIPAL_OF".
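
To make the WORKS_FOR example concrete, here is a small sketch in plain Python. It does not use the Enhanced API. The `SUPERTYPE` table, `is_subtype` and `who` are hypothetical names, and relationships are held as simple (start, type, end) triples.

```python
# Hypothetical type registry: relationship type -> its direct supertype.
SUPERTYPE = {
    "JANITOR_AT": "WORKS_FOR",
    "PRINCIPAL_OF": "WORKS_FOR",
}

# Stored relationships as (start, type, end) triples.
relationships = [
    ("Joe", "JANITOR_AT", "the school"),
    ("Jane", "PRINCIPAL_OF", "the school"),
    ("Joe", "FRIEND", "Jane"),
]


def is_subtype(rel_type, ancestor):
    """True if rel_type is ancestor itself or reaches it via the supertype chain."""
    current = rel_type
    while current is not None:
        if current == ancestor:
            return True
        current = SUPERTYPE.get(current)
    return False


def who(rel_type, target):
    """Start nodes of every relationship to target whose type is rel_type or a subtype."""
    return sorted(
        start
        for start, stored_type, end in relationships
        if end == target and is_subtype(stored_type, rel_type)
    )


print(who("WORKS_FOR", "the school"))   # ['Jane', 'Joe']
print(who("JANITOR_AT", "the school"))  # ['Joe']
```
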
>
> Now to your original question: how to store general information about 
> janitors. Enhanced API reifies RelationshipType as a Node. With that a 
> RelationshipType is no longer just a name, but becomes something that can 
> take additional information (both in the form of properties and in the form 
> of relationships).
>
> It is up to the application programmer how to use this relationshiptype-node, 
> the API only gives you a node, and with that the hooks to use and abuse it to 
> your heart's content.
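
A sketch of what reifying a relationship type as a node could look like, again in plain Python with hypothetical names (`Node`, `TypeRegistry`, `node_for`). The property stored on the type node is an invented example, not something from the thread.

```python
class Node:
    """Hypothetical minimal node: just a property map."""
    def __init__(self, **properties):
        self.properties = dict(properties)


class TypeRegistry:
    """Maps each relationship-type name to exactly one node representing it."""
    def __init__(self):
        self._type_nodes = {}

    def node_for(self, type_name):
        # Create the type node on first use, then always return the same one.
        if type_name not in self._type_nodes:
            self._type_nodes[type_name] = Node(name=type_name)
        return self._type_nodes[type_name]


registry = TypeRegistry()
janitor_type = registry.node_for("JANITOR_AT")

# Information common to all janitors lives on the type node (invented example).
janitor_type.properties["requires_key_access"] = True

print(registry.node_for("JANITOR_AT").properties)
```

Because every lookup of the same type name returns the same node, anything attached to it is shared by all relationships of that type.
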
>
> Finally, I see the purpose of two of the layers you describe. For some 
> applications it is necessary to have something akin to a type system, and 
> Enhanced API offers you the hooks to create one. I don't immediately see what 
> the purpose is of the interface layer. Could you expand on that feature a bit 
> more, so I may help you figure out how to tackle that issue?
>
> Niels
>
>
>
>> Date: Fri, 23 Sep 2011 22:52:14 -0700
>> From: lold...@gmail.com
>> To: user@lists.neo4j.org
>> Subject: [Neo4j] Modelling with neo4j
>>
>> I'm trying to figure out how to model the world most flexibly (okay, so I'm
>> sticking to modelling organisations for now, but still). My main problem
>> seems to occur when I want to allow the model to naturally expand in
>> complexity. Say we have the following relationship:
>>
>> Joe is a janitor at the school.
>>
>> This can easily be modelled with two entities and a relationship. Now say I
>> have some common properties for janitors. I would have to make a link from
>> the janitor-relation to some node denoting the type 'janitor' which could
>> then hold information on these common things. Unfortunately, relationships
>> don't support that.
>>
>> Long story short: the problem is that sometimes I want my things to act as
>> things, sometimes as types, sometimes as interfaces, and I cannot know in
>> advance which of these modalities I'm going to need.
>>
>> Therefore, I'm considering going with this model:
>>
>> Imagine a graph in three layers. The lower layer represents things, the
>> middle layer represents types and the upper layer represents interfaces.
>> Initially I populate only the lowest layer, but as the need arises I go back and
>> promote various things to also be types or interfaces. These then crop up in
>> the second and third layer of the graph, respectively. When this happens, a
>> vertical relationship is added between the element in the lower layer and
>> its new type/interface in the higher layers.
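
One possible reading of the three-layer scheme, sketched in plain Python. The `Layer` enum, the `Graph` class and its `add_thing` and `promote` methods are hypothetical names. The sketch assumes promotion creates a counterpart in the higher layer and records one vertical link to it.

```python
from enum import Enum


class Layer(Enum):
    THING = 1
    TYPE = 2
    INTERFACE = 3


class Graph:
    def __init__(self):
        self.nodes = {}       # (name, layer) -> property dict
        self.vertical = []    # (lower node key, upper node key) pairs

    def add_thing(self, name):
        key = (name, Layer.THING)
        self.nodes.setdefault(key, {})
        return key

    def promote(self, name, layer):
        """Give an existing thing a counterpart in a higher layer, linked vertically."""
        if layer is Layer.THING:
            raise ValueError("can only promote to TYPE or INTERFACE")
        lower = (name, Layer.THING)
        if lower not in self.nodes:
            raise KeyError(f"{name!r} is not in the lowest layer")
        upper = (name, layer)
        if upper not in self.nodes:
            self.nodes[upper] = {}
            self.vertical.append((lower, upper))
        return upper


g = Graph()
g.add_thing("janitor")
janitor_type = g.promote("janitor", Layer.TYPE)   # done later, when the need arises
print(g.vertical)
```
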
>>
>> Now the question is: how to model this scheme in neo4j? A number of
>> challenges pop up:
>>
>> * Neo4j relationships cannot be n-ary, so every relationship must be
>> modelled with a hyperrelationship, thus allowing future relations to the
>> second and third layers.
>>
>> * In a modalities-are-a-changing paradigm it doesn't really make sense to
>> distinguish between relations and entities; at different points in time, one
>> element may have to act in the roles of both. Neo4j however makes a
>> fundamental distinction between the two things. I could choose to model all
>> relationships as nodes, but will that not make graph traversals messy?
>>
>> * Neo4j doesn't come with a strongly typed distinction between these three
>> layers of modality.
>>
>> --
>> View this message in context: 
>> http://neo4j-community-discussions.438527.n3.nabble.com/Modelling-with-neo4j-tp3363823p3363823.html
>> Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
>> _______________________________________________
>> Neo4j mailing list
>> User@lists.neo4j.org
>> https://lists.neo4j.org/mailman/listinfo/user
>
