Re: [Neo4j] Modelling with neo4j

Niels Hoogeveen Sat, 24 Sep 2011 07:29:00 -0700
+1
Enhanced API grew out of a couple of classes I added to make 
IndexedRelationship work more easily (not exposing comparators), but it is 
essentially a separate component. Giving it that status would help others 
improve it. Having laid some of the groundwork, I feel it needs other people's 
input too. As it stands now, it is very much one man's work, and while I am 
confident it contains plenty of good ideas, it can only grow with the input of 
other developers, just as IndexedRelationships has become much better thanks 
to the work Bryce put into it, and to the work of others who added structures 
to graph-collections that I would not even have thought of.
There is, however, one thing we need to look at. Right now IndexedRelationships has 
a dependency on Enhanced API for the indexing of nodes based on a property. At 
the same time, Enhanced API has a dependency on graph-collections, transparently 
supporting IndexedRelationships in the API.
I think it would be best to remove the dependency of graph-collections on 
enhanced-api and only offer the slightly more complex option where the user 
needs to provide a comparator. The other dependency (Enhanced API on 
graph-collections) can remain and can in fact even be made stronger. Enhanced API 
could in principle be made to support any type of collection, now that Bryce has 
added a generic node collection interface.
I agree "Enhanced API" is not a great name: it says what it does, but it 
has little appeal. So I would be happy if someone could come up with something 
sexier.
Niels
> From: peter.neuba...@neotechnology.com
> Date: Sat, 24 Sep 2011 15:42:13 +0200
> To: user@lists.neo4j.org
> Subject: Re: [Neo4j] Modelling with neo4j
> 
> Great thoughts guys!
> I think it would be interesting to break out the "Enhanced API" from
> graph-collections, rename it to something better (we can think of a
> name together) and provide a more fully fledged example that we can
> document and evolve.
> 
> WDYT?
> 
> Cheers,
> 
> /peter neubauer
> 
> 
> On Sat, Sep 24, 2011 at 3:37 PM, Rick Bullotta
> <rick.bullo...@thingworx.com> wrote:
> > That's a great summary, Niels.  Very similar to how we've applied Neo4J 
> > here at ThingWorx, though we've done most of the type system work (nodes 
> > and relationships are all typed/subtyped) in our application domain layer.  
> > A few other items that we leveraged in our implementation that you may wish 
> > to consider:
> >
> > - A common pattern we encountered was a "collection" of typed entities 
> > (e.g. a typed collection), and we implemented a specific model using 
> > supernodes for this.  This also allowed us to rapidly and easily 
> > iterate/search collections and also to organize nodes in a "human 
> > comprehensible way" that can be readily viewed with something like 
> > Neoclipse for troubleshooting purposes.  Also, if the type was "truck", we 
> > stamped the node with the type "truck" as a property (using enumerations 
> > with a custom int member) and used that same enum as the relationship type 
> > between the node and the collection node.  In our model, an entity has a 
> > single "type", but we implemented the concept of supertyping/subtyping in 
> > our domain model.
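The typed-collection pattern described above can be sketched roughly as follows. This is a minimal in-memory illustration under stated assumptions, not the Neo4j API and not ThingWorx's code: `Graph`, `EntityType`, `add_to_collection` and `members` are hypothetical names, and relationship types are plain strings taken from the enum member's name.

```python
from enum import Enum


class EntityType(Enum):
    # Hypothetical entity types; each member carries a custom int code
    TRUCK = 1
    DRIVER = 2


class Graph:
    """Tiny in-memory stand-in for a graph store (not the Neo4j API)."""

    def __init__(self):
        self.props = {}  # node id -> property dict
        self.rels = []   # (start id, relationship type name, end id)

    def create_node(self, **props):
        node_id = len(self.props)
        self.props[node_id] = dict(props)
        return node_id

    def relate(self, start, rel_type, end):
        self.rels.append((start, rel_type, end))


def add_to_collection(graph, collection, node, entity_type):
    # Stamp the entity with the enum's int code as its "type" property ...
    graph.props[node]["type"] = entity_type.value
    # ... and reuse the same enum as the relationship type from the collection node
    graph.relate(collection, entity_type.name, node)


def members(graph, collection, entity_type):
    # Iterating a typed collection means following only that relationship type
    return [end for start, rel, end in graph.rels
            if start == collection and rel == entity_type.name]


g = Graph()
trucks = g.create_node(name="Trucks")   # the collection (super)node
truck = g.create_node(plate="XY-1")     # hypothetical sample entity
add_to_collection(g, trucks, truck, EntityType.TRUCK)
print(members(g, trucks, EntityType.TRUCK))  # [1]
```

Using one enum for both the type property and the relationship type keeps the two from drifting apart; the int code is what gets stored on the node.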
> >
> > - We found quite a few examples where a "one-way relationship" was more 
> > than adequate and, instead of incurring the overhead of a relationship 
> > (particularly when millions of these relationships were attached to a 
> > single supernode), we used a *property* on a node containing the node id of 
> > the node it references.  Sounds like a hack, but it actually has 
> > substantial performance advantages, particularly if you are frequently 
> > adding/removing relationships to/from the supernode.
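A minimal sketch of the "one-way relationship as a property" idea, assuming nodes can be modelled as plain property dicts keyed by node id. `NodeStore`, `point_to`, `follow`, `referrers` and the `collection_id` key are hypothetical. The sketch shows the trade-off: the referenced supernode is never written to, but the reverse direction needs a scan (or an index). One caveat to check against your Neo4j version: ids of deleted nodes can be reused, so a stored id may end up pointing at an unrelated node unless references are cleaned up on delete.

```python
class NodeStore:
    """In-memory stand-in for node storage: node id -> properties."""

    def __init__(self):
        self.nodes = {}

    def create(self, node_id, **props):
        self.nodes[node_id] = dict(props)

    def point_to(self, node_id, target_id, key="collection_id"):
        # One-way "relationship": only the referring node is written;
        # the (super)node being referenced is not touched at all
        self.nodes[node_id][key] = target_id

    def follow(self, node_id, key="collection_id"):
        # Forward direction: a direct lookup by the stored id
        target_id = self.nodes[node_id].get(key)
        return None if target_id is None else self.nodes.get(target_id)

    def referrers(self, target_id, key="collection_id"):
        # Reverse direction is not free: without an index this is a full scan
        return sorted(nid for nid, props in self.nodes.items()
                      if props.get(key) == target_id)


store = NodeStore()
store.create(0, name="Trucks")  # the would-be supernode
store.create(1, plate="XY-1")
store.point_to(1, 0)
print(store.follow(1))          # {'name': 'Trucks'}
print(store.referrers(0))       # [1]
```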
> >
> > - We overlaid our own REST API on our domain model, and wanted to come up 
> > with a simple way to resolve the URI for any given node/entity.  For that, 
> > we used a pattern for which each node can have an optional "parent" node 
> > type.  Example:  a blog comment is always attached to a blog entry or other 
> > blog comment.  A blog entry is always attached to a blog.  A blog is always 
> > attached to the blogs collection, and so on.  Each node has a name and/or 
> > an ID.  Because those relationship "patterns" are well known, it is a 
> > trivial matter to create the URI to any entity given only its node, e.g.:
> >
> > /Blogs/MyBlog/Entries/103/Comments/204
> >
> > Of course, it works the other way as well - easy to parse and traverse.
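The parent-chain URI scheme can be sketched like this, in both directions. It assumes each entity has a string key (name or id), an optional collection segment such as "Entries", and at most one well-known parent; the `Entity` class and the `uri_of` and `resolve` functions are hypothetical names, not part of any Neo4j or ThingWorx API.

```python
class Entity:
    """Hypothetical domain node: a name or id, the URI segment of the
    collection it sits in (if any), and an optional well-known parent."""

    def __init__(self, key, segment=None, parent=None):
        self.key = key
        self.segment = segment
        self.parent = parent
        self.children = {}  # (segment, key) -> child entity
        if parent is not None:
            parent.children[(segment, key)] = self


def uri_of(entity):
    # Walk up the parent chain, collecting "key" and "segment" parts
    parts = []
    node = entity
    while node is not None:
        parts.append(node.key)
        if node.segment:
            parts.append(node.segment)
        node = node.parent
    return "/" + "/".join(reversed(parts))


def resolve(root, uri):
    # The other direction: parse the URI and walk down from the root
    parts = uri.strip("/").split("/")
    if parts[0] != root.key:
        return None
    node, i = root, 1
    while i < len(parts):
        if (None, parts[i]) in node.children:  # child addressed by name only
            node, i = node.children[(None, parts[i])], i + 1
        elif i + 1 < len(parts) and (parts[i], parts[i + 1]) in node.children:
            node, i = node.children[(parts[i], parts[i + 1])], i + 2
        else:
            return None
    return node


blogs = Entity("Blogs")
my_blog = Entity("MyBlog", parent=blogs)
entry = Entity("103", "Entries", my_blog)
comment = Entity("204", "Comments", entry)
print(uri_of(comment))  # /Blogs/MyBlog/Entries/103/Comments/204
```

Because the parent relationships follow fixed patterns, no URI needs to be stored; it is recomputed from the graph each time.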
> >
> > - We often found that there were data structures in our application domain 
> > for which it was OK to be "opaque" - e.g. although the structures were deep 
> > and complex, they did not require searchability or traversability (i.e. 
> > they were kind of like "object blobs"), so in our metamodel, they are not 
> > stored as nodes, relationships, and properties, but rather as a JSON blob, 
> > serialized as a string to a node property.  That has worked out really 
> > well.  When we do need to filter/manipulate those, we do it at the domain 
> > level.
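A minimal sketch of the opaque-blob approach, assuming a node's properties can be treated as a dict and using Python's standard `json` module; `store_blob`, `load_blob`, the `layout` key and the sample structure are hypothetical. The graph only ever sees a string; any filtering happens after deserializing, in application code.

```python
import json


def store_blob(node_props, key, structure):
    # Serialize the whole structure into a single string property;
    # the graph sees an opaque string, not nodes or relationships
    node_props[key] = json.dumps(structure, sort_keys=True)


def load_blob(node_props, key):
    # Deserialize at the domain level whenever the structure is needed
    return json.loads(node_props[key])


node = {"name": "dashboard-1"}  # hypothetical node properties
store_blob(node, "layout", {"widgets": [{"id": 1, "pos": [0, 0]}], "theme": "dark"})
layout = load_blob(node, "layout")
dark = layout["theme"] == "dark"  # filtering happens here, in application code
```

The cost of this choice is exactly the one described above: nothing inside the blob can be indexed or traversed by the database.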
> >
> > Just wanted to share some more examples.
> >
> > Rick
> >
> > ________________________________________
> > From: user-boun...@lists.neo4j.org [user-boun...@lists.neo4j.org] On Behalf 
> > Of Niels Hoogeveen [pd_aficion...@hotmail.com]
> > Sent: Saturday, September 24, 2011 9:14 AM
> > To: user@lists.neo4j.org
> > Subject: Re: [Neo4j] Modelling with neo4j
> >
> > You raise interesting questions, most of them very much related to the work 
> > I did on Enhanced API.
> >
> > Let me start with the distinction between Node and Relationship, which in 
> > my opinion too is a bit artificial. I understand that when creating a graph 
> > database it is helpful to have something like vertices and edges, but I 
> > see those more as modalities of the elements of the graph than as 
> > clearly separated types. This was one of the reasons to unify all elements 
> > of the graph with one underlying type.
> >
> > At the time, I saw two options:
> >
> > a) make the graph bipartite, so that all relationships and properties 
> > become nodes and use relationships only as a hidden linking feature
> > b) create shadow nodes for relationships and properties when needed and let 
> > the API handle that transparently
> >
> > I chose option b for performance reasons. There are likely many 
> > applications where most of the relationships are simple, i.e. they link two 
> > nodes while possibly having some properties. Using a bipartite layout for such 
> > relationships adds nothing, but it takes twice as many links to traverse.
> >
> > The shadow node solution only treats relationships and properties as 
> > special (having relationships to them) when that is needed.
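To make option b concrete, here is a rough sketch of lazily created shadow nodes. It is an illustration under assumptions, not Enhanced API's actual implementation: `Node`, `Relationship`, `shadow()` and `relate()` are hypothetical names, and how a real store would link a relationship to its shadow node is left out.

```python
import itertools

_ids = itertools.count()


class Node:
    def __init__(self):
        self.id = next(_ids)
        self.out = []  # outgoing (relationship type, target node) pairs


class Relationship:
    """A plain edge; it gets a shadow node only when something links to it."""

    def __init__(self, start, rel_type, end):
        self.start, self.type, self.end = start, rel_type, end
        self._shadow = None

    def shadow(self):
        # Created on first use, so simple relationships never pay for it
        if self._shadow is None:
            self._shadow = Node()
        return self._shadow


def relate(start, rel_type, end):
    # A relationship can be the target of another relationship via its shadow node
    target = end.shadow() if isinstance(end, Relationship) else end
    start.out.append((rel_type, target))
    return Relationship(start, rel_type, target)


joe, school = Node(), Node()
job = relate(joe, "JANITOR_AT", school)  # simple edge: one hop, no shadow node
note = Node()
relate(note, "ABOUT", job)               # now `job` needs a shadow node
```

Traversing Joe to the school stays a single hop; the extra indirection only exists for the relationship that something actually points at.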
> >
> > Now to the typing issues. Neo4j has chosen not to add typing features to 
> > the database and I actually like that. It allows for optional type systems 
> > that can be used but are not enforced to be used.
> >
> > Type systems are nice beasts, especially when dealing with large and 
> > complex applications, but they impose a development overhead, mostly felt 
> > in small, quick-and-dirty applications. This is true for programming 
> > languages, where many people prefer a dynamically typed language such as 
> > JavaScript, Python, Ruby or PHP over a statically typed language such as 
> > Java, Scala, C# or Haskell, and I think it is also true for databases. I think 
> > one of the reasons NoSQL became so popular is that the type system of an 
> > RDBMS adds overhead to simple applications.
> >
> > An RDBMS needs a type system because the storage layout requires that. 
> > Tables have a fixed number of columns, where each column has a designated 
> > type. While this is a great feature when processing massive amounts of 
> > similar data, it can also make the application brittle. The tight coupling 
> > between type system and storage layout means that rapid schema evolution is 
> > not easy to do.
> >
> > Neo4j doesn't impose a type system like an RDBMS does, because its storage 
> > layout doesn't require it. Something is either a node, a relationship or a 
> > property, but the combinations don't need to be explicitly modelled for the 
> > sake of storage.
> >
> > Because of this untyped nature of the database, it now becomes possible to 
> > add a type system that not only is optional, but can in fact be made as 
> > strong or as weak as the application demands.
> >
> > Unfortunately Neo4j doesn't provide all the necessary hooks for a type 
> > system, another reason why I started Enhanced API. It was not my intention 
> > with that API to provide a full-fledged type system to Neo4j, but to 
> > provide the necessary hooks so a type system can be created.
> >
> > Of course there is some type-creep in Neo4j. Properties and relationships 
> > have names, which in almost every application are used as types. Say we 
> > have several nodes we'd like to use to store information about people, where 
> > each of those nodes has a property "last_name". This property name 
> > effectively is used as a type. For all nodes the property name will denote 
> > the same fact: the last name of a person.
> >
> > This is not necessarily required by the Neo4j database. Different nodes may 
> > use the same property name to denote different things even with different 
> > datatypes. It is possible to have nodes with property name "last_name" that 
> > for some nodes is a String while it is an Integer for other nodes. While 
> > this is possible, I venture this is not all that common. The same property 
> > name will likely be used to denote the same fact and have the same datatype 
> > across the graph and therefore in most common cases be used like a type.
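Because property names act as de facto types, one can check how consistently a graph actually uses them. A minimal sketch, assuming node properties are available as plain dicts; `property_types` and `inconsistent_properties` are hypothetical helpers, and Python type names stand in for Neo4j's property value types.

```python
def property_types(nodes):
    # For each property name, collect the value types seen across all nodes
    seen = {}
    for props in nodes:
        for name, value in props.items():
            seen.setdefault(name, set()).add(type(value).__name__)
    return seen


def inconsistent_properties(nodes):
    # Property names that are not used "like a type": more than one datatype
    return sorted(name for name, types in property_types(nodes).items()
                  if len(types) > 1)


people = [{"last_name": "Smith"}, {"last_name": "Jones", "age": 40},
          {"last_name": 42}]  # hypothetical sample data
print(inconsistent_properties(people))  # ['last_name']
```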
> >
> > The same applies to relationships, where the name will in general be used 
> > to denote the same type of relationship. It is unlikely that an application 
> > will use the "FRIEND" relationship name to denote a friendship between two 
> > people at some times and the address of a building at others.
> >
> > This is as far as typing goes in Neo4j, but it is there and means we have 
> > to incorporate it into the API somehow.
> >
> > This is the reason why I decided to add subtyping of relationship-types and 
> > property-types in the API, a feature that may be of interest to the model 
> > you describe in your email.
> >
> > Joe is a janitor at the school.
> >
> > Here we see three elements: "Joe", "is janitor at", and "the school", which 
> > can indeed be modeled with two nodes and a relationship.
> >
> > There is, however, a more general statement here of the form: person works 
> > for organization. Suppose we also want to store the fact:
> >
> > Jane is principal of the school.
> >
> > Again we can model this with two nodes and a relationship.
> >
> > The standard API offers no features to ask the question "who works at the 
> > school?".
> >
> > Subtyping of relationship-types helps for these cases. We can create a 
> > relationship-type "WORKS_FOR" and state it has the subtypes "JANITOR_AT" 
> > and "PRINCIPAL_OF".
> >
> > Now when we ask for all nodes that have a "WORKS_FOR" relationship, not 
> > only the relationships that are directly stored as "WORKS_FOR" are 
> > returned, but also those that are stored as "JANITOR_AT" and 
> > as "PRINCIPAL_OF".
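The subtype query described above can be sketched in a few lines. This is a minimal illustration, not Enhanced API's interface: the `SUPERTYPE` table, `is_subtype`, `who_has` and the extra sample person "Ann" are hypothetical, and the hierarchy is assumed to be acyclic.

```python
# Hypothetical subtype declarations: relationship type -> its direct supertype
SUPERTYPE = {"JANITOR_AT": "WORKS_FOR", "PRINCIPAL_OF": "WORKS_FOR"}


def is_subtype(rel_type, ancestor):
    # Walk up the declared (acyclic) hierarchy; a type counts as its own subtype
    while rel_type is not None:
        if rel_type == ancestor:
            return True
        rel_type = SUPERTYPE.get(rel_type)
    return False


def who_has(rels, end, rel_type):
    # rels are (start, stored type, end) triples; match the type or any subtype
    return sorted(start for start, stored, e in rels
                  if e == end and is_subtype(stored, rel_type))


rels = [("Joe", "JANITOR_AT", "school"),
        ("Jane", "PRINCIPAL_OF", "school"),
        ("Ann", "WORKS_FOR", "school")]
print(who_has(rels, "school", "WORKS_FOR"))   # ['Ann', 'Jane', 'Joe']
print(who_has(rels, "school", "JANITOR_AT"))  # ['Joe']
```

Asking for a subtype still returns only that subtype; only queries on the supertype widen.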
> >
> > Now to your original question: how to store general information about 
> > janitors. Enhanced API reifies RelationshipType as a Node. With that a 
> > RelationshipType is no longer just a name, but becomes something that can 
> > take additional information (both in the form of properties and in the form 
> > of relationships).
> >
> > It is up to the application programmer how to use this 
> > relationshiptype-node, the API only gives you a node, and with that the 
> > hooks to use and abuse it to your heart's content.
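A minimal sketch of what "RelationshipType reified as a Node" allows, assuming a type node can hold properties and outgoing relationships like any other node. `TypeNode`, `supertypes`, the `SUBTYPE_OF` relationship name and the `job_category` property are hypothetical; the sketch also assumes single inheritance and no cycles.

```python
class TypeNode:
    """Hypothetical reified relationship type: a node that names a type,
    so it can hold properties and relationships like any other node."""

    _by_name = {}

    def __init__(self, name):
        self.name = name
        self.props = {}
        self.out = []  # (relationship type, target) pairs

    @classmethod
    def get(cls, name):
        # One node per type name, created on first use
        if name not in cls._by_name:
            cls._by_name[name] = cls(name)
        return cls._by_name[name]


def supertypes(name):
    # Follow SUBTYPE_OF links between type nodes up to the root
    result, node = [], TypeNode.get(name)
    while True:
        parents = [t for r, t in node.out if r == "SUBTYPE_OF"]
        if not parents:
            return result
        node = parents[0]
        result.append(node.name)


janitor = TypeNode.get("JANITOR_AT")
janitor.props["job_category"] = "facilities"  # hypothetical shared fact
janitor.out.append(("SUBTYPE_OF", TypeNode.get("WORKS_FOR")))

# A stored relationship still just names its type; general information
# about janitors is found by looking up the type node
stored = ("Joe", "JANITOR_AT", "school")
print(TypeNode.get(stored[1]).props)  # {'job_category': 'facilities'}
print(supertypes("JANITOR_AT"))       # ['WORKS_FOR']
```

This answers the janitor question in the original post: information common to all janitors lives once, on the type node, instead of on each relationship.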
> >
> > Finally, I see the point of two of the layers you describe. For some 
> > applications it is necessary to have something akin to a type system, and 
> > Enhanced API offers you the hooks to create one. I don't immediately see what the purpose is of 
> > the interface layer. Could you expand on that feature a bit more, so I may 
> > help you figure out how to tackle that issue.
> >
> > Niels
> >
> >
> >
> >> Date: Fri, 23 Sep 2011 22:52:14 -0700
> >> From: lold...@gmail.com
> >> To: user@lists.neo4j.org
> >> Subject: [Neo4j] Modelling with neo4j
> >>
> >> I'm trying to figure out how to model the world most flexibly (okay, so I'm
> >> sticking to modelling organisations for now, but still). My main problem
> >> seems to occur when I want to allow the model to naturally expand in
> >> complexity. Say we have the following relationship:
> >>
> >> Joe is a janitor at the school.
> >>
> >> This can easily be modelled with two entities and a relationship. Now say I
> >> have some common properties for janitors. I would have to make a link from
> >> the janitor-relation to some node denoting the type 'janitor' which could
> >> then hold information on these common things. Unfortunately, relationships
> >> don't support that.
> >>
> >> Long story short: the problem is that sometimes I want my things to act as
> >> things, sometimes as types, sometimes as interfaces, and I cannot know in
> >> advance which of these modalities I'm going to need.
> >>
> >> Therefore, I'm considering going with this model:
> >>
> >> Imagine a graph in three layers. The lower layer represents things, the
> >> middle layer represents types and the upper layer represents interfaces.
> >> Initially I populate only the lowest layer, but as the need arises I go back
> >> and promote various things to also be types or interfaces. These then crop up
> >> in the second and third layer of the graph, respectively. When this happens, a
> >> vertical relationship is added between the element in the lower layer and
> >> its new type/interface in the higher layers.
> >>
> >> Now the question is: how to model this scheme in neo4j? A number of
> >> challenges pop up:
> >>
> >> * Neo4j relationships cannot be n-ary, so every relationship must be
> >> modelled with a hyperrelationship, thus allowing future relations to the
> >> second and third layers.
> >>
> >> * In a modalities-are-a-changing paradigm it doesn't really make sense to
> >> distinguish between relations and entities; at different points in time, one
> >> element may have to act in the roles of both. Neo4j, however, makes a
> >> fundamental distinction between the two things. I could choose to model all
> >> relationships as nodes, but will that not make graph traversals messy?
> >>
> >> * Neo4j doesn't come with a strong type distinction between these three
> >> layers of modality.
> >>
> >> --
> >> View this message in context: 
> >> http://neo4j-community-discussions.438527.n3.nabble.com/Modelling-with-neo4j-tp3363823p3363823.html
> >> Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
> >> _______________________________________________
> >> Neo4j mailing list
> >> User@lists.neo4j.org
> >> https://lists.neo4j.org/mailman/listinfo/user
> >