That's a great summary, Niels. It is very similar to how we've applied Neo4j here at ThingWorx, though we've done most of the type system work (nodes and relationships are all typed/subtyped) in our application domain layer. A few other items that we leveraged in our implementation that you may wish to consider:
- A common pattern we encountered was a "collection" of typed entities (a typed collection), and we implemented a specific model for this using supernodes. This allowed us to iterate and search collections quickly and easily, and to organize nodes in a "human comprehensible" way that can be readily viewed with something like Neoclipse for troubleshooting purposes. Also, if the type was "truck", we stamped the node with the type "truck" as a property (using enumerations with a custom int member) and used that same enum as the relationship type between the node and the collection node. In our model, an entity has a single "type", but we implemented the concept of supertyping/subtyping in our domain model.
- We found quite a few examples where a "one-way relationship" was more than adequate and, instead of incurring the overhead of a relationship (particularly when millions of these relationships were attached to a single supernode), we used a *property* on a node containing the node id of the node it references. It sounds like a hack, but it actually has substantial performance advantages, particularly if you are frequently adding/removing relationships to/from the supernode.
- We overlaid our own REST API on our domain model, and wanted a simple way to resolve the URI for any given node/entity. For that, we used a pattern in which each node can have an optional "parent" node type. Example: a blog comment is always attached to a blog entry or another blog comment. A blog entry is always attached to a blog. A blog is always attached to the blogs collection, and so on. Each node has a name and/or an ID. Because those relationship "patterns" are well known, it is a trivial matter to create the URI to any entity given only its node, e.g.:

  /Blogs/MyBlog/Entries/103/Comments/204

  Of course, it works the other way as well: the URI is easy to parse and traverse.
- We often found that there were data structures in our application domain for which it was OK to be "opaque": although the structures were deep and complex, they did not require searchability or traversability (they were kind of like "object blobs"). In our metamodel, they are therefore not stored as nodes, relationships, and properties, but rather as a JSON blob, serialized as a string to a node property. That has worked out really well. When we do need to filter/manipulate those, we do so at the domain level.

Just wanted to share some more examples.

Rick

________________________________________
From: user-boun...@lists.neo4j.org [user-boun...@lists.neo4j.org] On Behalf Of Niels Hoogeveen [pd_aficion...@hotmail.com]
Sent: Saturday, September 24, 2011 9:14 AM
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Modelling with neo4j

You raise interesting questions, most of them very much related to the work I did on Enhanced API.

Let me start with the distinction between Node and Relationship, which in my opinion too is a bit artificial. I understand that when creating a graph database it is helpful to have something like vertices and edges, but I indeed see those more as modalities of the elements of the graph than as clearly separated types. This was one of the reasons to unify all elements of the graph with one underlying type. At the time, I saw two options:

a) make the graph bipartite, so that all relationships and properties become nodes, and use relationships only as a hidden linking feature
b) create shadow nodes for relationships and properties when needed and let the API handle that transparently

I chose option b for performance reasons. There are likely many applications where most of the relationships are simple, i.e. they link two nodes while possibly having some properties. Using a bipartite layout for such relationships adds nothing, but it takes twice as many links to traverse.
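
To make the traversal cost concrete, here is a minimal Python sketch that compares the two layouts. It uses plain dictionaries rather than the Neo4j or Enhanced API, and every name in it (`hops`, `direct`, `bipartite`, the node labels) is an assumption made up for illustration:

```python
from collections import deque

def hops(adjacency, start, goal):
    """Breadth-first search; returns the number of links on the shortest path."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # goal is not reachable from start

# Direct layout: the relationship is a single link between the two nodes.
direct = {"a": ["b"]}

# Bipartite layout (hypothetical): the relationship becomes a node of its own,
# entered and left through hidden linking edges.
bipartite = {"a": ["rel:a-b"], "rel:a-b": ["b"]}

direct_hops = hops(direct, "a", "b")
bipartite_hops = hops(bipartite, "a", "b")
```

Here `direct_hops` is 1 and `bipartite_hops` is 2, which is the doubling mentioned above for a simple relationship between two nodes.
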
The shadow node solution only treats relationships and properties as special (having relationships to them) when that is needed.

Now to the typing issues. Neo4j has chosen not to add typing features to the database and I actually like that. It allows for optional type systems that can be used but are not enforced. Type systems are nice beasts, especially when dealing with large and complex applications, but they impose a development overhead, mostly felt in small quick-and-dirty applications. This is true for programming languages, where many people prefer to use a dynamically typed language such as JavaScript, Python, Ruby or PHP over a statically typed language such as Java, Scala, C# or Haskell, and I think it is also true for databases. I think one of the reasons NoSQL became so popular is that the type system of an RDBMS adds overhead to simple applications.

An RDBMS needs a type system because its storage layout requires one. Tables have a fixed number of columns, where each column has a designated type. While this is a great feature when processing massive amounts of similar data, it can also make the application brittle. The tight coupling between type system and storage layout makes rapid schema evolution hard.

Neo4j doesn't impose a type system like an RDBMS does, because its storage layout doesn't require it. Something is either a node, a relationship or a property, but the combinations don't need explicit modelling for the sake of storage. Because of this untyped nature of the database, it becomes possible to add a type system that is not only optional, but can in fact be made as strong or as weak as the application demands.

Unfortunately Neo4j doesn't provide all the necessary hooks for a type system, which is another reason why I started Enhanced API. It was not my intention with that API to provide a full-fledged type system for Neo4j, but to provide the necessary hooks so that a type system can be created.

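
As a rough illustration of a type system that is optional and adjustable in strength, the following Python sketch layers checks over schema-less property maps. It is not the Enhanced API and not anything Neo4j ships; the class `OptionalTypeSystem`, its `strict` flag and the property names are all hypothetical:

```python
class TypeCheckError(Exception):
    """Raised by the strict variant when a property map violates a declaration."""

class OptionalTypeSystem:
    """Hypothetical optional layer; the underlying store stays untyped."""

    def __init__(self, strict=False):
        self.strict = strict
        self.property_types = {}  # property name -> expected Python type

    def declare(self, name, expected_type):
        self.property_types[name] = expected_type

    def check(self, properties):
        """Return a list of problems; in strict mode raise instead."""
        problems = []
        for name, value in properties.items():
            expected = self.property_types.get(name)
            if expected is None:
                # Weak mode tolerates undeclared properties, strict mode does not.
                if self.strict:
                    problems.append(f"undeclared property: {name}")
                continue
            if not isinstance(value, expected):
                problems.append(
                    f"{name}: expected {expected.__name__}, got {type(value).__name__}"
                )
        if problems and self.strict:
            raise TypeCheckError("; ".join(problems))
        return problems

# A node is just a property map; nothing stops the store from holding it.
weak = OptionalTypeSystem()
weak.declare("name", str)
warnings = weak.check({"name": 7, "age": 40})
```

With these assumptions the weak variant only reports the mismatch on `name` and ignores the undeclared `age`, while `OptionalTypeSystem(strict=True)` would raise on either. An application that wants no typing at all simply never calls `check`.
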
Of course there is some type-creep in Neo4j. Properties and relationships have names, which in almost every application are used as types. Say we have several nodes we would like to use to store information about people, where each of those nodes has a property "last_name". This property name is effectively used as a type. For all nodes the property name will denote the same fact: the last name of a person.

This is not necessarily required by the Neo4j database. Different nodes may use the same property name to denote different things, even with different datatypes. It is possible to have nodes with property name "last_name" that for some nodes is a String while it is an Integer for other nodes. While this is possible, I venture this is not all that common. The same property name will likely be used to denote the same fact and have the same datatype across the graph, and therefore in most common cases be used like a type.

The same applies to relationships, where the name will in general be used to denote the same type of relationship. It is unlikely that an application will use the "FRIEND" relationship to sometimes denote a friendship between two people while at other times using that relationship name to denote the address of a building.

This is as far as typing goes in Neo4j, but it is there, and that means we have to incorporate it into the API somehow. This is the reason why I decided to add subtyping of relationship-types and property-types in the API, a feature that may be of interest to the model you describe in your email.

Joe is a janitor at the school.

Here we see three elements: "Joe", "is janitor at", and "the school", which can indeed be modeled with two nodes and a relationship. There is however a more general statement here of the form: person works with organization. Suppose we want to store the fact: "Jane is principal of the school". Again we can model this with two nodes and a relationship.
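
The two facts and the more general "person works with organization" statement can be sketched in plain Python as follows. This is only an illustration under assumed names: the triples, the relationship-type names and the helper functions are hypothetical and are not the Enhanced API or the standard Neo4j API.

```python
# Stored relationships as (start node, relationship type, end node) triples.
relationships = [
    ("Joe", "JANITOR_AT", "School"),
    ("Jane", "PRINCIPAL_OF", "School"),
]

# Hypothetical type hierarchy: subtype -> supertype.
supertype_of = {
    "JANITOR_AT": "WORKS_FOR",
    "PRINCIPAL_OF": "WORKS_FOR",
}

def is_subtype(rel_type, wanted):
    """True if rel_type equals wanted or has it as a (transitive) supertype."""
    while rel_type is not None:
        if rel_type == wanted:
            return True
        rel_type = supertype_of.get(rel_type)
    return False

def sources(wanted, target):
    """Start nodes of every relationship to target whose type matches wanted."""
    return [s for s, t, e in relationships if e == target and is_subtype(t, wanted)]

# Matching on the stored name alone finds nothing for the general type...
exact_only = [s for s, t, e in relationships if e == "School" and t == "WORKS_FOR"]
# ...while a lookup that follows the hierarchy finds both people.
with_subtypes = sources("WORKS_FOR", "School")
```

In this sketch `exact_only` is empty and `with_subtypes` is `["Joe", "Jane"]`: the specific relationship names are what is stored, and the general question is answered only once the hierarchy is consulted.
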
The standard API offers no features to ask the question "who works at the school?". Subtyping of relationship-types helps for these cases. We can create a relationship-type "WORKS_FOR" and state that it has the subtypes "JANITOR_AT" and "PRINCIPAL_OF". Now we can ask for all nodes that have a "WORKS_FOR" relationship, and not only the relationships that are directly stored as "WORKS_FOR" are returned, but also those stored as "JANITOR_AT" and as "PRINCIPAL_OF".

Now to your original question: how to store general information about janitors. Enhanced API reifies RelationshipType as a Node. With that, a RelationshipType is no longer just a name, but becomes something that can take additional information (both in the form of properties and in the form of relationships). It is up to the application programmer how to use this relationshiptype-node; the API only gives you a node, and with that the hooks to use and abuse it to your heart's content.

Finally, I see two of the layers you describe. For some applications it is necessary to have something akin to a type system, and Enhanced API offers you the hooks to create one. I don't immediately see what the purpose of the interface layer is. Could you expand on that feature a bit more, so I may help you figure out how to tackle that issue?

Niels

> Date: Fri, 23 Sep 2011 22:52:14 -0700
> From: lold...@gmail.com
> To: user@lists.neo4j.org
> Subject: [Neo4j] Modelling with neo4j
>
> I'm trying to figure out how to model the world most flexibly (okay, so I'm
> sticking to modelling organisations for now, but still). My main problem
> seems to occur when I want to allow the model to naturally expand in
> complexity. Say we have the following relationship:
>
> Joe is a janitor at the school.
>
> This can easily be modelled with two entities and a relationship. Now say I
> have some common properties for janitors.
> I would have to make a link from
> the janitor-relation to some node denoting the type 'janitor' which could
> then hold information on these common things. Unfortunately, relationships
> don't support that.
>
> Long story short: the problem is that sometimes I want my things to act as
> things, sometimes as types, sometimes as interfaces, and I cannot know in
> advance which of these modalities I'm going to need.
>
> Therefore, I'm considering going with this model:
>
> Imagine a graph in three layers. The lower layer represents things, the
> middle layer represents types and the upper layer represents interfaces.
> Initially I populate only the lowest layer, but as the need arises I go back
> and promote various things to also be types or interfaces. These then crop up
> in the second and third layer of the graph, respectively. When this happens, a
> vertical relationship is added between the element in the lower layer and
> its new type/interface in the higher layers.
>
> Now the question is: how to model this scheme in neo4j? A number of
> challenges pop up:
>
> * Neo4j relationships cannot be n-ary, so every relationship must be
> modelled with a hyperrelationship, thus allowing future relations to the
> second and third layers.
>
> * In a modalities-are-a-changing paradigm it doesn't really make sense to
> distinguish between relations and entities; at different points in time, one
> element may have to act in the roles of both. Neo4j however makes a
> fundamental distinction between the two things. I could choose to model all
> relationships as nodes, but will that not make graph traversals messy?
>
> * Neo4j doesn't come with a strongly typed distinction between these three
> layers of modality
>
> --
> View this message in context:
> http://neo4j-community-discussions.438527.n3.nabble.com/Modelling-with-neo4j-tp3363823p3363823.html
> Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
> _______________________________________________
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user