Great thoughts, guys! I think it would be interesting to break out the "Enhanced API" from graph-collections, rename it to something better (we can think of a name together) and provide a more fully fledged example that we can document and evolve.
WDYT?

Cheers,

/peter neubauer

GTalk: neubauer.peter
Skype peter.neubauer
Phone +46 704 106975
LinkedIn http://www.linkedin.com/in/neubauer
Twitter http://twitter.com/peterneubauer

http://www.neo4j.org - Your high performance graph database.
http://startupbootcamp.org/ - Öresund - Innovation happens HERE.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.

On Sat, Sep 24, 2011 at 3:37 PM, Rick Bullotta
<rick.bullo...@thingworx.com> wrote:
> That's a great summary, Niels. Very similar to how we've applied Neo4j
> here at ThingWorx, though we've done most of the type system work (nodes
> and relationships are all typed/subtyped) in our application domain
> layer. A few other items that we leveraged in our implementation that
> you may wish to consider:
>
> - A common pattern we encountered was a "collection" of typed entities
> (e.g. a typed collection), and we implemented a specific model using
> supernodes for this. This also allowed us to rapidly and easily
> iterate/search collections and also to organize nodes in a "human
> comprehensible way" that can be readily viewed with something like
> Neoclipse for troubleshooting purposes. Also, if the type was "truck",
> we stamped the node with the type "truck" as a property (using
> enumerations with a custom int member) and used that same enum as the
> relationship type between the node and the collection node. In our
> model, an entity has a single "type", but we implemented the concept of
> supertyping/subtyping in our domain model.
>
> - We found quite a few examples where a "one-way relationship" was more
> than adequate and, instead of incurring the overhead of a relationship
> (particularly when millions of these relationships were attached to a
> single supernode), we used a *property* on a node containing the node id
> of the node it references.
> Sounds like a hack, but it actually has substantial performance
> advantages, particularly if you are frequently adding/removing
> relationships to/from the supernode.
>
> - We overlaid our own REST API on our domain model, and wanted to come
> up with a simple way to resolve the URI for any given node/entity. For
> that, we used a pattern in which each node can have an optional "parent"
> node type. Example: a blog comment is always attached to a blog entry or
> another blog comment. A blog entry is always attached to a blog. A blog
> is always attached to the blogs collection, and so on. Each node has a
> name and/or an ID. Because those relationship "patterns" are well known,
> it is a trivial matter to create the URI to any entity given only its
> node, e.g.:
>
> /Blogs/MyBlog/Entries/103/Comments/204
>
> Of course, it works the other way as well - easy to parse and traverse.
>
> - We often found that there were data structures in our application
> domain for which it was OK to be "opaque" - e.g. although the structures
> were deep and complex, they did not require searchability or
> traversability (they were kind of like "object blobs"), so in our
> metamodel they are not stored as nodes, relationships, and properties,
> but rather as a JSON blob, serialized as a string to a node property.
> That has worked out really well. When we do need to filter/manipulate
> those, we do it at the domain level.
>
> Just wanted to share some more examples.
>
> Rick
>
> ________________________________________
> From: user-boun...@lists.neo4j.org [user-boun...@lists.neo4j.org] On
> Behalf Of Niels Hoogeveen [pd_aficion...@hotmail.com]
> Sent: Saturday, September 24, 2011 9:14 AM
> To: user@lists.neo4j.org
> Subject: Re: [Neo4j] Modelling with neo4j
>
> You raise interesting questions, most of them very much related to the
> work I did on Enhanced API.
>
> Let me start with the distinction between Node and Relationship, which
> in my opinion too is a bit artificial.
> I understand that when creating a graph database it is helpful to have
> something like vertices and edges, but I see those more as modalities of
> the elements of the graph than as clearly separated types. This was one
> of the reasons to unify all elements of the graph with one underlying
> type.
>
> At the time, I saw two options:
>
> a) make the graph bipartite, so that all relationships and properties
> become nodes, and use relationships only as a hidden linking feature
> b) create shadow nodes for relationships and properties when needed and
> let the API handle that transparently
>
> I chose option b for performance reasons. There are likely many
> applications where most of the relationships are simple, i.e. they link
> two nodes while possibly having some properties. Using a bipartite
> layout for such relationships adds nothing, but it takes twice as many
> links to traverse.
>
> The shadow node solution only treats relationships and properties as
> special (having relationships to them) when that is needed.
>
> Now to the typing issues. Neo4j has chosen not to add typing features to
> the database and I actually like that. It allows for optional type
> systems that can be used but are not enforced.
>
> Type systems are nice beasts, especially when dealing with large and
> complex applications, but they impose a development overhead, mostly
> felt in small quick-and-dirty applications. This is true for programming
> languages, where many people prefer a dynamically typed language such as
> JavaScript, Python, Ruby or PHP over a statically typed language such as
> Java, Scala, C# or Haskell, and I think it is also true for databases. I
> think one of the reasons NOSQL became so popular is that the type system
> of an RDBMS adds overhead to simple applications.
>
> An RDBMS needs a type system because the storage layout requires it.
> Tables have a fixed number of columns, where each column has a
> designated type.
> While this is a great feature when processing massive amounts of similar
> data, it can also make the application brittle. The tight coupling
> between type system and storage layout makes rapid schema evolution
> hard.
>
> Neo4j doesn't impose a type system like an RDBMS does, because its
> storage layout doesn't require it. Something is either a node, a
> relationship or a property, but the combinations don't need explicit
> modelling for the sake of storage.
>
> Because of this untyped nature of the database, it becomes possible to
> add a type system that is not only optional, but can in fact be made as
> strong or as weak as the application demands.
>
> Unfortunately Neo4j doesn't provide all the necessary hooks for a type
> system, another reason why I started Enhanced API. It was not my
> intention with that API to provide a full-fledged type system for Neo4j,
> but to provide the necessary hooks so a type system can be created.
>
> Of course there is some type-creep in Neo4j. Properties and
> relationships have names, which in almost every application are used as
> types. Say we have several nodes we like to use to store information
> about people, where each of those nodes has a property "last_name". This
> property name is effectively used as a type. For all nodes the property
> name will denote the same fact: the last name of a person.
>
> This is not required by the Neo4j database. Different nodes may use the
> same property name to denote different things, even with different
> datatypes. It is possible to have a property "last_name" that is a
> String for some nodes and an Integer for others. While this is possible,
> I venture it is not all that common. The same property name will likely
> be used to denote the same fact and have the same datatype across the
> graph, and will therefore in most common cases be used like a type.
>
> The same applies to relationships, where the name will in general be
> used to denote the same type of relationship. It is unlikely that an
> application will use the "FRIEND" relationship to sometimes denote a
> friendship between two people while at other times using that
> relationship name to denote the address of a building.
>
> This is as far as typing goes in Neo4j, but it is there and means we
> have to incorporate it into the API somehow.
>
> This is the reason why I decided to add subtyping of relationship-types
> and property-types in the API, a feature that may be of interest to the
> model you describe in your email.
>
> Joe is a janitor at the school.
>
> Here we see three elements: "Joe", "is janitor at", and "the school",
> which can indeed be modeled with two nodes and a relationship.
>
> There is however a more general statement here of the form: person works
> with organization. Suppose we want to store the fact:
>
> Jane is principal of the school.
>
> Again we can model this with two nodes and a relationship.
>
> The standard API offers no features to ask the question "who works at
> the school?".
>
> Subtyping of relationship-types helps for these cases. We can create a
> relationship-type "WORKS_FOR" and state that it has the subtypes
> "JANITOR_AT" and "PRINCIPAL_OF".
>
> Now we can ask for all nodes that have a "WORKS_FOR" relationship, and
> not only the relationships that are directly stored as "WORKS_FOR" are
> returned, but also those that are stored as "JANITOR_AT" and as
> "PRINCIPAL_OF".
>
> Now to your original question: how to store general information about
> janitors. Enhanced API reifies RelationshipType as a Node. With that a
> RelationshipType is no longer just a name, but becomes something that
> can take additional information (both in the form of properties and in
> the form of relationships).
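
A minimal sketch of the two ideas above (subtype-aware lookup, and a relationship type that carries its own properties), written in plain Python rather than against Neo4j or Enhanced API. Every name here (`RelType`, `Relationship`, `relationships_of_type`, the `department` property) is hypothetical and only illustrates the concept:

```python
from dataclasses import dataclass, field


@dataclass
class RelType:
    """A relationship type treated as a node-like object (hypothetical):
    it has a name, an optional supertype, and its own properties."""
    name: str
    parent: "RelType | None" = None
    properties: dict = field(default_factory=dict)

    def is_a(self, other: "RelType") -> bool:
        # Walk up the supertype chain until we find `other` or run out.
        t = self
        while t is not None:
            if t is other:
                return True
            t = t.parent
        return False


@dataclass
class Relationship:
    start: str          # name of the start node
    rel_type: RelType   # the reified type
    end: str            # name of the end node


def relationships_of_type(rels, wanted):
    """Return relationships whose type is `wanted` or any subtype of it."""
    return [r for r in rels if r.rel_type.is_a(wanted)]


WORKS_FOR = RelType("WORKS_FOR")
# Shared information about janitors is stored once, on the type itself.
JANITOR_AT = RelType("JANITOR_AT", parent=WORKS_FOR,
                     properties={"department": "facilities"})
PRINCIPAL_OF = RelType("PRINCIPAL_OF", parent=WORKS_FOR)

graph = [
    Relationship("Joe", JANITOR_AT, "the school"),
    Relationship("Jane", PRINCIPAL_OF, "the school"),
]

# "Who works at the school?"
staff = [r.start for r in relationships_of_type(graph, WORKS_FOR)
         if r.end == "the school"]
print(staff)  # ['Joe', 'Jane']
```

Asking for `JANITOR_AT` instead returns only Joe, and the common janitor information lives once on the type rather than on every relationship.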
>
> It is up to the application programmer how to use this
> relationshiptype-node; the API only gives you a node, and with that the
> hooks to use and abuse it to your heart's content.
>
> Finally, I see two of the layers you describe. For some applications it
> is necessary to have something akin to a type system, and Enhanced API
> offers you the hooks to create one. I don't immediately see what the
> purpose of the interface layer is. Could you expand on that feature a
> bit more, so I may help you figure out how to tackle that issue?
>
> Niels
>
>
>
>> Date: Fri, 23 Sep 2011 22:52:14 -0700
>> From: lold...@gmail.com
>> To: user@lists.neo4j.org
>> Subject: [Neo4j] Modelling with neo4j
>>
>> I'm trying to figure out how to model the world most flexibly (okay, so
>> I'm sticking to modelling organisations for now, but still). My main
>> problem seems to occur when I want to allow the model to naturally
>> expand in complexity. Say we have the following relationship:
>>
>> Joe is a janitor at the school.
>>
>> This can easily be modelled with two entities and a relationship. Now
>> say I have some common properties for janitors. I would have to make a
>> link from the janitor-relation to some node denoting the type 'janitor'
>> which could then hold information on these common things.
>> Unfortunately, relationships don't support that.
>>
>> Long story short: the problem is that sometimes I want my things to act
>> as things, sometimes as types, sometimes as interfaces, and I cannot
>> know in advance which of these modalities I'm going to need.
>>
>> Therefore, I'm considering going with this model:
>>
>> Imagine a graph in three layers. The lower layer represents things, the
>> middle layer represents types and the upper layer represents
>> interfaces. Initially I populate only the lowest layer, but as the need
>> arises I go back and promote various things to also be types or
>> interfaces.
>> These then crop up in the second and third layer of the graph,
>> respectively. When this happens, a vertical relationship is added
>> between the element in the lower layer and its new type/interface in
>> the higher layers.
>>
>> Now the question is: how to model this scheme in neo4j? A number of
>> challenges pop up:
>>
>> * Neo4j relationships cannot be n-ary, so every relationship must be
>> modelled with a hyperrelationship, thus allowing future relations to
>> the second and third layers.
>>
>> * In a modalities-are-a-changing paradigm it doesn't really make sense
>> to distinguish between relations and entities; at different points in
>> time, one element may have to act in the roles of both. Neo4j however
>> makes a fundamental distinction between the two things. I could choose
>> to model all relationships as nodes, but will that not make graph
>> traversals messy?
>>
>> * Neo4j doesn't come with a strongly typed distinction between these
>> three layers of modality.
>>
>> --
>> View this message in context:
>> http://neo4j-community-discussions.438527.n3.nabble.com/Modelling-with-neo4j-tp3363823p3363823.html
>> Sent from the Neo4j Community Discussions mailing list archive at
>> Nabble.com.
>> _______________________________________________
>> Neo4j mailing list
>> User@lists.neo4j.org
>> https://lists.neo4j.org/mailman/listinfo/user