> Ok, in fact it shouldn't be a performance downgrade even with large
> blobs, right? It just depends on whether the queried part refers to an
> id or similar, and that node then is simply connected to the blob. I.e.
> I extract the blob only when I am sure it is the one wanted. Is that
> what you meant?
If the blob is a property of a node, it is not loaded when you access that node, but when you access its properties. I do not know enough about the performance implications of large blobs, only that it has been mentioned many times before that really large blobs are better stored somewhere else (e.g. the filesystem) and referenced from the graph (e.g. by file path, URL, etc.). But I still believe that most blobs are not big enough to really be a concern; perhaps someone more knowledgeable can correct me here?

> Probably because of the database type the "old" way of having a look
> at the data is not possible any longer. But then which is the right
> way? Having a console and letting Gremlin shine?

Filtering the neoclipse view by relationship types and directions helps. Limiting the number of nodes returned helps a lot; I use 100 max. But I use neoclipse mostly as a tool for visualizing the structure, not for analytics.

> Ok, I change my question. What do you do when you have two big types
> of data, one that fits perfectly in the graph concept, and one that
> really doesn't have anything to do with it? I guess you put everything
> into the neo4j db and then query one with the graph traverser and the
> other with the lucene indexer?
>
> My questions might seem a bit dumb, I apologize for that; I am trying
> to understand why and how I should make use of a graph database.

When I'm deciding between using the graph or using lucene, the size of the data is not really a factor, but its 'graphiness' :-) For example, if I have a property of very high diversity, like people's names, then lucene is a natural choice. If you have a property with structure, like categories or tags, or inheritance, or other relationship concepts, then the graph is best. There are cases in the middle; for example, I generally model numerical properties in the graph, though I think most others would use lucene. I use the graph because it naturally leads to statistics data.
For example, if we use the time property and collect all events in the same second, connecting them to the same 1s time node, we now know the number of events in that second from the structure of the graph alone. Connect each 1s node to a 1min node, and we know how many seconds in that minute contained data, and so on. Obviously this is a very simple special case, and usually I keep more statistical metadata in the graph tree than mere counters, but the result is that your index now contains lots of statistics you can query without even touching the original data nodes (i.e. very fast statistics queries).

_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
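The counting idea behind that time tree can be sketched in a few lines. This is a plain-Python stand-in, not Neo4j code: dictionaries play the role of the 1s and 1min nodes, and the counters are the statistical metadata those nodes would carry in the graph.

```python
from collections import defaultdict

# In-memory stand-in for a graph time tree: each "node" is identified
# by its truncated timestamp and carries a counter as metadata.
second_nodes = defaultdict(int)  # events attached to each 1s node
minute_nodes = defaultdict(int)  # distinct 1s nodes under each 1min node
seen_seconds = set()

def record_event(timestamp):
    """Attach an event to its 1s node and link that 1s node up to its 1min node."""
    second = int(timestamp)        # truncate to the containing second
    minute = second - second % 60  # truncate to the containing minute
    second_nodes[second] += 1      # one more event in that second
    if second not in seen_seconds:
        # First event in this second: the 1s node now exists,
        # so the enclosing minute gains one second that contains data.
        seen_seconds.add(second)
        minute_nodes[minute] += 1

# Three events in second 120, one in second 121 (both inside minute 120).
for t in (120.1, 120.5, 120.9, 121.0):
    record_event(t)

print(second_nodes[120])  # 3 events in that second
print(minute_nodes[120])  # 2 seconds of that minute contained data
```

The point carries over directly to the graph version: answering "how many events in second 120?" reads a counter on the 1s node and never touches the event nodes themselves.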