> Ok, in fact it shouldn't be a performance downgrade even with large
> blobs, right? It just depends on whether the queried part refers to an
> id or similar, and that node then is simply connected to the blob. I.e.
> I extract the blob only when I am sure it is the one wanted. Is that
> what you meant?
If the blob is a property of a node, it is not loaded when you access that node, but when you access its properties. I do not know enough about the performance implications of large blobs, only that it has been mentioned many times before that really large blobs are better stored somewhere else (e.g. the filesystem) and referenced from the graph (e.g. by file path, URL, etc.). But I still believe that most blobs are not big enough to really be a concern; perhaps someone more knowledgeable can correct me here?

> Probably because of the database type the "old" way of having a look
> at the data is not possible any longer. But then which is the right
> way? Having a console and letting Gremlin shine?

Filtering the neoclipse view by relationship types and directions helps. Limiting the number of nodes returned helps a lot; I use 100 max. But I use neoclipse mostly as a tool for visualizing the structure, not for analytics.

> Ok, I change my question. What do you do when you have two big types
> of data, one that fits perfectly in the graph concept, and one that
> really doesn't have anything to do with it? I guess you put everything
> into the neo4j db and then query one with the graph traverser and the
> other with the lucene indexer?
>
> My questions might seem a bit dumb, I apologize for that; I am trying
> to understand why and how I should make use of a graph database.

When I'm deciding between using the graph or using lucene, the size of the data is not really a factor, but its 'graphiness' :-) For example, if I have a property of very high diversity, like people's names, then lucene is a natural choice. If you have a property with structure, like categories or tags, or inheritance, or other relationship concepts, then the graph is best. There are cases in the middle; for example, I generally model numerical properties in the graph, though I think most others would use lucene. I use the graph because it naturally leads to statistics data.
For example, if we use the time property and collect all events in the same second, connecting them to the same 1s time node, we now know the number of events in that second from the structure of the graph alone. Connect each 1s node to a 1min node, and we know how many seconds in that minute contained data, and so on. Obviously this is a very simple special case, and usually I keep more statistical metadata in the graph tree than mere counters, but the result is that your index now contains lots of statistics you can query without even touching the original data nodes (i.e. very fast statistics queries).

_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
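The counting idea behind that time tree can be sketched in a few lines. This is a plain-Python stand-in, not Neo4j code: dictionaries play the role of the 1s and 1min nodes, and the counters are the statistical metadata those nodes would carry in the graph.

```python
from collections import defaultdict

# In-memory stand-in for a graph time tree: each "node" is identified
# by its truncated timestamp and carries a counter as metadata.
second_nodes = defaultdict(int)  # events attached to each 1s node
minute_nodes = defaultdict(int)  # distinct 1s nodes under each 1min node
seen_seconds = set()

def record_event(timestamp):
    """Attach an event to its 1s node and link that 1s node up to its 1min node."""
    second = int(timestamp)        # truncate to the containing second
    minute = second - second % 60  # truncate to the containing minute
    second_nodes[second] += 1      # one more event in that second
    if second not in seen_seconds:
        # First event in this second: the 1s node now exists,
        # so the enclosing minute gains one second that contains data.
        seen_seconds.add(second)
        minute_nodes[minute] += 1

# Three events in second 120, one in second 121 (both inside minute 120).
for t in (120.1, 120.5, 120.9, 121.0):
    record_event(t)

print(second_nodes[120])  # 3 events in that second
print(minute_nodes[120])  # 2 seconds of that minute contained data
```

The point carries over directly to the graph version: answering "how many events in second 120?" reads a counter on the 1s node and never touches the event nodes themselves.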