On Tue, Feb 15, 2011 at 9:14 PM, Mattias Persson
<matt...@neotechnology.com> wrote:

> 100 million sounds strange :) but having a handful of key/value pairs
> pointing to the same entity is rather normal. Could you elaborate on
> that use case to let us know why you apparently have so many of them
> pointing to the same entity?

I need to process a stream of data that arrives as log rows inside
files. The data consists of actions taken by users.

The problem is very similar to parsing squid log files that contain
authenticated users. Each log file could arrive from more than one
location (similar to squid peering), and I have to be sure not to
process the same data more than once.

So I thought I could calculate a hash of each row and index it while
storing the row's Node in the db. After a couple of tests it turned
out that storing each row in the db is not feasible: it would occupy
more than 700G of disk space, since each node takes at least 800 bytes
(node + properties) and I have 222 million nodes and growing.
Since all I really want to know is whether a hash already exists in
the index, which means I've already processed that unit of work (row),
I thought I could simply index every hash against the same single
node. Roughly the sketch below.
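
Just a rough sketch against the embedded 1.x API of what I have in
mind; the index name "seenRows", the single sentinel node, SHA-1 as
the hash, and the helper names markSeen/sha1Hex are all placeholders
of mine, not anything final:

import java.security.MessageDigest;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.index.Index;

public class SeenRows
{
    private final GraphDatabaseService db;
    private final Index<Node> index;
    private final Node sentinel; // the one node every hash points to

    public SeenRows( GraphDatabaseService db, Node sentinel )
    {
        this.db = db;
        this.sentinel = sentinel;
        this.index = db.index().forNodes( "seenRows" );
    }

    // Returns true if the row is new; marks it as seen as a side effect.
    public boolean markSeen( String row ) throws Exception
    {
        String hash = sha1Hex( row );
        if ( index.get( "hash", hash ).getSingle() != null )
        {
            return false; // already processed this row
        }
        Transaction tx = db.beginTx();
        try
        {
            // every hash points to the same sentinel node
            index.add( sentinel, "hash", hash );
            tx.success();
        }
        finally
        {
            tx.finish();
        }
        return true;
    }

    private static String sha1Hex( String row ) throws Exception
    {
        MessageDigest md = MessageDigest.getInstance( "SHA-1" );
        StringBuilder hex = new StringBuilder();
        for ( byte b : md.digest( row.getBytes( "UTF-8" ) ) )
        {
            hex.append( String.format( "%02x", b ) );
        }
        return hex.toString();
    }
}

If this works the way I hope, only the Lucene index grows with the
number of rows, while the node store stays at a single node.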

I know it sounds strange (it does to me too), but I cannot afford to
store that amount of data, especially since it doesn't even include
the processed results yet.

Any hints are really appreciated.
-- 
Massimo
http://meridio.blogspot.com