Re: [Neo] NEO Performance

Johan Svensson Fri, 12 Dec 2008 07:09:42 -0800

Hi Jürgen,

I will try to answer some of your questions.

On Thu, Dec 11, 2008 at 10:17 PM, Jürgen Umbrich
<juergen.umbr...@deri.org> wrote:
> first of all, I am fascinated by neo and the neo4j library.
> I work as part of my PhD in the domain of web crawling and thus, I
> thought about using neo4j to store the crawl traverse- and link-graph.
> Assuming that we can fetch max ~150 docs/sec and extract on average
> 30 links from each document (very conservative assumption), neo should
> be able to handle 4500 inserts/sec (avg 4.5 inserts/ms)!
>

First, let me point out that Neo is transactional: the system can
crash at any point in time and will be brought back to a consistent
state (a snapshot from the last commit before the crash). That
guarantee slows things down a bit.

> I tried to get some benchmark values from the neo4j homepages but I was
> not successful.

You are right about no public benchmarks. We should work on that after
we've released 1.0 final. Please let us know (all of you) if you have
any specific benchmark requests.

> Currently I'll run some benchmark tests with neo4j (storing an RDF graph
> with SingleValueIndex and node and relationship objects containing one
> property value: the URI, BNode or Literal value).
> So far, I figured out that it takes around 1.5 ms for an insert of a
> link (using transaction batches of size 1000 and 10000).
>

The single value index is currently pretty slow. If you just create 1
node, 1 relationship and 1 property for each "link", with no index, you
should get about 10-30 inserts/ms depending on hardware (at 30 links per
document that would translate to roughly 300-1000 documents/s in your
case). If you add synchronous indexing to that, throughput will drop by
a factor of 10.
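
The conversion between inserts/ms and documents/s is plain arithmetic,
and a small sketch makes it easy to redo with other numbers. The class
and method names below are only illustrative (they are not part of Neo),
and the inputs are the rough estimates from this thread, not measurements.

```java
// Back-of-the-envelope conversion from insert throughput to crawl throughput.
// Illustrative names only; nothing here is part of the Neo API.
class ThroughputEstimate {

    /** Documents per second that a given insert rate can sustain. */
    static double docsPerSecond(double insertsPerMs, int linksPerDoc) {
        return insertsPerMs * 1000.0 / linksPerDoc;
    }

    public static void main(String[] args) {
        int linksPerDoc = 30;            // average links per document (from the question)
        double syncIndexPenalty = 10.0;  // "drop by a factor of 10" with synchronous indexing

        for (double rate : new double[] {10, 30}) {
            double plain = docsPerSecond(rate, linksPerDoc);
            double indexed = docsPerSecond(rate / syncIndexPenalty, linksPerDoc);
            System.out.printf(
                "%.0f inserts/ms: %.0f docs/s without index, %.0f docs/s with synchronous index%n",
                rate, plain, indexed);
        }
    }
}
```

With these inputs the synchronously indexed case comes out at roughly
33-100 documents/s, which is below the 150 documents/s target.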

> Can anybody provide me with some benchmarks or general comments/design
> ideas on whether it is possible to handle this amount of inserts/sec!?
>

Sustaining this load (150 documents/s) over a longer period will be
hard because of indexing. If, however, you can live with asynchronous
indexing (keep the latest index entries in memory and write them to
disk in a background thread/transaction), it is possible to handle
that load for shorter periods.
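
As a rough illustration of the asynchronous indexing idea, here is a
minimal sketch in plain Java. It does not use the Neo API: `AsyncIndex`,
its methods and the `persisted` map (a stand-in for the on-disk index)
are all hypothetical names made up for this example.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: lookups are served from memory, while a background
// thread copies new entries to a slower store (here just another map).
class AsyncIndex {
    private final Map<String, Long> inMemory = new ConcurrentHashMap<>();
    private final Map<String, Long> persisted = new ConcurrentHashMap<>(); // stand-in for disk
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    private volatile boolean running = true;
    private final Thread writer;

    AsyncIndex() {
        writer = new Thread(this::drain, "index-writer");
        writer.start();
    }

    /** Called on the insert path: cheap, never waits for the slow store. */
    void put(String uri, long nodeId) {
        inMemory.put(uri, nodeId);
        pending.add(uri);
    }

    /** Lookups always see the latest in-memory state. */
    Long get(String uri) {
        return inMemory.get(uri);
    }

    /** Background loop: write queued entries until closed and drained. */
    private void drain() {
        try {
            while (running || !pending.isEmpty()) {
                String uri = pending.poll(50, TimeUnit.MILLISECONDS);
                if (uri != null) {
                    persisted.put(uri, inMemory.get(uri));
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Stops accepting the background work and waits for the queue to empty. */
    void close() {
        running = false;
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    int persistedSize() {
        return persisted.size();
    }
}
```

The trade-off in this sketch is that anything still sitting in the queue
is lost if the process dies, and the queue grows without bound if inserts
stay faster than the background writer for long, which fits the point
above about only handling shorter periods of that load.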

> Also, it seems like neo is not checking whether a relationship of a
> certain relationship type between two nodes already exists, so I need to
> check myself whether I am inserting duplicate edges. Or did I miss something!?
>

Yes, you are right, and this is because it is valid in the model. As an
example, let's say we want to model something like Twitter. We have users
(nodes), and users can follow each other (relationships of type
FOLLOWS). It would then be possible to have 3 combinations:

User A--FOLLOWS->User B

User B--FOLLOWS->User A

User A--FOLLOWS->User B--FOLLOWS->User A
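
Since the model allows several relationships between the same two nodes,
an application that wants unique edges has to do the check itself. One
possible way, sketched below in plain Java, is to remember every
(source, type, target) triple already inserted. `EdgeDeduplicator` and
`markIfNew` are hypothetical names, not part of the Neo API, and the
node ids are assumed to be plain long values.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical helper: tracks which directed, typed edges were already created.
class EdgeDeduplicator {

    /** Value object identifying one directed edge of a given type. */
    private static final class Edge {
        final long from;
        final String type;
        final long to;

        Edge(long from, String type, long to) {
            this.from = from;
            this.type = type;
            this.to = to;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Edge)) return false;
            Edge e = (Edge) o;
            return from == e.from && to == e.to && type.equals(e.type);
        }

        @Override
        public int hashCode() {
            return Objects.hash(from, type, to);
        }
    }

    private final Set<Edge> seen = new HashSet<>();

    /** Returns true only the first time a given (from, type, to) triple is offered. */
    boolean markIfNew(long from, String type, long to) {
        return seen.add(new Edge(from, type, to));
    }
}
```

Direction is part of the key here, so A--FOLLOWS->B and B--FOLLOWS->A
count as different edges, as in the example above. The caller would create
the relationship in the graph only when `markIfNew` returns true. For a
large crawl the set itself costs memory, so this is a starting point
rather than a finished design.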

Regards,
Johan
_______________________________________________
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user