Re: [Neo4j] Multiple Server Instances: delayed performance hit?

Michael Hunger Mon, 08 Aug 2011 06:29:50 -0700

Igor,

the configuration of the memory-mapped settings depends on your domain. You 
said you have about 4M nodes (up to 40M); what are the numbers of relationships, 
properties and strings/arrays?


Normally you'd split your 7G in half between the two instances, giving each 
2G of heap and 1.5G for the memory-mapped settings (distributed according to 
your domain model).
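A minimal sketch of that arithmetic, assuming the per-instance memory-mapped budget is spread over the store files in proportion to their size on disk (a common rule of thumb, not necessarily the exact distribution meant above). The store sizes in `main` are hypothetical placeholders; the real ones are the file sizes under data/graph.db.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MappedMemorySplit {

    // Split a memory-mapped budget (MB) across store files in proportion to
    // each file's size on disk. Integer division rounds each share down, so
    // the shares never add up to more than the budget.
    static Map<String, Long> split(Map<String, Long> storeSizesBytes, long budgetMb) {
        long totalBytes = 0;
        for (long size : storeSizesBytes.values()) {
            totalBytes += size;
        }
        Map<String, Long> sharesMb = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : storeSizesBytes.entrySet()) {
            long share = totalBytes == 0 ? 0 : e.getValue() * budgetMb / totalBytes;
            sharesMb.put(e.getKey(), share);
        }
        return sharesMb;
    }

    public static void main(String[] args) {
        long ramMb = 7 * 1024;                        // 7G, as in the thread
        int instances = 2;
        long perInstanceMb = ramMb / instances;       // 3.5G per instance
        long heapMb = 2 * 1024;                       // 2G heap per instance
        long mappedBudgetMb = perInstanceMb - heapMb; // leaves 1.5G for memory mapping

        // HYPOTHETICAL store file sizes in bytes; substitute the real ones.
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("neostore.nodestore.db", 40L << 20);
        sizes.put("neostore.relationshipstore.db", 400L << 20);
        sizes.put("neostore.propertystore.db", 300L << 20);
        sizes.put("neostore.propertystore.db.strings", 200L << 20);
        sizes.put("neostore.propertystore.db.arrays", 60L << 20);

        // Print one "<store file>.mapped_memory=<n>M" line per store.
        for (Map.Entry<String, Long> e : split(sizes, mappedBudgetMb).entrySet()) {
            System.out.println(e.getKey() + ".mapped_memory=" + e.getValue() + "M");
        }
    }
}
```

Weighting by file size is only a starting point: a traversal-heavy workload like the one described may deserve a larger share for the relationship store.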

Since you mention continuous results, you are not running into the issue other 
people have: they hit it when many relationships are loaded for the first time, 
whereas in your case the nodes and relationships should already be in the cache.

You can look into data/graph.db/messages.log for the current heap and 
memory-mapped settings which are printed at startup.
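A small sketch for pulling those lines out of the log, assuming the startup block mentions the setting names (e.g. `...mapped_memory`) or the word "heap"; the exact log format is not shown in this thread, so the filter may need adjusting.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class MemorySettingsFromLog {

    // Keep only lines that mention memory-mapped or heap settings.
    // ASSUMPTION: the startup block contains "mapped_memory" or "heap".
    static List<String> memoryLines(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            if (line.contains("mapped_memory")
                    || line.toLowerCase(Locale.ROOT).contains("heap")) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        Path log = Paths.get(args.length > 0 ? args[0] : "data/graph.db/messages.log");
        // ISO-8859-1 maps every byte to a char, so unexpected bytes never abort the read.
        List<String> lines = Files.readAllLines(log, StandardCharsets.ISO_8859_1);
        for (String line : memoryLines(lines)) {
            System.out.println(line);
        }
    }
}
```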

Regarding your traverser - difficult to help without the source and 
understanding of your domain :)

What is the disk setup for the machine?

Cheers

Michael

On 08.08.2011 at 15:19, Igor Dovgiy wrote:

> No, I didn't change the heap sizes; I tried to tinker a bit with the Neo4j
> memory settings, but the results (and the behavior) were still the same, so
> I stopped.
> 
> Could you please give an example of how to configure memory settings for two
> Neo4j instances, if these instances are doing the same work?
> 
> I really don't know whether our speed is low or not: we started with 20-30
> nodes per minute, but then gained speed by optimizing the traverser. Our
> roadblocks are, I suppose, the same as everybody else's: nodes with too many
> relationships (50-100K in our case). Our traverser is, well, breakable (we
> jump out of the iteration once we have enough results, or if the traversed
> path count is already too big), so I don't see much room for improvement
> there...
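A hypothetical sketch of that kind of breakable depth-2 traversal, written against a plain in-memory adjacency map rather than the Neo4j traversal framework; the class, method and parameter names (`evaluator`, `maxResults`, `maxPaths`) are illustrative, not Neo4j API. It stops as soon as enough matches are found or the path budget is spent.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

public class BoundedTraversal {

    // Walk at most two hops from `start`, collecting nodes accepted by the
    // evaluator. Returns early once maxResults matches are found or more
    // than maxPaths hops have been taken.
    static List<Long> traverse(Map<Long, List<Long>> adj, long start,
                               Predicate<Long> evaluator, int maxResults, int maxPaths) {
        List<Long> results = new ArrayList<>();
        Set<Long> seen = new HashSet<>();
        seen.add(start);
        int paths = 0;
        for (long n1 : adj.getOrDefault(start, List.of())) {        // depth 1
            if (++paths > maxPaths) return results;
            if (seen.add(n1) && evaluator.test(n1)) {
                results.add(n1);
                if (results.size() >= maxResults) return results;
            }
            for (long n2 : adj.getOrDefault(n1, List.of())) {       // depth 2
                if (++paths > maxPaths) return results;
                if (seen.add(n2) && evaluator.test(n2)) {
                    results.add(n2);
                    if (results.size() >= maxResults) return results;
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Tiny HYPOTHETICAL graph: 1 -> {2, 3}, 2 -> {4, 5}, 3 -> {6}
        Map<Long, List<Long>> adj = Map.of(
                1L, List.of(2L, 3L),
                2L, List.of(4L, 5L),
                3L, List.of(6L));
        // Even-numbered nodes only, at most 2 results, at most 100 hops.
        System.out.println(traverse(adj, 1L, n -> n % 2 == 0, 2, 100));
    }
}
```

Capping the number of hops bounds the work spent on nodes with 50-100K relationships, at the cost of possibly missing matches that sit behind them.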
> 
> As for environment:
> 2.6.31-302-ec2 Ubuntu SMP x86_64 GNU/Linux
> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
> 
> $ cat /sys/block/ /queue/scheduler
> noop anticipatory [deadline] cfq
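The kernel marks the active I/O scheduler with square brackets, so the output above shows deadline already in use. A small sketch that extracts the active entry; the device name was cut from the command above, so it is passed in as an argument here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ActiveScheduler {

    // Return the bracketed (active) scheduler, e.g. "deadline" for
    // "noop anticipatory [deadline] cfq", or null if none is bracketed.
    static String activeScheduler(String contents) {
        for (String token : contents.trim().split("\\s+")) {
            if (token.length() > 2 && token.startsWith("[") && token.endsWith("]")) {
                return token.substring(1, token.length() - 1);
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ActiveScheduler <block-device>");
            return;
        }
        String path = "/sys/block/" + args[0] + "/queue/scheduler";
        String contents = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.US_ASCII);
        System.out.println(args[0] + ": " + activeScheduler(contents));
    }
}
```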
> 
> I still have to check for io-waits; I did notice that the 'rescheduling
> interrupts' count was high, though.
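vmstat's `wa` column and iostat already report this; as a sketch of where the number comes from, the aggregate `cpu` line of /proc/stat can be sampled twice and the iowait share computed from the difference. This is Linux-specific, and the 5-second interval is an arbitrary choice.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class IoWait {

    // Parse the aggregate "cpu" line of /proc/stat into its numeric fields:
    // user, nice, system, idle, iowait, irq, softirq, steal, ...
    static long[] parseCpuLine(String line) {
        String[] parts = line.trim().split("\\s+");
        long[] fields = new long[parts.length - 1];
        for (int i = 1; i < parts.length; i++) {
            fields[i - 1] = Long.parseLong(parts[i]);
        }
        return fields;
    }

    // Percentage of CPU time spent in iowait between two samples. Only the
    // first eight fields are summed: guest/guest_nice, where present, are
    // already included in user/nice.
    static double ioWaitPercent(long[] before, long[] after) {
        int n = Math.min(8, Math.min(before.length, after.length));
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += after[i] - before[i];
        }
        long ioWait = after[4] - before[4];
        return total == 0 ? 0.0 : 100.0 * ioWait / total;
    }

    static long[] sample() throws IOException {
        // The first line of /proc/stat aggregates all CPUs.
        return parseCpuLine(Files.readAllLines(Paths.get("/proc/stat")).get(0));
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        long[] before = sample();
        Thread.sleep(5000);
        long[] after = sample();
        System.out.printf("iowait: %.1f%%%n", ioWaitPercent(before, after));
    }
}
```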
> 
> Oh, and thank you for your swift answer! :)
> 
> -- iD
> 
> 
> 
> On 8 August 2011 15:51, Michael Hunger <michael.hun...@neotechnology.com> wrote:
> 
>> Did you configure the heap sizes for both neo4j instances and
>> also the memory-mapped settings?
>> 
>> Otherwise a Neo4j instance assumes it is alone on the physical machine
>> and tries to use the available physical RAM for itself. So if you run more
>> than one instance on a machine you MUST configure the memory-mapped settings
>> manually (http://wiki.neo4j.org/content/Configuration_Settings).
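A minimal sanity check for that manual configuration, assuming each instance's footprint is roughly its heap plus its memory-mapped total; the `osReserveMb` parameter is an assumption standing in for whatever the OS and other processes need.

```java
import java.util.Arrays;
import java.util.List;

public class InstanceBudget {

    // Memory one Neo4j instance is configured to use (MB).
    static final class Instance {
        final long heapMb;
        final long mappedMb;

        Instance(long heapMb, long mappedMb) {
            this.heapMb = heapMb;
            this.mappedMb = mappedMb;
        }
    }

    // How many MB the instances overcommit the machine by, or 0 if the
    // heap + memory-mapped totals fit alongside the OS reserve.
    static long overcommitMb(List<Instance> instances, long physicalMb, long osReserveMb) {
        long needed = osReserveMb;
        for (Instance i : instances) {
            needed += i.heapMb + i.mappedMb;
        }
        return Math.max(0, needed - physicalMb);
    }

    public static void main(String[] args) {
        // Two instances with 2G heap + 1.5G memory mapping each, on 7G.
        List<Instance> instances = Arrays.asList(
                new Instance(2048, 1536),
                new Instance(2048, 1536));
        long osReserveMb = 0; // ASSUMPTION: set to what the OS actually needs
        long over = overcommitMb(instances, 7 * 1024, osReserveMb);
        System.out.println(over == 0 ? "fits" : "overcommitted by " + over + " MB");
    }
}
```

Because 2 × (2048 + 1536) = 7168 MB, a 2G heap plus 1.5G of memory mapping per instance uses the whole 7G, so any memory reserved for the OS or other processes has to come out of those figures.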
>> 
>> But 100-120 nodes processed per minute is much too slow.
>> 
>> What OS are you running on? (If Linux, which I/O scheduler is used? Try
>> deadline or as, i.e. anticipatory.)
>> 
>> What do top, iostat or vmstat say about io-waits?
>> What do jconsole or the GC logs say about memory usage and full GCs?
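The counters jconsole shows come from the `java.lang.management` API; a sketch that reads them for the JVM it runs in (so it only describes the Neo4j server if run inside that process or adapted to query it remotely). On HotSpot, GC logs can be enabled with `-verbose:gc -XX:+PrintGCDetails -Xloggc:<file>`.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class GcReport {

    // Heap usage plus per-collector GC counts and cumulative times for this
    // JVM. A steadily growing count/time on the old-generation collector
    // points at frequent full GCs.
    static String report() {
        StringBuilder sb = new StringBuilder();
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        sb.append(String.format("heap used: %d MB, committed: %d MB, max: %d MB%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20));
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(String.format("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```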
>> 
>> From your email I read that you have a system with 2 cores and 7 gigs of
>> RAM?
>> What JVM are you using?
>> 
>> Thanks
>> 
>> Michael
>> 
>> On 08.08.2011 at 14:37, Igor Dovgiy wrote:
>> 
>>> Hi all,
>>> 
>>> I wonder whether this is a common case or a very specific one; anyway, I
>>> would deeply appreciate any help. :)
>>> 
>>> For a long time we've been thinking about how to speed up our operations.
>>> Those are traversals: not very deep (depth 2 at most), but rather complex
>>> ones (with custom evaluator and selector policy classes), and our DB
>>> currently has more than 4M nodes (and should have at least 40M when
>>> deployed).
>>> 
>>> In this quest I tried running multiple server instances at once, and,
>>> well, was greatly surprised by the outcome. :)
>>> 
>>> In single-instance mode I got about 100-120 nodes processed per minute, no
>>> matter how many client processes were thrown at the DB (results are fetched
>>> via the REST API). Results were more or less stable, though.
>>> 
>>> But in double-instance mode the speed jumped to 200-250 nodes per minute
>>> right after the start, held there for 10 minutes, and then crashed down to a
>>> meager 50-60 nodes per minute. Then, after a considerably long time at this
>>> level, it went back up to 200-250 nodes per minute for twenty minutes!
>>> 
>>> Watching the system with htop, I saw a clear pattern: when both CPU cores
>>> were loaded at 90%, processing speed was great, but, sadly, most of the
>>> time the CPU was quite underused. :)
>>> I wonder what might cause such behavior, and whether there is a way to
>>> improve performance, perhaps with additional JVM settings?
>>> 
>>> I've already updated Neo4j to 1.4.1 GA, but the problem is still there.
>>> We've got about 7 GB of RAM free; the two Neo4j instances use about 4 GB in
>>> total.
>>> 
>>> P.S. I tried setting the CPU core affinity of the java processes (with
>>> taskset), with zero effect.
>>> 
>>> --
>>> -- iD
>>> _______________________________________________
>>> Neo4j mailing list
>>> User@lists.neo4j.org
>>> https://lists.neo4j.org/mailman/listinfo/user
>> 
