Vincent, I think this is quite normal for your setup. Since you don't have much memory and do have a lot of data, you are hitting cold (uncached) nodes pretty often, especially once the caches are full after a while. At that point it becomes a matter of how fast you can access the disk. How big are your store files?
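One quick way to answer that is to list the store files and compare their sizes against the mapped-memory budget. A minimal sketch (the default embedded store location `data/graph.db` and the `NEO4J_DB_DIR` override are assumptions; adjust for your layout):

```shell
# Sketch: show the Neo4j store files and their on-disk sizes, so you can
# see how much of each store could fit in mapped memory.
# The default embedded store path data/graph.db is an assumption.
DB_DIR="${NEO4J_DB_DIR:-data/graph.db}"
if [ -d "$DB_DIR" ]; then
  # The neostore.* files are the node, relationship and property stores.
  ls -lh "$DB_DIR"/neostore* 2>/dev/null
  # Total on-disk size of the whole database directory.
  du -sh "$DB_DIR"
else
  echo "store directory $DB_DIR not found" >&2
fi
```

Comparing those file sizes to your mapped-memory settings tells you how much of each store can actually stay cached.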
To improve IO you can either add RAM, or get faster disk IO, e.g. by using an SSD or even by setting up a RAM disk to verify this theory. Could you try that and report back?

Cheers,

/peter neubauer

GTalk:    neubauer.peter
Skype:    peter.neubauer
Phone:    +46 704 106975
LinkedIn: http://www.linkedin.com/in/neubauer
Twitter:  http://twitter.com/peterneubauer

http://www.neo4j.org        - Your high performance graph database.
http://startupbootcamp.org/ - Öresund - Innovation happens HERE.
http://www.thoughtmade.com  - Scandinavia's coolest Bring-a-Thing party.

On Wed, May 18, 2011 at 12:25 PM, Vincent Boulaye <vboul...@gmail.com> wrote:
> Hi,
>
> I am currently evaluating neo4j for a new project, and I am getting a lot
> of IO reads (and waits) during my tests. I'm wondering whether this is
> expected or not.
>
> Here are some more details about the test:
> The target application should handle around 5 billion nodes and as many
> relationships. Half of the nodes will hold 5 small String/date properties;
> the other half will hold 10 properties on average (only 2 or 3 of which
> are indexed). Today the nodes are organized as small trees of around 100
> nodes, but they will become more and more interconnected in the future.
>
> I did a first batch load with 100 million nodes, 100 million relationships,
> and 400 million properties. This took less than 2 hours on my test machine
> (Core i7, 4 GB RAM), and I was quite happy at this point!
>
> Then I tried to test concurrent access to the data.
> I built a small web service that queries a main node by an indexed
> property, then browses the tree of connected nodes (~100 nodes) and
> returns them (reading ~400 properties to build the response).
>
> I use the embedded graph db, and I didn't give a lot of memory to neo4j
> for the memory mapping (~1 GB), because in the target application I won't
> be able to put everything in memory anyway. The heap is configured at
> 1 GB. The server is an Ubuntu 10.10, using the Sun JDK 1.6.21.
> When the test starts, I suppose all queries hit memory, because all
> queries answer within 50 ms at the 90th percentile, which is perfect!
> But after a while, once I reach around 30 concurrent clients (each making
> 1 query/s), the response time becomes erratic: half of the time the
> response still comes back in less than 100 ms, but the other half is
> closer to 1 s. In visualvm I see that several threads are often blocked
> (apparently in PersistenceWindowPool, usually for ~1 s), and in vmstat I
> see a lot of disk reads (~3 MB/s) and IO waits >10%.
>
> During my tests I played with the amount of mapped memory, and I tried
> the different types of caches and the read-only graph database, but the
> problem always appears after a while. I also tried neo4j 1.3 and 1.4M02,
> with the same results.
>
> Sometimes the application seems almost stable, with reads ~100 kB/s and
> IO wait <2%, but this never lasts very long.
>
> I am wondering whether this behaviour is normal and I am just supposed to
> put more RAM in the server, or whether there is something wrong in my
> setup?
>
> Thanks & Regards,
> Vincent
> _______________________________________________
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
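As a concrete starting point for the mapped-memory tuning discussed above, the Neo4j 1.x memory-mapping settings go in `neo4j.properties` (or can be passed as a config map to `EmbeddedGraphDatabase`). This is a sketch with illustrative values only, not a recommendation; the idea is to give the node and relationship stores a proportionally larger share of the budget, since the tree traversal touches them on every hop:

```properties
# Illustrative split of a ~1 GB mapped-memory budget on Neo4j 1.x.
# The sizes are assumptions -- compare them against the actual
# neostore.* file sizes on disk and adjust accordingly.
neostore.nodestore.db.mapped_memory=200M
neostore.relationshipstore.db.mapped_memory=400M
neostore.propertystore.db.mapped_memory=200M
neostore.propertystore.db.strings.mapped_memory=200M
neostore.propertystore.db.arrays.mapped_memory=10M
```

If a tuned mapping still thrashes, copying the store directory onto a tmpfs RAM disk for one test run is a quick way to confirm that disk latency, rather than something else in the setup, is the bottleneck.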