> Can someone please explain the mmap issue.
> mmap is default for all storage files for 64bit machines.
> according to this case https://issues.apache.org/jira/browse/CASSANDRA-1214
> it might not be a good thing.
> Is it right to say that you should use mmap only if your MAX expected data
> is smaller than the MIN free RAM that could be in your system?
Not really. That is, the intent of mmap is to let the OS dynamically choose what gets swapped in and out. The practical problem is that the OS will often tend to swap too much.

I got the impression jbellis wasn't convinced, but my anecdotal experience is that this is a much larger problem for mmap():ed data than for regular buffer-cached data. Presumably (or so my assumption has been) this is because in the case of the buffer cache the kernel has direct knowledge that it is only a cache, while with mmap() the data competes directly with regular application memory. (I haven't actually checked the source; I suppose I should.)

One thing you can do is decrease swappiness (assuming Linux; check out /proc/sys/vm/swappiness) and see if it helps. But in general you don't, to my knowledge, have good direct control over swapping policies. As noted in the thread, the best bet would probably be to make the JVM use mlock()/mlockall() to guarantee that the JVM doesn't swap anything out, and then let the OS do its thing with any remaining data.

That said, if the total amount of data is less than the minimum free RAM after the JVM heap, you're certainly much less likely to see swapping. But the intent is not that you should only use mmap() under such circumstances.

Also, I'm personally interested in hearing what kind of performance impact people have *actually* seen with standard I/O, especially if Cassandra is configured to cache a significant amount of data in RAM itself. I'm a bit skeptical about claims of extreme performance differences, in spite of syscalls being expensive.

-- 
/ Peter Schuller