Hi all. We are using the Cassandra 1.2 StorageServiceMBean class (via a JMX bulk loader) to load a DB image into the Cassandra cluster. After loading the DB image, we issued a bulk retrieval to get the data back using the Hector API's multigetSliceQuery. Let's call this method file-based bulk loading.
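For reference, the JMX call behind the file-based bulk loading looks roughly like the sketch below. It is a minimal illustration, not our exact loader: the class name JmxBulkLoadSketch and the helper serviceUrl are made up for this post, the host, port, and SSTable directory are placeholders passed on the command line, and I am assuming the StorageService MBean is registered under org.apache.cassandra.db:type=StorageService and exposes bulkLoad(String).

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Illustrative sketch only; names in this class are hypothetical.
final class JmxBulkLoadSketch {

    // Builds the usual RMI-based JMX service URL for one node.
    static String serviceUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    // Asks the node to stream the SSTables found under sstableDir into the cluster.
    static void bulkLoad(String host, int port, String sstableDir) throws Exception {
        JMXServiceURL url = new JMXServiceURL(serviceUrl(host, port));
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Assumed MBean name for Cassandra's StorageService.
            ObjectName storageService =
                new ObjectName("org.apache.cassandra.db:type=StorageService");
            // Invoke bulkLoad(String directory) without needing the MBean interface on the classpath.
            mbs.invoke(storageService, "bulkLoad",
                       new Object[] { sstableDir },
                       new String[] { String.class.getName() });
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("usage: JmxBulkLoadSketch <host> <jmxPort> <sstableDir>");
            return;
        }
        bulkLoad(args[0], Integer.parseInt(args[1]), args[2]);
    }
}
```

The call returns once the node has finished streaming the files, and the bulk retrieval with multigetSliceQuery is issued after that.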
Alternatively, we used the Hector API to do online batch insertion (Mutator.addInsertion). Let's call this second method online-bulk-insertion. After the online-bulk-insertion, we issued the same bulk retrieval to get the data back using the Hector API's multigetSliceQuery.

We find that retrieval performance is about 10 times better after the online-bulk-insertion than after the file-based bulk loading. Our explanation is that after the online-bulk-insertion the inserted data is still in the memtable, and that speeds up the subsequent bulk retrieval.

My questions are:

- Is there a way to warm up the cache after the file-based bulk loading, so that the data is cached in memory first and the subsequent bulk retrieval performs closer to what we get after the online-bulk-insertion?
- Will the sstableloader provided in Cassandra's bin directory perform differently from the JMX bulk loader?
- Do I need to wait for some time after the JmxBulkLoader or sstableloader load before I issue the bulk retrieval call, because the Cassandra cluster is doing some housekeeping, such as index building, for the newly bulk-loaded data?

I did try bin/nodetool refresh after the file-based bulk loading, and I did not see any effect.

Thanks in advance.
Elias.