[ 
https://issues.apache.org/jira/browse/CASSANDRA-9323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis resolved CASSANDRA-9323.
---------------------------------------
       Resolution: Won't Fix
    Fix Version/s:     (was: 2.1.x)

The preferred way to bulk load is now COPY; see CASSANDRA-11053 and linked 
tickets.

> Bulk loading is slow
> --------------------
>
>                 Key: CASSANDRA-9323
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9323
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Pierre N.
>         Attachments: App.java
>
>
> When I bulk load an SSTable created with CQLSSTableWriter, it is very slow. I 
> tested on a fresh Cassandra node (nothing in the keyspace or tables) with good 
> hardware (8 x 2.8 GHz, 32 GB RAM) but a conventional hard disk (I don't think 
> an SSD would improve performance in this case). 
> When I upload an SSTable from a different server using sstableloader, I get an 
> average of 3 MB/sec; with the attached example I managed to get 5 MB/sec, 
> which is still slow.
> During streaming I noticed that one core of the server is at full CPU, so I 
> think the operation is CPU bound on the server side. I quickly attached a 
> sampling profiler to the Cassandra instance and got the following output: 
> https://i.imgur.com/IfLc2Ip.png
> So I think (though I may be wrong, because sampling is inaccurate) that during 
> streaming the table is deserialized and reserialized to another SSTable, and 
> that this deserialize/reserialize process takes a large amount of CPU, 
> slowing down the insert speed.
> Can someone confirm that bulk load is slow? I also tested on my own computer 
> and barely reach 1 MB/sec. 
> I don't understand the point of fully deserializing the table I just built 
> with CQLSSTableWriter (building and sorting the table is already a long 
> process). Couldn't it just copy the table from offset X to offset Y (using 
> index information, for example) without deserializing/reserializing it?
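
To make the "copy from offset X to offset Y" idea in the description concrete, here is a minimal sketch in plain Java NIO. It is not how Cassandra streams SSTables and uses no Cassandra API. The class name RangeCopy and the method copyRange are hypothetical, and the sketch assumes the byte range of interest is already known (for example, from index information). A real SSTable has more than a data file, so a raw byte copy alone would most likely not yield a loadable table.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: illustrates a byte-range copy with no decoding step.
final class RangeCopy {

    /**
     * Copies bytes [start, end) of src to the end of dst without
     * deserializing them. Returns the number of bytes actually copied.
     */
    static long copyRange(Path src, Path dst, long start, long end) throws IOException {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("invalid range: " + start + ".." + end);
        }
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            // Continue after any bytes already present in dst.
            out.position(out.size());
            long pos = start;
            while (pos < end) {
                // transferTo may move fewer bytes than requested, so loop.
                long n = in.transferTo(pos, end - pos, out);
                if (n <= 0) {
                    break; // reached the end of src before 'end'
                }
                pos += n;
            }
            return pos - start;
        }
    }
}
```

A call such as RangeCopy.copyRange(source, target, x, y) moves the bytes between the two offsets as-is, which is the cost profile the description asks about: I/O only, with no per-row CPU work.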



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
