Thanks for your answer.

2011/9/2 Sylvain Lebresne <sylv...@datastax.com>:
> On Fri, Sep 2, 2011 at 10:29 AM, Benoit Perroud <ben...@noisette.ch> wrote:
>> Hi All,
>>
>> I started using SSTableSimpleUnsortedWriter to load data, and my data
>> has a few rows but a lot of column names in each row.
>>
>> I call SSTableSimpleUnsortedWriter.newRow every 10'000 columns inserted.
>>
>> But the time taken to insert columns keeps growing as the column
>> family grows. The problem appears because every time we call
>> newRow, all the columns of the previous CF are added to the new CF.
>
> If I understand correctly, each row has way more than 10 000 columns, but
> you call newRow every 10 000 columns, right?
Yes. I call newRow every 10 000 columns to be sure to flush as soon as possible.

> Note that you can reduce how often you call newRow.
>
> But anyway, I agree that the code shouldn't suck like that.
>
>> Attached is a small patch that checks which CF is the smallest, and adds
>> the smallest CF to the biggest one.
>>
>> Should I open a bug for that?
>
> Please do. I'm actually thinking of a slightly different fix: we should not
> have to add all the previous columns to the new column family, we should just
> directly reuse the previous column family when adding the new column.
> But the JIRA ticket will be a better place to discuss this.

Opened: https://issues.apache.org/jira/browse/CASSANDRA-3122

Let's discuss there.

Thanks!

Benoit.

> --
> Sylvain
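To make the cost difference concrete, here is a minimal self-contained model of the three behaviours discussed in the thread: copying the whole previous CF into the new one on every newRow, the patch's "add the smallest CF to the biggest one", and Sylvain's "reuse the previous column family". It is a sketch under stated assumptions, not Cassandra code: a column family is modeled as a sorted TreeMap, column names never repeat across batches (so no timestamp reconciliation is needed), and the names WideRowMergeSketch, Cf, Strategy and simulate are hypothetical. It counts only the columns copied while merging buffered CFs for a single wide row.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model (not Cassandra code) of buffering one wide row written in batches. */
public class WideRowMergeSketch {

    enum Strategy { COPY_PREVIOUS, SMALL_INTO_BIG, REUSE_PREVIOUS }

    // Stand-in for a column family: sorted column name -> value.
    static final class Cf {
        final TreeMap<String, String> columns = new TreeMap<>();
    }

    // Copies every column of 'from' into 'into'; returns how many columns were copied.
    static long addAll(Cf from, Cf into) {
        for (Map.Entry<String, String> e : from.columns.entrySet()) {
            into.columns.put(e.getKey(), e.getValue());
        }
        return from.columns.size();
    }

    // Total column copies needed to buffer 'batches' batches of 'batchSize' columns for one row key.
    static long simulate(Strategy strategy, int batches, int batchSize) {
        Cf buffered = null;   // columns already buffered for this row key
        long copies = 0;
        int next = 0;         // running column-name counter, so names never repeat
        for (int b = 0; b < batches; b++) {
            // REUSE_PREVIOUS keeps appending to the buffered CF; the others start a fresh CF per newRow.
            Cf target = (strategy == Strategy.REUSE_PREVIOUS && buffered != null) ? buffered : new Cf();
            for (int i = 0; i < batchSize; i++) {
                target.columns.put(String.format("col%08d", next++), "v");
            }
            if (buffered == null || target == buffered) {
                buffered = target;                    // first batch, or reuse: nothing to merge
            } else if (strategy == Strategy.COPY_PREVIOUS) {
                copies += addAll(buffered, target);   // whole previous CF copied into the new one
                buffered = target;
            } else {
                // SMALL_INTO_BIG: add the smaller CF into the bigger one
                boolean newIsBigger = target.columns.size() > buffered.columns.size();
                Cf big = newIsBigger ? target : buffered;
                Cf small = newIsBigger ? buffered : target;
                copies += addAll(small, big);
                buffered = big;
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        for (Strategy s : Strategy.values()) {
            System.out.println(s + ": " + simulate(s, 10, 10_000) + " column copies");
        }
    }
}
```

With 10 batches of 10 000 columns, the model reports 450 000 copies for COPY_PREVIOUS (the copy grows with every batch, so total work is quadratic in the row width), 90 000 for SMALL_INTO_BIG (only the new 10 000-column batch is copied each time), and 0 for REUSE_PREVIOUS, which is why reusing the buffered column family avoids the slowdown entirely.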