Re: Performance problem with large wide row inserts using CQL

DuyHai Doan Thu, 20 Feb 2014 14:30:01 -0800

Rüdiger,

"SortedMap<byte[], SortedMap<byte[], Pair<Long, byte[]>>>"


When using RandomPartitioner or Murmur3Partitioner, the outer map is a
simple Map, not a SortedMap: rows are placed by the hash (token) of the
row key, not by the key itself.

The only case where you have a SortedMap for the row key is when using
OrderPreservingPartitioner, which is not advised in most cases because
it tends to create hot spots in the cluster.
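
To illustrate the difference, here is a minimal Python sketch. It is not
how Cassandra computes tokens: the md5_token helper and the row keys are
made up for illustration, with a plain MD5 digest standing in for a hash
partitioner. It only shows that walking rows in token order is unrelated
to walking them in key order.

```python
import hashlib


def md5_token(row_key: bytes) -> int:
    """Hypothetical stand-in for a hash partitioner: token = MD5 digest as an integer."""
    return int.from_bytes(hashlib.md5(row_key).digest(), "big")


# Made-up row keys, purely for illustration.
row_keys = [b"row-a", b"row-b", b"row-c", b"row-d"]

# Order-preserving view: rows iterate in row-key order (a SortedMap on the key).
by_key = sorted(row_keys)

# Hash-partitioned view: rows iterate in token order, so from the point of
# view of the row key the outer map behaves like a plain, unordered Map.
by_token = sorted(row_keys, key=md5_token)

print("key order:  ", by_key)
print("token order:", by_token)
```

Both lists contain the same rows; only the iteration order differs. A
range scan over row keys is therefore only meaningful in the
order-preserving case.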



On Thu, Feb 20, 2014 at 10:49 PM, Rüdiger Klaehn <rkla...@gmail.com> wrote:

> Hi Sylvain,
>
> I applied the patch to the cassandra-2.0 branch (this required some manual
> work since I could not figure out which commit it was supposed to apply
> to, and it did not apply to the head of cassandra-2.0).
>
> The benchmark now runs in pretty much the same time as the Thrift-based
> benchmark: ~30s for 1000 inserts of 10000 key/value pairs each. Great work!
>
>
> I still have some questions regarding the mapping. Please bear with me if
> these are stupid questions. I am quite new to Cassandra.
>
> The basic cassandra data model for a keyspace is something like this,
> right?
>
> SortedMap<byte[], SortedMap<byte[], Pair<Long, byte[]>>>
>           ^ row key: determines which server(s) the rest is stored on
>                             ^ column key
>                                          ^ timestamp (latest one wins)
>                                                ^ value (can be size 0)
>
> So if I have a table like the one in my benchmark (using blobs)
>
> CREATE TABLE IF NOT EXISTS test.wide (
>   time blob,
>   name blob,
>   value blob,
>   PRIMARY KEY (time,name))
>   WITH COMPACT STORAGE
>
> From reading http://www.datastax.com/dev/blog/thrift-to-cql3 it seems
> that
>
> - time maps to the row key and name maps to the column key without any
> overhead
> - value directly maps to value in the model above without any prefix
>
> Is that correct, or is there some overhead involved in CQL over the raw
> model as described above? If so, where exactly?
>
> kind regards and many thanks for your help,
>
> Rüdiger
>
>
> On Thu, Feb 20, 2014 at 8:36 AM, Sylvain Lebresne <sylv...@datastax.com> wrote:
>
>>
>>
>>
>> On Wed, Feb 19, 2014 at 9:38 PM, Rüdiger Klaehn <rkla...@gmail.com> wrote:
>>
>>>
>>> I have cloned the cassandra repo, applied the patch, and built it. But
>>> when I try to run the benchmark I get an exception. See below. I tried with
>>> a non-managed dependency on
>>> cassandra-driver-core-2.0.0-rc3-SNAPSHOT-jar-with-dependencies.jar, which I
>>> compiled from source because I read that that might help. But that did not
>>> make a difference.
>>>
>>> So currently I don't know how to give the patch a try. Any ideas?
>>>
>>> cheers,
>>>
>>> Rüdiger
>>>
>>> Exception in thread "main" java.lang.IllegalArgumentException: replicate_on_write is not a column defined in this metadata
>>>     at com.datastax.driver.core.ColumnDefinitions.getAllIdx(ColumnDefinitions.java:273)
>>>     at com.datastax.driver.core.ColumnDefinitions.getFirstIdx(ColumnDefinitions.java:279)
>>>     at com.datastax.driver.core.Row.getBool(Row.java:117)
>>>     at com.datastax.driver.core.TableMetadata$Options.<init>(TableMetadata.java:474)
>>>     at com.datastax.driver.core.TableMetadata.build(TableMetadata.java:107)
>>>     at com.datastax.driver.core.Metadata.buildTableMetadata(Metadata.java:128)
>>>     at com.datastax.driver.core.Metadata.rebuildSchema(Metadata.java:89)
>>>     at com.datastax.driver.core.ControlConnection.refreshSchema(ControlConnection.java:259)
>>>     at com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:214)
>>>     at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:161)
>>>     at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:77)
>>>     at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:890)
>>>     at com.datastax.driver.core.Cluster$Manager.newSession(Cluster.java:910)
>>>     at com.datastax.driver.core.Cluster$Manager.access$200(Cluster.java:806)
>>>     at com.datastax.driver.core.Cluster.connect(Cluster.java:158)
>>>     at cassandra.CassandraTestMinimized$delayedInit$body.apply(CassandraTestMinimized.scala:31)
>>>     at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
>>>     at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
>>>     at scala.App$$anonfun$main$1.apply(App.scala:71)
>>>     at scala.App$$anonfun$main$1.apply(App.scala:71)
>>>     at scala.collection.immutable.List.foreach(List.scala:318)
>>>     at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
>>>     at scala.App$class.main(App.scala:71)
>>>     at cassandra.CassandraTestMinimized$.main(CassandraTestMinimized.scala:5)
>>>     at cassandra.CassandraTestMinimized.main(CassandraTestMinimized.scala)
>>>
>>
>> I believe you've tried the cassandra trunk branch? trunk is basically the
>> future Cassandra 2.1 and the driver is currently unhappy because the
>> replicate_on_write option has been removed in that version. I'm supposed to
>> have fixed that on the driver 2.0 branch like 2 days ago so maybe you're
>> also using a slightly old version of the driver sources in there? Or maybe
>> I've screwed up my fix, I'll double check. But anyway, it would be overall
>> simpler to test with the cassandra-2.0 branch of Cassandra, with which you
>> shouldn't run into that.
>>
>> --
>> Sylvain
>>
>
>
