Yes, I was referring to multithreaded_compaction, but just because we
didn't get bitten by this setting doesn't mean it's right, and the JIRA is
a clear indication of that ;)

@Anishek that reminds me of these settings to look at as well:

   - concurrent_writes and concurrent_reads both need to be adapted to your
   actual hardware; a rough sketch follows below.
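
A rough, illustrative cassandra.yaml sketch (starting points only, not tuned
recommendations; the sizing rules in the comments are the commonly cited
rules of thumb, so verify them against your version's documentation):

   concurrent_reads: 32    # often sized around 16 x number of data disks
   concurrent_writes: 64   # often sized around 8 x number of CPU cores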

 Cassandra is, more often than not, disk constrained, though this can change
for some workloads with SSDs.

Yes, that is typically the case. SSDs are more and more common, but so are
multi-core CPUs, and the trend toward more cores is not going to stop; just
look at the next Intel *flagship*, Knights Landing
<http://www.anandtech.com/show/8217/intels-knights-landing-coprocessor-detailed>
=> *72 cores*.

Nowadays it is not rare to have boxes with multi-core CPUs. Either way, if
the cores are not used because of some IO bottleneck, there is no reason to
be licensed for them; and if IO is not an issue, the CPUs are most probably
next in line. Node licensing, on the other hand, is much more about a
combination of that plus added value like the linear scaling of Cassandra.
And I'm not even listing the other nifty integrations that DSE ships with.

But on this matter I believe we shouldn’t hijack the original thread
purpose.

— Brice

On Wed, Apr 22, 2015 at 12:13 AM, Sebastian Estevez
<sebastian.este...@datastax.com> wrote:

I want to draw a distinction between a) multithreaded compaction (the jira
> I just pointed to) and b) concurrent_compactors. I'm not clear on which one
> you are recommending at this stage.
>
> a) Multithreaded compaction is what I warned against in my last note. b)
> Concurrent compactors is the number of separate compaction tasks (on
> different tables) that can run simultaneously. You can crank this up
> without much risk, though the old default of the number of cores was too
> aggressive (CASSANDRA-7139). 2 seems to be the sweet spot.
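>
> A minimal cassandra.yaml sketch of that setting (the value is just the
> starting point suggested above, not a universal recommendation):
>
>    # compaction tasks allowed to run at once; the old per-core default was too aggressive
>    concurrent_compactors: 2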
>
> Cassandra is, more often than not, disk constrained, though this can change
> for some workloads with SSDs.
>
>
> All the best,
>
> Sebastián Estévez
>
> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>
>
> On Tue, Apr 21, 2015 at 5:46 PM, Brice Dutheil <brice.duth...@gmail.com>
> wrote:
>
>> Oh, thank you Sebastian for this input and the ticket reference!
>> We did notice an increase in CPU usage, but kept concurrent compaction low
>> enough for our usage: by default it takes the number of cores, and we used
>> a value of up to 30% of our available cores. But under heavy load CPU is
>> clearly the bottleneck, and we have 2 CPUs with 8 hyper-threaded cores per
>> node.
>>
>> On a related topic: I'm a bit concerned by DataStax's communication.
>> Usually people talk about IO as being the weak spot, but in our case it's
>> more about CPU. Fortunately, Moore's law doesn't really apply vertically
>> anymore: now we have multi-core processors *and* the trend is going that
>> way. Yet the DataStax terms feel a bit *antiquated* and maybe a bit too
>> Oracle-y: http://www.datastax.com/enterprise-terms
>> Node licensing is more appropriate for this century.
>>
>> -- Brice
>>
>> On Tue, Apr 21, 2015 at 11:19 PM, Sebastian Estevez <
>> sebastian.este...@datastax.com> wrote:
>>
>>> Do not enable multithreaded compaction. Overhead usually outweighs any
>>> benefit. It's removed in 2.1 because it harms more than helps:
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-6142
>>>
>>> All the best,
>>>
>>> Sebastián Estévez
>>>
>>> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>>>
>>>
>>> On Tue, Apr 21, 2015 at 9:06 AM, Brice Dutheil <brice.duth...@gmail.com>
>>> wrote:
>>>
>>>> I'm not sure I get everything about the Storm stuff, but my understanding
>>>> of LCS is that the compaction count may increase the more one updates data
>>>> (that's why I was wondering about duplicate primary keys).
>>>>
>>>> Another option is that the code is sending too many write requests per
>>>> second to the Cassandra cluster. I don't know how many nodes you have, but
>>>> the fewer nodes there are, the more compactions each node has to do.
>>>> Also I'd look at the CPU / load; maybe the config is too *restrictive*.
>>>> Look at the following properties in cassandra.yaml (a sketch follows
>>>> below):
>>>>
>>>>    - compaction_throughput_mb_per_sec: by default the value is 16; you
>>>>    may want to increase it, but be careful on mechanical drives. On SSDs,
>>>>    IO is rarely the issue; we use 64 (with SSDs).
>>>>    - multithreaded_compaction: by default it is false; we enabled it.
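>>>>
>>>> A sketch of those cassandra.yaml lines (the values are just the examples
>>>> mentioned above, not general recommendations; note the caution elsewhere
>>>> in this thread about multithreaded_compaction):
>>>>
>>>>    compaction_throughput_mb_per_sec: 64   # default is 16; raise with care on spinning disks
>>>>    multithreaded_compaction: true         # default is false; see CASSANDRA-6142 before enabling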
>>>>
>>>> Compaction threads are niced, so they shouldn't be much of an issue for
>>>> serving production r/w requests. But you never know; always keep an eye on
>>>> IO and CPU.
>>>>
>>>> — Brice
>>>>
>>>> On Tue, Apr 21, 2015 at 2:48 PM, Anishek Agarwal <anis...@gmail.com>
>>>> wrote:
>>>>
>>>>> Sorry, I take that back: we will modify different keys across threads, not
>>>>> the same key. Our Storm topology is going to use field grouping to route
>>>>> updates for the same keys to the same set of bolts.
>>>>>
>>>>> On Tue, Apr 21, 2015 at 6:17 PM, Anishek Agarwal <anis...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> @Brice: I don't think so, as I am giving each thread a specific key
>>>>>> range with no overlaps, so that does not seem to be the case now. However,
>>>>>> we will have to test when we have to modify the same key across threads --
>>>>>> do you think that will cause a problem? As far as I have read, LCS is
>>>>>> recommended for such cases. Should I just switch back to
>>>>>> SizeTieredCompactionStrategy?
>>>>>>
>>>>>>
>>>>>> On Tue, Apr 21, 2015 at 6:13 PM, Brice Dutheil <
>>>>>> brice.duth...@gmail.com> wrote:
>>>>>>
>>>>>>> Could it be that the app is inserting _duplicate_ keys?
>>>>>>>
>>>>>>> -- Brice
>>>>>>>
>>>>>>> On Tue, Apr 21, 2015 at 1:52 PM, Marcus Eriksson <krum...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> nope, but you can correlate I guess; tools/bin/sstablemetadata
>>>>>>>> gives you sstable level information.
>>>>>>>>
>>>>>>>> And it is also likely that, since you get so many L0 sstables, you
>>>>>>>> will be doing size-tiered compaction in L0 for a while.
>>>>>>>>
>>>>>>>> On Tue, Apr 21, 2015 at 1:40 PM, Anishek Agarwal <anis...@gmail.com
>>>>>>>> > wrote:
>>>>>>>>
>>>>>>>>> @Marcus I did look, and that is where I got the above, but it doesn't
>>>>>>>>> show any detail about moving from L0 -> L1. Any specific arguments I
>>>>>>>>> should try with?
>>>>>>>>>
>>>>>>>>> On Tue, Apr 21, 2015 at 4:52 PM, Marcus Eriksson <
>>>>>>>>> krum...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> you need to look at nodetool compactionstats - there is probably
>>>>>>>>>> a big L0 -> L1 compaction going on that blocks other compactions from
>>>>>>>>>> starting
>>>>>>>>>>
>>>>>>>>>> On Tue, Apr 21, 2015 at 1:06 PM, Anishek Agarwal <
>>>>>>>>>> anis...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> the "some_bits" column has about 14-15 bytes of data per key.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Apr 21, 2015 at 4:34 PM, Anishek Agarwal <
>>>>>>>>>>> anis...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello,
>>>>>>>>>>>>
>>>>>>>>>>>> I am inserting about 100 million entries via the datastax-java
>>>>>>>>>>>> driver into a Cassandra cluster of 3 nodes.
>>>>>>>>>>>>
>>>>>>>>>>>> The table structure is as follows:
>>>>>>>>>>>>
>>>>>>>>>>>> create keyspace test with replication = {'class':
>>>>>>>>>>>> 'NetworkTopologyStrategy', 'DC' : 3};
>>>>>>>>>>>>
>>>>>>>>>>>> CREATE TABLE test_bits (id bigint primary key, some_bits text)
>>>>>>>>>>>>   with gc_grace_seconds = 0
>>>>>>>>>>>>   and compaction = {'class': 'LeveledCompactionStrategy'}
>>>>>>>>>>>>   and compression = {'sstable_compression': ''};
>>>>>>>>>>>>
>>>>>>>>>>>> I have 75 threads that are inserting data into the above table,
>>>>>>>>>>>> with each thread having non-overlapping keys.
>>>>>>>>>>>>
>>>>>>>>>>>> I see that the number of pending tasks via "nodetool compactionstats"
>>>>>>>>>>>> keeps increasing, and "nodetool cfstats test.test_bits" shows SSTable
>>>>>>>>>>>> levels as [154/4, 8, 0, 0, 0, 0, 0, 0, 0].
>>>>>>>>>>>>
>>>>>>>>>>>> Why is compaction not kicking in?
>>>>>>>>>>>>
>>>>>>>>>>>> thanks
>>>>>>>>>>>> anishek
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>>>
>>
