I was thinking about that option too, and I would be curious to find out how this change works out for you. I suspect that increasing the sstable size won't help too much, because the compaction throughput (per task/thread) is still the same - a compaction task will simply take 4x longer to finish. It is possible that, because of that, the CPU will be under-used for even longer.
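If you do go that route, the switch itself is just a compaction sub-option change. A minimal sketch of what I mean (untested - assumes the DataStax Java driver 2.x; the contact point and keyspace name are placeholders; note that existing sstables only get rewritten to the new size by later compactions):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class RaiseSstableSize {
        public static void main(String[] args) {
            // Placeholder contact point - use one of your own nodes.
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect();
            // sstable_size_in_mb is an LCS sub-option; 1024 matches the 1Gb
            // target Andrei mentions below. Only compactions that run after
            // this change will produce sstables of the new size.
            session.execute(
                "ALTER TABLE mykeyspace.wm_contacts WITH compaction = "
              + "{'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 1024}");
            cluster.close();
        }
    }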
My data model, unfortunately, requires this amount of data. And I suspect that regardless of how it is organized, I won't be able to optimize it - I do need all of that data in a single row so I can read it quickly.

One of the obvious recommendations I have received was to run more than one instance of C* per host. Makes sense - it would reduce the amount of data per node and make better use of the resources. I would go for it myself, but it may be a challenge for the people in operations: without a VM it is more tricky for them to operate such a setup, and I do not want any VMs there.

Another option is probably to simply shard my data between several identical tables in the same keyspace. (I could also think about different keyspaces, but I prefer not to spread the data for the same logical "tenant" across multiple keyspaces.) Take my primary key's hash, do something like mod 4, and append the result to the table name :) This would effectively reduce the number of sstables and the amount of data per table (CF). I kind of like this idea more - yes, a bit more of a challenge at the coding level, but obvious benefits without extra operational complexity.
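Roughly what I have in mind - a sketch only (the shard count, the table naming and the use of String.hashCode() are illustrative assumptions; any stable hash over the partition key would do):

    public class ShardedTableRouter {
        private static final int SHARDS = 4;

        // Route a partition key to one of wm_contacts_0 .. wm_contacts_3.
        // The same function must be used for reads and writes so that a
        // logical row always lives in exactly one physical table.
        static String tableFor(String partitionKey) {
            int bucket = Math.floorMod(partitionKey.hashCode(), SHARDS);
            return "wm_contacts_" + bucket;
        }

        public static void main(String[] args) {
            System.out.println(tableFor("tenant42:contact:12345"));
        }
    }

Since each of the four tables is a separate CF with its own leveling, the per-table data volume and sstable count should drop by roughly 4x.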
On Mon, Nov 24, 2014 at 9:32 AM, Andrei Ivanov <aiva...@iponweb.net> wrote:
> Nikolai,
>
> This is more or less what I'm seeing on my cluster then. Trying to
> switch to bigger sstables right now (1Gb).
>
> On Mon, Nov 24, 2014 at 5:18 PM, Nikolai Grigoriev <ngrigor...@gmail.com> wrote:
> > Andrei,
> >
> > Oh, Monday mornings... Tb :)
> >
> > On Mon, Nov 24, 2014 at 9:12 AM, Andrei Ivanov <aiva...@iponweb.net> wrote:
> >> Nikolai,
> >>
> >> Are you sure about 1.26Gb? Like it doesn't look right - 5195 sstables
> >> with 256Mb sstable size...
> >>
> >> Andrei
> >>
> >> On Mon, Nov 24, 2014 at 5:09 PM, Nikolai Grigoriev <ngrigor...@gmail.com> wrote:
> >> > Jean-Armel,
> >> >
> >> > I have only two large tables, the rest is super-small. In the test cluster
> >> > of 15 nodes the largest table has about 110M rows. Its total size is about
> >> > 1.26Gb per node (total disk space used per node for that CF). It's got
> >> > about 5K sstables per node - the sstable size is 256Mb. cfstats on a
> >> > "healthy" node looks like this:
> >> >
> >> >         Read Count: 8973748
> >> >         Read Latency: 16.130059053251774 ms.
> >> >         Write Count: 32099455
> >> >         Write Latency: 1.6124713938912671 ms.
> >> >         Pending Tasks: 0
> >> >                 Table: wm_contacts
> >> >                 SSTable count: 5195
> >> >                 SSTables in each level: [27/4, 11/10, 104/100, 1053/1000, 4000, 0, 0, 0, 0]
> >> >                 Space used (live), bytes: 1266060391852
> >> >                 Space used (total), bytes: 1266144170869
> >> >                 SSTable Compression Ratio: 0.32604853410787327
> >> >                 Number of keys (estimate): 25696000
> >> >                 Memtable cell count: 71402
> >> >                 Memtable data size, bytes: 26938402
> >> >                 Memtable switch count: 9489
> >> >                 Local read count: 8973748
> >> >                 Local read latency: 17.696 ms
> >> >                 Local write count: 32099471
> >> >                 Local write latency: 1.732 ms
> >> >                 Pending tasks: 0
> >> >                 Bloom filter false positives: 32248
> >> >                 Bloom filter false ratio: 0.50685
> >> >                 Bloom filter space used, bytes: 20744432
> >> >                 Compacted partition minimum bytes: 104
> >> >                 Compacted partition maximum bytes: 3379391
> >> >                 Compacted partition mean bytes: 172660
> >> >                 Average live cells per slice (last five minutes): 495.0
> >> >                 Average tombstones per slice (last five minutes): 0.0
> >> >
> >> > Another table of similar structure (same number of rows) is about 4x
> >> > smaller. That table does not suffer from those issues - it compacts well
> >> > and efficiently.
> >> >
> >> > On Mon, Nov 24, 2014 at 2:30 AM, Jean-Armel Luce <jaluc...@gmail.com> wrote:
> >> >>
> >> >> Hi Nikolai,
> >> >>
> >> >> Please could you clarify a little bit what you call "a large amount of
> >> >> data" ?
> >> >>
> >> >> How many tables ?
> >> >> How many rows in your largest table ?
> >> >> How many GB in your largest table ?
> >> >> How many GB per node ?
> >> >>
> >> >> Thanks.
> >> >>
> >> >> 2014-11-24 8:27 GMT+01:00 Jean-Armel Luce <jaluc...@gmail.com>:
> >> >>>
> >> >>> Hi Nikolai,
> >> >>>
> >> >>> Thanks for that information.
> >> >>>
> >> >>> Please could you clarify a little bit what you call "
> >> >>>
> >> >>> 2014-11-24 4:37 GMT+01:00 Nikolai Grigoriev <ngrigor...@gmail.com>:
> >> >>>>
> >> >>>> Just to clarify - when I was talking about the large amount of data I
> >> >>>> really meant a large amount of data per node in a single CF (table).
> >> >>>> LCS does not seem to like it when it gets thousands of sstables (makes
> >> >>>> 4-5 levels).
> >> >>>>
> >> >>>> When bootstrapping a new node you'd better enable that option from
> >> >>>> CASSANDRA-6621 (the one that disables STCS in L0). But it will still
> >> >>>> be a mess - I have a node that I bootstrapped ~2 weeks ago. Initially
> >> >>>> it had 7.5K pending compactions; now it has almost stabilized at 4.6K
> >> >>>> and does not go down. The number of sstables at L0 is over 11K and it
> >> >>>> is slowly, slowly building the upper levels. The total number of
> >> >>>> sstables is 4x the normal amount. Now I am not entirely sure this node
> >> >>>> will ever get back to normal life. And believe me - this is not because
> >> >>>> of I/O: I have SSDs everywhere and 16 physical cores, and this machine
> >> >>>> is barely using 1-3 cores most of the time. The problem is that allowing
> >> >>>> the STCS fallback is not a good option either - it will quickly result
> >> >>>> in a few 200Gb+ sstables in my configuration, and then those sstables
> >> >>>> will never be compacted. Plus, it will require close to 2x disk space on
> >> >>>> EVERY disk in my JBOD configuration... this will kill the node sooner or
> >> >>>> later. This all happens because after bootstrap all sstables end up in
> >> >>>> L0, and the process then moves them to the other levels very slowly. If
> >> >>>> you have write traffic to that CF, the number of sstables in L0 will
> >> >>>> grow quickly - like it happens in my case now.
> >> >>>>
> >> >>>> Once something like
> >> >>>> https://issues.apache.org/jira/browse/CASSANDRA-8301
> >> >>>> is implemented it may be better.
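By the way, the CASSANDRA-6621 option I mentioned above is, as far as I remember, not a table option but a JVM startup flag - something like the line below in cassandra-env.sh. This is from memory, so please check the ticket and your Cassandra version for the exact property name:

    JVM_OPTS="$JVM_OPTS -Dcassandra.disable_stcs_in_l0=true"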
> >> >>>>
> >> >>>> On Sun, Nov 23, 2014 at 4:53 AM, Andrei Ivanov <aiva...@iponweb.net> wrote:
> >> >>>>>
> >> >>>>> Stephane,
> >> >>>>>
> >> >>>>> We are having a somewhat similar C* load profile. Hence some comments
> >> >>>>> in addition to Nikolai's answer.
> >> >>>>> 1. Fallback to STCS - you can actually disable it.
> >> >>>>> 2. Based on our experience, if you have a lot of data per node, LCS
> >> >>>>> may work just fine. That is, till the moment you decide to join
> >> >>>>> another node - chances are that the newly added node will not be able
> >> >>>>> to compact what it gets from the old nodes. In your case, if you switch
> >> >>>>> strategy, the same thing may happen. This is all due to the limitations
> >> >>>>> mentioned by Nikolai.
> >> >>>>>
> >> >>>>> Andrei
> >> >>>>>
> >> >>>>> On Sun, Nov 23, 2014 at 8:51 AM, Servando Muñoz G. <smg...@gmail.com> wrote:
> >> >>>>> > ABUSE
> >> >>>>> >
> >> >>>>> > I DO NOT WANT ANY MORE MAILS, I AM FROM MEXICO
> >> >>>>> >
> >> >>>>> > From: Nikolai Grigoriev [mailto:ngrigor...@gmail.com]
> >> >>>>> > Sent: Saturday, November 22, 2014 7:13 PM
> >> >>>>> > To: user@cassandra.apache.org
> >> >>>>> > Subject: Re: Compaction Strategy guidance
> >> >>>>> > Importance: High
> >> >>>>> >
> >> >>>>> > Stephane,
> >> >>>>> >
> >> >>>>> > Like everything good, LCS comes at a certain price.
> >> >>>>> >
> >> >>>>> > LCS will put the most load on your I/O system (if you use spindles, you
> >> >>>>> > may need to be careful about that) and on the CPU. Also, LCS (by default)
> >> >>>>> > may fall back to STCS if it is falling behind (which is very possible
> >> >>>>> > with heavy write activity), and this will result in higher disk space
> >> >>>>> > usage. LCS also has a certain limitation I have discovered lately:
> >> >>>>> > sometimes LCS may not be able to use all your node's resources
> >> >>>>> > (algorithm limitations), and this reduces the overall compaction
> >> >>>>> > throughput. This may happen if you have a large column family with lots
> >> >>>>> > of data per node. STCS won't have this limitation.
> >> >>>>> >
> >> >>>>> > By the way, the primary goal of LCS is to reduce the number of sstables
> >> >>>>> > C* has to look at to find your data. With LCS properly functioning, this
> >> >>>>> > number will most likely be between 1 and 3 for most of the reads. But if
> >> >>>>> > you do few reads and are not concerned about latency today, most likely
> >> >>>>> > LCS will only save you some disk space.
> >> >>>>> >
> >> >>>>> > On Sat, Nov 22, 2014 at 6:25 PM, Stephane Legay <sle...@looplogic.com> wrote:
> >> >>>>> >
> >> >>>>> > Hi there,
> >> >>>>> >
> >> >>>>> > Use case:
> >> >>>>> >
> >> >>>>> > - Heavy write app, few reads.
> >> >>>>> > - Lots of updates of rows / columns.
> >> >>>>> > - Current performance is fine, for both writes and reads.
> >> >>>>> > - Currently using SizeTieredCompactionStrategy.
> >> >>>>> >
> >> >>>>> > We're trying to limit the amount of storage used during compaction.
> >> >>>>> > Should we switch to LeveledCompactionStrategy?
> >> >>>>> >
> >> >>>>> > Thanks
> >> >>>>> >
> >> >>>>> > --
> >> >>>>> > Nikolai Grigoriev
> >> >>>>
> >> >>>> --
> >> >>>> Nikolai Grigoriev
> >> >
> >> > --
> >> > Nikolai Grigoriev
> >
> > --
> > Nikolai Grigoriev
> > (514) 772-5178

--
Nikolai Grigoriev
(514) 772-5178