As Al Tobey suggested, I upgraded my 2.1.0 to a snapshot version of 2.1.3. I
have now installed exactly this build:
https://cassci.datastax.com/job/cassandra-2.1/912/

I see many compactions completing, but some of them are really slow. Maybe I
should send some stats from OpsCenter or from the servers? But it is
difficult for me to choose what is important.
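For example, I could capture something like this from each node (a rough
sketch of what I have in mind - tell me if other metrics matter more):

    nodetool compactionstats                  # compactions running right now
    nodetool compactionhistory | tail -n 20   # a slice of recent history
    iostat -x 5 3                             # disk utilization (needs sysstat)
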
Regards

On Wed, Feb 18, 2015 at 6:11 PM, Jake Luciani <jak...@gmail.com> wrote:

> Ja,
>
> Please upgrade to the official 2.1.3; we've fixed many things related to
> compaction. Are you seeing the compaction progress (% complete) at all?
>
> On Wed, Feb 18, 2015 at 11:58 AM, Roni Balthazar
> <ronibaltha...@gmail.com> wrote:
>
>> Try repair -pr on all nodes.
>>
>> If after that you still have issues, you can try to rebuild the SSTables
>> using nodetool upgradesstables or scrub.
>>
>> Regards,
>>
>> Roni Balthazar
>>
>> On 18/02/2015, at 14:13, Ja Sam <ptrstp...@gmail.com> wrote:
>>
>> ad 3) I did this already yesterday (setcompactionthroughput as well),
>> but SSTables are still increasing.
>>
>> ad 1) What do you think I should use: -pr, or should I try incremental?
>>
>> On Wed, Feb 18, 2015 at 4:54 PM, Roni Balthazar
>> <ronibaltha...@gmail.com> wrote:
>>
>>> You are right... Repair makes the data consistent between nodes.
>>>
>>> I understand that you have 2 issues going on.
>>>
>>> You need to run repair periodically without errors, and you need to
>>> decrease the number of pending compactions.
>>>
>>> So I suggest:
>>>
>>> 1) Run repair -pr on all nodes. If you upgrade to the new 2.1.3, you
>>> can use incremental repairs. There were some bugs in 2.1.2.
>>> 2) Run cleanup on all nodes.
>>> 3) Since you have too many cold SSTables, set cold_reads_to_omit to
>>> 0.0 and increase setcompactionthroughput for some time, then see if
>>> the number of SSTables goes down.
>>>
>>> Let us know what errors you are getting when running repairs.
>>>
>>> Regards,
>>>
>>> Roni Balthazar
>>>
>>> On Wed, Feb 18, 2015 at 1:31 PM, Ja Sam <ptrstp...@gmail.com> wrote:
>>>
>>>> Can you explain to me what the correlation is between growing
>>>> SSTables and repair? I was sure, until your mail, that repair only
>>>> makes data consistent between nodes.
>>>>
>>>> Regards
>>>>
>>>> On Wed, Feb 18, 2015 at 4:20 PM, Roni Balthazar
>>>> <ronibaltha...@gmail.com> wrote:
>>>>
>>>>> Which error are you getting when running repairs?
>>>>> You need to run repair on your nodes within gc_grace_seconds (e.g.
>>>>> weekly). They have data that is not read frequently. You can run
>>>>> "repair -pr" on all nodes. Since you do not have deletes, you will
>>>>> not have trouble with that. If you have deletes, it's better to
>>>>> increase gc_grace_seconds before the repair.
>>>>> http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
>>>>> After the repair, try to run a "nodetool cleanup".
>>>>>
>>>>> Check if the number of SSTables goes down after that... Pending
>>>>> compactions must decrease as well...
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Roni Balthazar
>>>>>
>>>>> On Wed, Feb 18, 2015 at 12:39 PM, Ja Sam <ptrstp...@gmail.com> wrote:
>>>>> > 1) We tried to run repairs, but they usually do not succeed. We
>>>>> > had Leveled compaction before; last week we ALTERed the tables to
>>>>> > STCS, because the guys from DataStax suggested that we should not
>>>>> > use Leveled compaction without SSDs. After this change we did not
>>>>> > run any repair. Anyway, I don't think it will change anything in
>>>>> > the SSTable count - if I am wrong, please let me know.
>>>>> >
>>>>> > 2) I did this. My tables are 99% write-only. It is an audit
>>>>> > system.
>>>>> >
>>>>> > 3) Yes, I am using default values.
>>>>> >
>>>>> > 4) In both operations I am using LOCAL_QUORUM.
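>>>>> >
>>>>> > For reference, this is roughly how I double-check the level by
>>>>> > hand in a cqlsh session (mykeyspace.mytable is just a
>>>>> > placeholder):
>>>>> >
>>>>> > CONSISTENCY LOCAL_QUORUM
>>>>> > SELECT * FROM mykeyspace.mytable LIMIT 1;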
>>>>> >
>>>>> > I am almost sure that the READ timeouts happen because of too many
>>>>> > SSTables. Anyway, I would first like to fix the too many pending
>>>>> > compactions; I still don't know how to speed them up.
>>>>> >
>>>>> > On Wed, Feb 18, 2015 at 2:49 PM, Roni Balthazar
>>>>> > <ronibaltha...@gmail.com> wrote:
>>>>> >>
>>>>> >> Are you running repairs within gc_grace_seconds? (default is 10
>>>>> >> days)
>>>>> >> http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
>>>>> >>
>>>>> >> Double-check that you set cold_reads_to_omit to 0.0 on tables
>>>>> >> with STCS that you do not read often.
>>>>> >>
>>>>> >> Are you using the default values for the properties
>>>>> >> min_compaction_threshold (4) and max_compaction_threshold (32)?
>>>>> >>
>>>>> >> Which Consistency Level are you using for read operations? Check
>>>>> >> that you are not reading from DC_B due to your Replication Factor
>>>>> >> and CL.
>>>>> >> http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
>>>>> >>
>>>>> >> Cheers,
>>>>> >>
>>>>> >> Roni Balthazar
>>>>> >>
>>>>> >> On Wed, Feb 18, 2015 at 11:07 AM, Ja Sam <ptrstp...@gmail.com>
>>>>> >> wrote:
>>>>> >> > I don't have problems with DC_B (the replica); only in DC_A (my
>>>>> >> > system writes only to it) do I have read timeouts.
>>>>> >> >
>>>>> >> > I checked the SSTable count in OpsCenter and I have:
>>>>> >> > 1) in DC_A roughly the same (+-10%) for the last week, with a
>>>>> >> > small increase over the last 24h (it is more than 15000-20000
>>>>> >> > SSTables, depending on the node)
>>>>> >> > 2) in DC_B the last 24h shows up to a 50% decrease, which is a
>>>>> >> > good prognosis. Now I have fewer than 1000 SSTables.
>>>>> >> >
>>>>> >> > What did you measure while optimizing your system? Or do you
>>>>> >> > have an idea of what more I should check?
>>>>> >> > 1) I look at CPU idle (one node is 50% idle, the rest 70% idle)
>>>>> >> > 2) Disk queue -> mostly near zero: avg 0.09, with occasional
>>>>> >> > spikes
>>>>> >> > 3) System RAM usage is almost full
>>>>> >> > 4) In Total Bytes Compacted most lines are below 3 MB/s. For
>>>>> >> > DC_A as a whole it is less than 10 MB/s; DC_B looks much better
>>>>> >> > (avg is about 17 MB/s)
>>>>> >> >
>>>>> >> > Something else?
>>>>> >> >
>>>>> >> > On Wed, Feb 18, 2015 at 1:32 PM, Roni Balthazar
>>>>> >> > <ronibaltha...@gmail.com> wrote:
>>>>> >> >>
>>>>> >> >> Hi,
>>>>> >> >>
>>>>> >> >> You can check whether the number of SSTables is decreasing.
>>>>> >> >> Look for the "SSTable count" information for your tables using
>>>>> >> >> "nodetool cfstats". The compaction history can be viewed using
>>>>> >> >> "nodetool compactionhistory".
>>>>> >> >>
>>>>> >> >> About the timeouts, check this out:
>>>>> >> >> http://www.datastax.com/dev/blog/how-cassandra-deals-with-replica-failure
>>>>> >> >> Also try running "nodetool tpstats" to see the thread pool
>>>>> >> >> statistics. It can help you find out whether you are having
>>>>> >> >> performance problems.
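>>>>> >> >>
>>>>> >> >> For example (a rough sketch - mykeyspace is a placeholder, and
>>>>> >> >> the cfstats label is "Table:" or "ColumnFamily:" depending on
>>>>> >> >> your version):
>>>>> >> >>
>>>>> >> >> nodetool cfstats mykeyspace | grep -E '(Table|ColumnFamily):|SSTable count'
>>>>> >> >> watch -n 30 nodetool tpstats
>>>>> >> >>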
>>>>> >> >> If you are having too many pending tasks or dropped messages,
>>>>> >> >> you may need to tune your system (e.g. the driver's timeout,
>>>>> >> >> concurrent reads and so on).
>>>>> >> >>
>>>>> >> >> Regards,
>>>>> >> >>
>>>>> >> >> Roni Balthazar
>>>>> >> >>
>>>>> >> >> On Wed, Feb 18, 2015 at 9:51 AM, Ja Sam <ptrstp...@gmail.com>
>>>>> >> >> wrote:
>>>>> >> >> > Hi,
>>>>> >> >> > Thanks for your "tip". It looks like something changed - I
>>>>> >> >> > still don't know if it is OK.
>>>>> >> >> >
>>>>> >> >> > My nodes started to do more compactions, but it looks like
>>>>> >> >> > some compactions are really slow.
>>>>> >> >> > IO is idle and CPU is quite OK (30%-40%). We set
>>>>> >> >> > compactionthroughput to 999, but I do not see a difference.
>>>>> >> >> >
>>>>> >> >> > Can we check something more? Or do you have any method to
>>>>> >> >> > monitor progress with small files?
>>>>> >> >> >
>>>>> >> >> > Regards
>>>>> >> >> >
>>>>> >> >> > On Tue, Feb 17, 2015 at 2:43 PM, Roni Balthazar
>>>>> >> >> > <ronibaltha...@gmail.com> wrote:
>>>>> >> >> >>
>>>>> >> >> >> Hi,
>>>>> >> >> >>
>>>>> >> >> >> Yes... I had the same issue, and setting cold_reads_to_omit
>>>>> >> >> >> to 0.0 was the solution...
>>>>> >> >> >> The number of SSTables decreased from many thousands to
>>>>> >> >> >> below a hundred, and the SSTables are now much bigger -
>>>>> >> >> >> most of them several gigabytes.
>>>>> >> >> >>
>>>>> >> >> >> Cheers,
>>>>> >> >> >>
>>>>> >> >> >> Roni Balthazar
>>>>> >> >> >>
>>>>> >> >> >> On Tue, Feb 17, 2015 at 11:32 AM, Ja Sam
>>>>> >> >> >> <ptrstp...@gmail.com> wrote:
>>>>> >> >> >> > After some diagnostics (we have not set
>>>>> >> >> >> > cold_reads_to_omit yet): compactions are running, but
>>>>> >> >> >> > VERY slowly, with "idle" IO.
>>>>> >> >> >> >
>>>>> >> >> >> > We have a lot of "Data files" in Cassandra. In DC_A it is
>>>>> >> >> >> > about ~120000 (counting only xxx-Data.db); DC_B has only
>>>>> >> >> >> > ~4000.
>>>>> >> >> >> >
>>>>> >> >> >> > I don't know if this changes anything, but:
>>>>> >> >> >> > 1) in DC_A the avg size of a Data.db file is ~13 MB. I
>>>>> >> >> >> > have a few really big ones, but most are really small
>>>>> >> >> >> > (almost 10000 files are less than 100 MB).
>>>>> >> >> >> > 2) in DC_B the avg size of a Data.db file is much bigger,
>>>>> >> >> >> > ~260 MB.
>>>>> >> >> >> >
>>>>> >> >> >> > Do you think the above flag will help us?
>>>>> >> >> >> >
>>>>> >> >> >> > On Tue, Feb 17, 2015 at 9:04 AM, Ja Sam
>>>>> >> >> >> > <ptrstp...@gmail.com> wrote:
>>>>> >> >> >> >>
>>>>> >> >> >> >> I set setcompactionthroughput to 999 permanently and it
>>>>> >> >> >> >> doesn't change anything. IO is still the same; CPU is
>>>>> >> >> >> >> idle.
>>>>> >> >> >> >>
>>>>> >> >> >> >> On Tue, Feb 17, 2015 at 1:15 AM, Roni Balthazar
>>>>> >> >> >> >> <ronibaltha...@gmail.com> wrote:
>>>>> >> >> >> >>>
>>>>> >> >> >> >>> Hi,
>>>>> >> >> >> >>>
>>>>> >> >> >> >>> You can run "nodetool compactionstats" to view
>>>>> >> >> >> >>> statistics on compactions.
>>>>> >> >> >> >>> Setting cold_reads_to_omit to 0.0 can help to reduce
>>>>> >> >> >> >>> the number of SSTables when you use Size-Tiered
>>>>> >> >> >> >>> compaction.
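>>>>> >> >> >> >>>
>>>>> >> >> >> >>> For example (just a sketch - mykeyspace.mytable is a
>>>>> >> >> >> >>> placeholder, and note that ALTER replaces the whole
>>>>> >> >> >> >>> compaction map, so keep any other options you already
>>>>> >> >> >> >>> use):
>>>>> >> >> >> >>>
>>>>> >> >> >> >>> ALTER TABLE mykeyspace.mytable WITH compaction =
>>>>> >> >> >> >>>   {'class': 'SizeTieredCompactionStrategy',
>>>>> >> >> >> >>>    'cold_reads_to_omit': 0.0};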
>>>>> >> >> >> >>>
>>>>> >> >> >> >>> You can also create a cron job to increase the value of
>>>>> >> >> >> >>> setcompactionthroughput during the night or whenever
>>>>> >> >> >> >>> your IO is not busy.
>>>>> >> >> >> >>>
>>>>> >> >> >> >>> From http://wiki.apache.org/cassandra/NodeTool:
>>>>> >> >> >> >>> 0 0 * * * root nodetool -h `hostname` setcompactionthroughput 999
>>>>> >> >> >> >>> 0 6 * * * root nodetool -h `hostname` setcompactionthroughput 16
>>>>> >> >> >> >>>
>>>>> >> >> >> >>> Cheers,
>>>>> >> >> >> >>>
>>>>> >> >> >> >>> Roni Balthazar
>>>>> >> >> >> >>>
>>>>> >> >> >> >>> On Mon, Feb 16, 2015 at 7:47 PM, Ja Sam
>>>>> >> >> >> >>> <ptrstp...@gmail.com> wrote:
>>>>> >> >> >> >>> > One thing I do not understand: in my case compaction
>>>>> >> >> >> >>> > is running permanently.
>>>>> >> >> >> >>> > Is there a way to check which compactions are
>>>>> >> >> >> >>> > pending? The only information available is the total
>>>>> >> >> >> >>> > count.
>>>>> >> >> >> >>> >
>>>>> >> >> >> >>> > On Monday, February 16, 2015, Ja Sam
>>>>> >> >> >> >>> > <ptrstp...@gmail.com> wrote:
>>>>> >> >> >> >>> >>
>>>>> >> >> >> >>> >> Of course I made a mistake: I am using 2.1.2.
>>>>> >> >> >> >>> >> Anyway, a nightly build is available from
>>>>> >> >> >> >>> >> http://cassci.datastax.com/job/cassandra-2.1/
>>>>> >> >> >> >>> >>
>>>>> >> >> >> >>> >> I read about cold_reads_to_omit; it looks promising.
>>>>> >> >> >> >>> >> Should I also set the compaction throughput?
>>>>> >> >> >> >>> >>
>>>>> >> >> >> >>> >> p.s. I am really sad that I didn't read this before:
>>>>> >> >> >> >>> >> https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
>>>>> >> >> >> >>> >>
>>>>> >> >> >> >>> >> On Monday, February 16, 2015, Carlos Rolo
>>>>> >> >> >> >>> >> <r...@pythian.com> wrote:
>>>>> >> >> >> >>> >>>
>>>>> >> >> >> >>> >>> Hi, 100% in agreement with Roland.
>>>>> >> >> >> >>> >>>
>>>>> >> >> >> >>> >>> The 2.1.x series is a pain! I would never recommend
>>>>> >> >> >> >>> >>> the current 2.1.x series for production.
>>>>> >> >> >> >>> >>>
>>>>> >> >> >> >>> >>> Clocks are a pain, and check your connectivity!
>>>>> >> >> >> >>> >>> Also check tpstats to see if your threadpools are
>>>>> >> >> >> >>> >>> being overrun.
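>>>>> >> >> >> >>> >>>
>>>>> >> >> >> >>> >>> A quick way to verify the clocks on each node
>>>>> >> >> >> >>> >>> (assuming ntpd is in use; chrony has an
>>>>> >> >> >> >>> >>> equivalent):
>>>>> >> >> >> >>> >>>
>>>>> >> >> >> >>> >>> ntpq -p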
>>>>> >> >> >> >>> >>>
>>>>> >> >> >> >>> >>> Regards,
>>>>> >> >> >> >>> >>>
>>>>> >> >> >> >>> >>> Carlos Juzarte Rolo
>>>>> >> >> >> >>> >>> Cassandra Consultant
>>>>> >> >> >> >>> >>>
>>>>> >> >> >> >>> >>> Pythian - Love your data
>>>>> >> >> >> >>> >>>
>>>>> >> >> >> >>> >>> rolo@pythian | Twitter: cjrolo | Linkedin:
>>>>> >> >> >> >>> >>> linkedin.com/in/carlosjuzarterolo
>>>>> >> >> >> >>> >>> Tel: 1649
>>>>> >> >> >> >>> >>> www.pythian.com
>>>>> >> >> >> >>> >>>
>>>>> >> >> >> >>> >>> On Mon, Feb 16, 2015 at 8:12 PM, Roland Etzenhammer
>>>>> >> >> >> >>> >>> <r.etzenham...@t-online.de> wrote:
>>>>> >> >> >> >>> >>>>
>>>>> >> >> >> >>> >>>> Hi,
>>>>> >> >> >> >>> >>>>
>>>>> >> >> >> >>> >>>> 1) Currently Cassandra 2.1.3, upgraded from 2.1.0
>>>>> >> >> >> >>> >>>> (suggested by Al Tobey from DataStax)
>>>>> >> >> >> >>> >>>> 7) minimal reads (usually none, sometimes a few)
>>>>> >> >> >> >>> >>>>
>>>>> >> >> >> >>> >>>> Those two points make me repeat an answer I have
>>>>> >> >> >> >>> >>>> given before. First, where did you get 2.1.3 from?
>>>>> >> >> >> >>> >>>> Maybe I missed it; I will have a look. But if it
>>>>> >> >> >> >>> >>>> is 2.1.2, which is the latest released version,
>>>>> >> >> >> >>> >>>> that version has many bugs - I got hit by most of
>>>>> >> >> >> >>> >>>> them while testing 2.1.2. I had many problems with
>>>>> >> >> >> >>> >>>> compactions not being triggered on column families
>>>>> >> >> >> >>> >>>> that are not being read, and with compactions and
>>>>> >> >> >> >>> >>>> repairs not completing. See:
>>>>> >> >> >> >>> >>>>
>>>>> >> >> >> >>> >>>> https://www.mail-archive.com/search?l=user@cassandra.apache.org&q=subject:%22Re%3A+Compaction+failing+to+trigger%22&o=newest&f=1
>>>>> >> >> >> >>> >>>> https://www.mail-archive.com/user%40cassandra.apache.org/msg40768.html
>>>>> >> >> >> >>> >>>>
>>>>> >> >> >> >>> >>>> Apart from that, how are those two datacenters
>>>>> >> >> >> >>> >>>> connected? Maybe there is a bottleneck.
>>>>> >> >> >> >>> >>>>
>>>>> >> >> >> >>> >>>> Also, do you have ntp up and running on all nodes
>>>>> >> >> >> >>> >>>> to keep all clocks in tight sync?
>>>>> >> >> >> >>> >>>>
>>>>> >> >> >> >>> >>>> Note: I'm no expert (yet) - just sharing my 2
>>>>> >> >> >> >>> >>>> cents.
>>>>> >> >> >> >>> >>>>
>>>>> >> >> >> >>> >>>> Cheers,
>>>>> >> >> >> >>> >>>> Roland
>
> --
> http://twitter.com/tjake