I think you may have a vicious circle of errors: because your data is not
properly replicated to the neighbour, it is not replicating to the
secondary data center either (yeah, obvious). I would suspect the GC errors
are (also obviously) the result of a backlog of compactions that takes out
the neighbour. Assuming a replication factor of 3, each "neighbour" is
participating in compactions for at least one other node besides the
primary you are looking at, and of course it can be many more, depending on
e.g. the vnode count if vnodes are used.

What happens is that when a node fails due to a GC error (it can't reclaim
space), it causes a cascade of other errors, as you are seeing. Might I
suggest you have someone in DevOps with monitoring experience install a
monitoring tool that will notify you of EVERY SINGLE Java GC failure event?
Your DevOps team may already have a favourite log shipping/monitoring tool;
if not, one could be rolled out with e.g. Puppet.
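
Even before a full monitoring stack is in place, something as simple as
grepping the Cassandra system log for GCInspector lines will surface the
long GC pauses (the log path below is just the package default; adjust it
for your install):

  grep "GCInspector" /var/log/cassandra/system.log | tail -20
  # or follow pauses as they happen:
  tail -F /var/log/cassandra/system.log | grep --line-buffered GCInspector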

I think you may have to go through a MANUAL, table-by-table compaction.
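
Something along these lines, one table at a time (the keyspace and table
names are just the ones from your repair log; substitute your own), and
watch "nodetool compactionstats" in between to see the backlog drain:

  nodetool compact prem_maelstrom_2 customer_events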





"Life should not be a journey to the grave with the intention of arriving
safely in a pretty and well preserved body, but rather to skid in broadside
in a cloud of smoke, thoroughly used up, totally worn out, and loudly
proclaiming 'Wow! What a Ride!'" - Hunter Thompson

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Wed, Feb 25, 2015 at 11:01 AM, Ja Sam <ptrstp...@gmail.com> wrote:

> Hi Roni,
> The repair result is the following (we ran it on Friday): Cannot proceed on
> repair because a neighbor (/192.168.61.201) is dead: session failed
>
> But to be honest, the neighbor did not die. The repair seemed to trigger a
> series of full GC events on the initiating node. The results from the logs
> are:
>
> [2015-02-20 16:47:54,884] Starting repair command #2, repairing 7 ranges
> for keyspace prem_maelstrom_2 (parallelism=PARALLEL, full=false)
> [2015-02-21 02:21:55,640] Lost notification. You should check server log
> for repair status of keyspace prem_maelstrom_2
> [2015-02-21 02:22:55,642] Lost notification. You should check server log
> for repair status of keyspace prem_maelstrom_2
> [2015-02-21 02:23:55,642] Lost notification. You should check server log
> for repair status of keyspace prem_maelstrom_2
> [2015-02-21 02:24:55,644] Lost notification. You should check server log
> for repair status of keyspace prem_maelstrom_2
> [2015-02-21 04:41:08,607] Repair session
> d5d01dd0-b917-11e4-bc97-e9a66e5b2124 for range
> (85070591730234615865843651857942052874,102084710076281535261119195933814292480]
> failed with error org.apache.cassandra.exceptions.RepairException: [repair
> #d5d01dd0-b917-11e4-bc97-e9a66e5b2124 on prem_maelstrom_2/customer_events,
> (85070591730234615865843651857942052874,102084710076281535261119195933814292480]]
> Sync failed between /192.168.71.196 and /192.168.61.199
> [2015-02-21 04:41:08,608] Repair session
> eb8d8d10-b967-11e4-bc97-e9a66e5b2124 for range
> (68056473384187696470568107782069813248,85070591730234615865843651857942052874]
> failed with error java.io.IOException: Endpoint /192.168.61.199 died
> [2015-02-21 04:41:08,608] Repair session
> c48aef00-b971-11e4-bc97-e9a66e5b2124 for range (0,10] failed with error
> java.io.IOException: Cannot proceed on repair because a neighbor (/
> 192.168.61.201) is dead: session failed
> [2015-02-21 04:41:08,609] Repair session
> c48d38f0-b971-11e4-bc97-e9a66e5b2124 for range
> (42535295865117307932921825928971026442,68056473384187696470568107782069813248]
> failed with error java.io.IOException: Cannot proceed on repair because a
> neighbor (/192.168.61.201) is dead: session failed
> [2015-02-21 04:41:08,609] Repair session
> c48d38f1-b971-11e4-bc97-e9a66e5b2124 for range
> (127605887595351923798765477786913079306,136112946768375392941136215564139626496]
> failed with error java.io.IOException: Cannot proceed on repair because a
> neighbor (/192.168.61.201) is dead: session failed
> [2015-02-21 04:41:08,619] Repair session
> c48d6000-b971-11e4-bc97-e9a66e5b2124 for range
> (136112946768375392941136215564139626496,0] failed with error
> java.io.IOException: Cannot proceed on repair because a neighbor (/
> 192.168.61.201) is dead: session failed
> [2015-02-21 04:41:08,620] Repair session
> c48d6001-b971-11e4-bc97-e9a66e5b2124 for range
> (102084710076281535261119195933814292480,127605887595351923798765477786913079306]
> failed with error java.io.IOException: Cannot proceed on repair because a
> neighbor (/192.168.61.201) is dead: session failed
> [2015-02-21 04:41:08,620] Repair command #2 finished
>
>
> We tried to run the repair one more time. After 24 hours we got some
> streaming errors. Moreover, 2-3 hours later we had to stop it, because we
> started to get write timeouts on the client and our system started dying.
> The iostat output from the "dying" period, plus tpstats, is available here:
> https://drive.google.com/file/d/0B4N_AbBPGGwLc25nU0lnY3Z5NDA/view
>
>
>
> On Wed, Feb 25, 2015 at 7:50 PM, Roni Balthazar <ronibaltha...@gmail.com>
> wrote:
>
>> Hi Piotr,
>>
>> Are your repairs finishing without errors?
>>
>> Regards,
>>
>> Roni Balthazar
>>
>> On 25 February 2015 at 15:43, Ja Sam <ptrstp...@gmail.com> wrote:
>> > Hi Roni,
>> > They aren't exactly balanced, but as I wrote before they are in the range
>> > of 2500-6000.
>> > If you need exact data I will check it tomorrow morning. But all nodes in
>> > AGRAF have had a small increase in pending compactions during the last
>> > week, which is the "wrong direction".
>> >
>> > I will check the compaction throughput in the morning, but my feeling
>> > about this parameter is that it doesn't change anything.
>> >
>> > Regards
>> > Piotr
>> >
>> >
>> >
>> >
>> > On Wed, Feb 25, 2015 at 7:34 PM, Roni Balthazar <
>> ronibaltha...@gmail.com>
>> > wrote:
>> >>
>> >> Hi Piotr,
>> >>
>> >> What about the nodes on AGRAF? Are the pending tasks balanced between
>> >> that DC's nodes as well?
>> >> You can check the pending compactions on each node.
>> >>
>> >> Also try running "nodetool getcompactionthroughput" on all nodes and
>> >> check whether the compaction throughput is set to 999.
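>> >>
>> >> For reference, that would be along the lines of (999 being the value you
>> >> mentioned setting):
>> >>
>> >>   nodetool getcompactionthroughput
>> >>   nodetool setcompactionthroughput 999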
>> >>
>> >> Cheers,
>> >>
>> >> Roni Balthazar
>> >>
>> >> On 25 February 2015 at 14:47, Ja Sam <ptrstp...@gmail.com> wrote:
>> >> > Hi Roni,
>> >> >
>> >> > It is not balanced. As I wrote you last week, I have problems only in
>> >> > the DC to which we write (on the screenshot it is named AGRAF:
>> >> > https://drive.google.com/file/d/0B4N_AbBPGGwLR21CZk9OV1kxVDA/view). The
>> >> > problem is on ALL nodes in this DC.
>> >> > In the second DC (ZETO) only one node has more than 30 SSTables, and
>> >> > pending compactions are decreasing to zero.
>> >> >
>> >> > In AGRAF the minimum number of pending compactions is 2500 and the
>> >> > maximum is 6000 (the average on the OpsCenter screenshot is less than
>> >> > 5000).
>> >> >
>> >> >
>> >> > Regards
>> >> > Piotrek.
>> >> >
>> >> > p.s. I don't know why my mail client displays my name as Ja Sam
>> >> > instead of Piotr Stapp, but this doesn't change anything :)
>> >> >
>> >> >
>> >> > On Wed, Feb 25, 2015 at 5:45 PM, Roni Balthazar
>> >> > <ronibaltha...@gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> Hi Ja,
>> >> >>
>> >> >> How are the pending compactions distributed between the nodes?
>> >> >> Run "nodetool compactionstats" on all of your nodes and check whether
>> >> >> the pending tasks are balanced or concentrated in only a few nodes.
>> >> >> You can also check whether the SSTable count is balanced by running
>> >> >> "nodetool cfstats" on your nodes.
>> >> >>
>> >> >> Cheers,
>> >> >>
>> >> >> Roni Balthazar
>> >> >>
>> >> >>
>> >> >>
>> >> >> On 25 February 2015 at 13:29, Ja Sam <ptrstp...@gmail.com> wrote:
>> >> >> > I do NOT have SSDs. I have normal HDDs grouped as JBOD.
>> >> >> > My CFs use SizeTieredCompactionStrategy.
>> >> >> > I am using LOCAL_QUORUM for reads and writes. To be precise, I have
>> >> >> > a lot of writes and almost zero reads.
>> >> >> > I changed "cold_reads_to_omit" to 0.0 as someone suggested, and I
>> >> >> > set the compaction throughput to 999.
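>> >> >> >
>> >> >> > (For reference, the throughput change was done with "nodetool
>> >> >> > setcompactionthroughput 999", and cold_reads_to_omit is a
>> >> >> > SizeTieredCompactionStrategy sub-option, set per table with CQL
>> >> >> > along the lines of:
>> >> >> >
>> >> >> >   ALTER TABLE prem_maelstrom_2.customer_events
>> >> >> >     WITH compaction = {'class': 'SizeTieredCompactionStrategy',
>> >> >> >                        'cold_reads_to_omit': 0.0};
>> >> >> >
>> >> >> > using the table from this thread as an example.)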
>> >> >> >
>> >> >> > So if my disks are idle, my CPU is below 40%, and I have some free
>> >> >> > RAM, why is the SSTable count growing? And how can I speed up
>> >> >> > compactions?
>> >> >> >
>> >> >> > On Wed, Feb 25, 2015 at 5:16 PM, Nate McCall <
>> n...@thelastpickle.com>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >>
>> >> >> >>>
>> >> >> >>> If you could be so kind as to validate the above and give me an
>> >> >> >>> answer: are my disks the real problem or not? And give me a tip on
>> >> >> >>> what I should do with the above cluster? Maybe I have a
>> >> >> >>> misconfiguration?
>> >> >> >>>
>> >> >> >>>
>> >> >> >>
>> >> >> >> Your disks are effectively idle. What consistency level are you
>> >> >> >> using for reads and writes?
>> >> >> >>
>> >> >> >> Actually, 'await' is sort of weirdly high for idle SSDs. Check
>> >> >> >> your interrupt mappings (cat /proc/interrupts) and make sure the
>> >> >> >> interrupts are not being stacked on a single CPU.
>> >> >> >>
>> >> >> >>
>> >> >> >
>> >> >
>> >> >
>> >
>> >
>>
>
>
