Hi all,

We're running into a few Kudu issues. The first is that the Kudu cluster 
check utility (sudo -u kudu /opt/cloudera/parcels/CDH/lib/kudu/bin-debug/kudu 
cluster ksck) reports an under-replicated tablet and then crashes:

Connected to the Master
Fetched info from all 10 Tablet Servers

Tablet 41bf41e4127a46c69242f707298cf4ba of table 'xxx' is under-replicated: 1 
replica(s) not RUNNING
  1b3d49dd6ce64acda32f97a89d7de193: TS unavailable
  1a05af887edf4ba7b5c1731ce3508b19 (pdn05:7050): RUNNING [LEADER]
  4028533287964369928034c3616a0a16 (pdn01:7050): RUNNING

2 replicas' active configs differ from the master's.
  All the peers reported by the master and tablet servers are:
  A = 1a05af887edf4ba7b5c1731ce3508b19
  B = 1b3d49dd6ce64acda32f97a89d7de193
  C = 4028533287964369928034c3616a0a16

The consensus matrix is:
Segmentation fault

The Kudu release notes for 1.4.0 mention a segmentation fault in ksck, but 
we are running 1.5.0 on a CDH cluster.

Some notes:

  *   All 3 masters are up, and one has been elected leader
  *   All tablet servers (10) are live and visible in the master web UI
  *   We've run kudu fs check ... -repair on all servers (masters and tablet servers)
  *   Master logs are filled with errors like:

Previously reported cstate for tablet 5977f01cea44448a908bb56f97b46d9e (table 
'xxx' [id=bb359f4b89dd46e797e2e24f9efac971]) gave a different leader for term 
2007 than the current cstate. Previous cstate: current_term: 2007 leader_uuid: 
""

  *   Tablet server logs contain many messages like:

Couldn't send request to peer 228515616baf44a99561c2b72dfb3bab for tablet 
138854a04f804f4ebf42df657c22b995. Error code: TABLET_NOT_RUNNING (12). Status: 
Illegal state: Tablet not RUNNING: INITIALIZED. Retrying in the next heartbeat 
period. Already tried 12813 times.
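
To see which peers and tablets these retries concentrate on, the messages can be tallied per (peer, tablet, error code). Below is a minimal sketch, assuming each log record is on one line and follows the wording of the excerpt above. The regular expression is inferred from that excerpt rather than from any documented Kudu log format, and the function name tally_peer_errors is hypothetical.

```python
import re
from collections import Counter

# Assumption: this pattern is inferred from the log excerpt above,
# not from a documented Kudu log format.
PEER_ERROR = re.compile(
    r"Couldn't send request to peer (?P<peer>[0-9a-f]{32}) "
    r"for tablet (?P<tablet>[0-9a-f]{32})\. "
    r"Error code: (?P<code>[A-Z_]+)"
)


def tally_peer_errors(lines):
    """Count (peer UUID, tablet ID, error code) triples across log lines."""
    counts = Counter()
    for line in lines:
        # Collapse runs of whitespace so stray double spaces do not
        # prevent a match.
        match = PEER_ERROR.search(" ".join(line.split()))
        if match:
            counts[(match["peer"], match["tablet"], match["code"])] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python tally.py < tablet-server.log
    for (peer, tablet, code), n in tally_peer_errors(sys.stdin).most_common(20):
        print(f"{n:8d}  peer={peer}  tablet={tablet}  {code}")
```

If most of the count lands on one peer UUID, that tablet server's own log is probably the next place to look.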

We're a bit lost as to where to look next.

If anyone can point us in the right direction, that would be great!

Thanks,

Vincent
