It seems like some regions are assigned to at least 2 region servers, so you will get different results if you connect to different region servers while trying to fetch the same row.
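To confirm this, one option is to compare the list of online regions reported by each region server and look for any region that appears more than once. The snippet below is only a minimal sketch of that comparison: the function name, the server names, and the region names are hypothetical, and it assumes you have already collected each server's online regions yourself (for example from the region server web UI).

```python
from collections import defaultdict


def find_multiply_assigned(regions_by_server):
    """Return {region: sorted list of servers} for every region that
    more than one region server reports as online."""
    servers_by_region = defaultdict(set)
    for server, regions in regions_by_server.items():
        for region in regions:
            servers_by_region[region].add(server)
    # Keep only the regions hosted by two or more servers.
    return {
        region: sorted(servers)
        for region, servers in servers_by_region.items()
        if len(servers) > 1
    }


# Hypothetical snapshot: server name -> regions it reports as online.
snapshot = {
    "rs1.example.com,16020": ["t1,,1.aaa", "t1,row500,1.bbb"],
    "rs2.example.com,16020": ["t1,row500,1.bbb"],
}

for region, servers in find_multiply_assigned(snapshot).items():
    print(region, "is online on", ", ".join(servers))
```

Any region printed here would be a candidate for the inconsistent Get results: depending on which server the client reaches, it may read a copy of the region that does not have the row.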
Usually, disabling all the tables and then enabling them again fixes the problem.

Also, HBase 1.x has been EOL for about two years, so please consider upgrading to a recent HBase version such as 2.5.x or 2.6.x.

Thanks.

Roshan <[email protected]> wrote on Wed, Jun 5, 2024 at 14:08:
>
> Dear HBase Community,
>
> We are experiencing an intermittent issue in our HBase cluster (version
> 1.4.14, HDFS 2.7.3, Zookeeper 3.4.10, 9 region servers, 2 masters).
>
> Issue Details:
>
>    - Symptoms: Get operations intermittently return null for certain row
>    keys despite data presence.
>    - Duration: The issue persisted for two days and resolved on its own
>    without intervention.
>    - Timeline:
>       - Event: Full GC occurred on a region server.
>       - Action: Restarted the region server, leading to region assignment
>       issues.
>       - Troubleshooting: Used hbck fixAssignments but issues persisted.
>       Eventually restarted all region servers, stabilizing the cluster.
>       - Post-Stabilization: For two days, random Get queries returned null,
>       with no exceptions in region server, master, Zookeeper, or client logs.
>       - Resolution: Issue resolved itself after two days.
>
> Logs Reviewed:
>
>    - Searched for keywords "WAL", "HLog", "flush", "replay", "corruption"
>    in region server and master logs.
>    - Checked Zookeeper logs for connectivity issues.
>
> Questions:
>
>    1. What could cause intermittent null returns despite data presence?
>    2. Are there specific WAL or region server configurations to check?
>    3. What additional logs or steps should we review?
>
> Any guidance would be appreciated.
>
> Regards,
> Roshan B
