Re: How Long Will HBase Hold A Row Write Lock?

2018-03-10 Thread Anoop John
Hi Saad, in your initial mail you mentioned that there are lots of checkAndPut ops, but on different rows. A failure to obtain locks (a write lock, as it is checkAndPut) means there is contention on the same row key. If that is the case, yes, that is the first step before BC reads and
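Anoop's point can be made concrete with a simplified Java model. This is an assumption for illustration, not HBase's actual HRegion locking code: each row key gets its own exclusive lock, acquired with a timeout, much as a checkAndPut must hold its row's write lock before mutating. The names RowLockModel, tryLockRow and demo are hypothetical. In this model a lock timeout only happens when another writer already holds the same row; a different row is acquired without waiting.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Simplified model, NOT HBase's HRegion code: one exclusive (write) lock per row key,
// acquired with a timeout, the way a checkAndPut must lock its row before mutating it.
public class RowLockModel {
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Returns true if the row lock was acquired within timeoutMs, false on timeout.
    public boolean tryLockRow(String row, long timeoutMs) {
        ReentrantLock lock = locks.computeIfAbsent(row, k -> new ReentrantLock());
        try {
            return lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }

    public void unlockRow(String row) {
        locks.get(row).unlock();
    }

    // While the calling thread holds "row-a", a second thread tries "row-a" and "row-b".
    // Returns {acquiredSameRow, acquiredDifferentRow}.
    public static boolean[] demo() {
        RowLockModel model = new RowLockModel();
        model.tryLockRow("row-a", 100); // caller holds row-a, like a slow in-flight checkAndPut
        boolean[] results = new boolean[2];
        Thread other = new Thread(() -> {
            results[0] = model.tryLockRow("row-a", 50); // same row: waits, then times out
            results[1] = model.tryLockRow("row-b", 50); // different row: no contention
            if (results[1]) {
                model.unlockRow("row-b");
            }
        });
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        model.unlockRow("row-a");
        return results;
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("same row acquired: " + r[0]);
        System.out.println("different row acquired: " + r[1]);
    }
}
```

Running main prints `same row acquired: false` followed by `different row acquired: true`: in this model, lock failures point to writers colliding on the same row key, not to load spread across different rows.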

Re: TableSnapshotInputFormat Behavior In HBase 1.4.0

2018-03-10 Thread Saad Mufti
The question remains, though, of why it is even accessing a column family's files that should be excluded based on the Scan. And that column family does NOT specify prefetch on open in its schema. Only the one we want to read specifies prefetch on open, which we want to override if possible for the

Re: TableSnapshotInputFormat Behavior In HBase 1.4.0

2018-03-10 Thread Saad Mufti
See below for more that I found on item 3. Cheers. Saad On Sat, Mar 10, 2018 at 7:17 PM, Saad Mufti wrote: > Hi, > > I am running a Spark job (Spark 2.2.1) on an EMR cluster in AWS. There is > no HBase installed on the cluster, only HBase libs linked to my Spark app. > We

Re: How Long Will HBase Hold A Row Write Lock?

2018-03-10 Thread Saad Mufti
Although, now that I think about this a bit more, all the failures we saw were failures to obtain a row lock, and in the thread stack traces we always saw it somewhere inside getRowLockInternal and similar methods. I never saw any contention on the bucket cache lock. Cheers. Saad On Sat,

Re: HBase failed on local exception and failed servers list.

2018-03-10 Thread Saad Mufti
Are you using the AuthUtil class to reauthenticate? This class is in HBase, and uses the Hadoop class UserGroupInformation to do the actual login and re-login. But if your UserGroupInformation class is from Hadoop 2.5.1 or earlier, it has a bug if you are using Java 8, as most of us are. The relogin

Re: How Long Will HBase Hold A Row Write Lock?

2018-03-10 Thread Saad Mufti
Also, for now we have mitigated this problem by using the new setting in HBase 1.4.0 that prevents one slow region server from blocking all client requests. Of course, it causes some timeouts, but our overall ecosystem contains Kafka queues for retries, so we can live with that. From what I can see,
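The message does not name the HBase 1.4.0 setting, so the sketch below does not reproduce it. It is a hypothetical Java model of the general idea: cap the number of in-flight requests per region server and fail fast once a server is saturated, instead of letting one slow server tie up every client thread, leaving retries to an outer layer such as the Kafka queues mentioned in the message. PerServerThrottle and its methods are illustrative names, not HBase client APIs.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Hypothetical model, not HBase's client implementation: cap in-flight requests per
// region server and fail fast once the cap is reached, so one slow server cannot
// block every client thread. The caller is expected to retry later (e.g. via a queue).
public class PerServerThrottle {
    private final int maxInFlightPerServer;
    private final ConcurrentHashMap<String, Semaphore> permits = new ConcurrentHashMap<>();

    public PerServerThrottle(int maxInFlightPerServer) {
        this.maxInFlightPerServer = maxInFlightPerServer;
    }

    // Non-blocking: returns false immediately if the server already has too many requests.
    public boolean tryAcquire(String server) {
        return permits.computeIfAbsent(server, s -> new Semaphore(maxInFlightPerServer)).tryAcquire();
    }

    // Call when a request to the server completes (successfully or not).
    public void release(String server) {
        permits.get(server).release();
    }

    public static void main(String[] args) {
        PerServerThrottle throttle = new PerServerThrottle(2);
        System.out.println(throttle.tryAcquire("slow-rs")); // true
        System.out.println(throttle.tryAcquire("slow-rs")); // true
        System.out.println(throttle.tryAcquire("slow-rs")); // false: fail fast instead of blocking
        System.out.println(throttle.tryAcquire("fast-rs")); // true: other servers unaffected
    }
}
```

The trade-off matches the one described in the message: requests to the saturated server are rejected (surfacing as timeouts or errors to the caller), but threads serving healthy servers keep flowing.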

TableSnapshotInputFormat Behavior In HBase 1.4.0

2018-03-10 Thread Saad Mufti
Hi, I am running a Spark job (Spark 2.2.1) on an EMR cluster in AWS. There is no HBase installed on the cluster, only HBase libs linked to my Spark app. We are reading the snapshot info from an HBase folder in S3 using the TableSnapshotInputFormat class from HBase 1.4.0 to have the Spark job read

Re: How Long Will HBase Hold A Row Write Lock?

2018-03-10 Thread Saad Mufti
So if I understand correctly, we would mitigate the problem by not evicting blocks for archived files immediately? Wouldn't this potentially lead to problems later if the LRU algorithm chooses to evict blocks for active files and leaves blocks for archived files in there? I would definitely love to
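Saad's concern can be illustrated with a minimal LRU cache in Java, built on java.util.LinkedHashMap in access order. This is a hypothetical model that counts capacity in blocks and uses made-up block names; it is not HBase's LruBlockCache or BucketCache. It shows that under plain LRU, blocks from archived files that were touched more recently than an active file's block survive while the active block is evicted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU block cache model (not HBase's LruBlockCache or BucketCache): keys are
// block names, capacity is counted in blocks. Blocks of archived (compacted-away) files
// can outlive blocks of active files if they were used more recently.
public class LruBlockModel<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruBlockModel(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: iteration runs least to most recently used
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least recently used block when over capacity
    }

    public static void main(String[] args) {
        LruBlockModel<String, String> cache = new LruBlockModel<>(3);
        cache.put("active-1", "block");
        cache.put("archived-1", "block"); // block from a file moved to the archive
        cache.put("archived-2", "block");
        cache.put("active-2", "block");   // over capacity: evicts active-1, the LRU entry
        System.out.println(cache.keySet()); // [archived-1, archived-2, active-2]
    }
}
```

In main, adding a fourth block to a three-block cache evicts active-1 while both archived blocks remain; they would only age out once newer blocks push them to the least-recently-used end.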