Thanks guys for going through that never-ending email! I will create the
JIRA for block cache eviction and the regionserver assignment command. Ted
already pointed to the JIRA which tries to go to a different datanode if the
primary is busy (I will add comments to that one).

To answer Andrew's questions:

- I am using HBase 0.94.4
- I tried taking a stack trace using jstack, but the regionserver crashed
after the dump. I also did not take the dump on the offending
regionserver; rather, I took it on the regionservers that were making the
block count. I will take a stack trace on the offending server. Is there
any other tool besides jstack? I don't want to crash my regionserver.
- The HBase clients' workload is fairly random and I write to a table every
4-5 seconds. I have varying workloads for different tables. But I do a lot
of batching on the client side and group similar rowkeys together before
doing a GET/PUT. For example: in the best case I end up doing ~100 puts
every second to a region, and in the worst case it's ~5K puts every second.
But again, the workload is fairly random. Currently the clients for the
table which had the most data have been disabled, and yet I still see the
heavy load.

To answer Vladimir's points:
- Data access pattern definitely turns out to be uniform over a period of
time.
- I just did a sweep of my code base and found that there are a few places
where Scanners are using the block cache. I will disable that and see how it
goes.

Thanks,
Viral
