Jacques: I think you misunderstood my statement. I was only asking you to
reconsider my questions on the assumption that I have already seen the Jive
video, since there are some differences between our case and Jive's. I
completely understand that all this is voluntary effort, and you have my
sincere thanks.
Thanks Eugeny. We are currently running some experiments based on your
suggestions!
On Thu, Oct 4, 2012 at 2:20 AM, Eugeny Morozov <emoro...@griddynamics.com> wrote:
I'd suggest thinking about manual major compactions and splits. Using
manual compactions and bulkload allows you to split HFiles
I would suggest you watch this video:
http://www.cloudera.com/resource/video-hbasecon-2012-real-performance-gains-with-real-time-data/
The jive guys solved a lot of the problems you're talking about and discuss
it in that case study.
On Wed, Oct 3, 2012 at 6:27 AM, Karthikeyan Muthukumarasamy wrote:
Hi Jacques,
Thanks for the response!
Yes, I have seen the video before. It suggests using a TTL-based retention
implementation. In their use case, Jive has a fixed retention, say 3 months,
so they can pre-create regions for that many buckets; their bucket id is
DAY_OF_YEAR % retention_in_days. But,
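The bucketing scheme described above can be sketched in a few lines of Java. Note this is an illustrative reconstruction, not Jive's actual code: the 90-day retention value, the zero-padded key layout, and the `user42` identifier are all assumptions.

```java
import java.time.LocalDate;

// Sketch of Jive-style bucketing: bucket id is
// DAY_OF_YEAR % retention_in_days, so row keys cycle through a fixed
// set of buckets that can each map to a pre-created region.
public class BucketKey {
    static int bucketFor(LocalDate date, int retentionDays) {
        return date.getDayOfYear() % retentionDays;
    }

    // Prefix the row key with the bucket id so writes spread across
    // the pre-split regions (key layout here is hypothetical).
    static String rowKey(LocalDate date, int retentionDays, String id) {
        return String.format("%03d-%s-%s",
                bucketFor(date, retentionDays), date, id);
    }

    public static void main(String[] args) {
        LocalDate d = LocalDate.of(2012, 10, 3); // day-of-year 277 (leap year)
        System.out.println(bucketFor(d, 90));        // 277 % 90 = 7
        System.out.println(rowKey(d, 90, "user42")); // 007-2012-10-03-user42
    }
}
```

Because the bucket id wraps around after `retention_in_days`, a TTL equal to the retention window guarantees a bucket's old rows expire before new rows re-enter it, which is why the region set can stay fixed.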
We're all volunteers here so we don't always have the time to fully
understand and plan others' schemas.
In general, your questions seemed to be worried about a lot of things that
may or may not matter depending on the specifics of your implementation.
Without knowing those specifics it is hard
I'd suggest thinking about manual major compactions and splits. Using
manual compactions and bulkload allows you to split HFiles manually. For
example, if you read the last 3 months much more often than all other data,
you could keep three HFiles, one for each of those months, and a single
HFile for everything else.
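The month-per-file layout suggested above boils down to a grouping rule: data from each of the last three months gets its own group, and everything older collapses into one. A pure-Java sketch of that rule is below; the group names and the three-month cutoff are assumptions, and actually merging files along these lines would go through HBase's admin compaction APIs rather than plain Java.

```java
import java.time.LocalDate;
import java.time.YearMonth;

// Sketch of the grouping behind "three HFiles for the recent months,
// one HFile for everything else": each row (or file) is assigned a
// compaction group by the age of its data.
public class CompactionGroups {
    static String groupFor(LocalDate rowDate, LocalDate today) {
        YearMonth ym = YearMonth.from(rowDate);
        YearMonth now = YearMonth.from(today);
        long monthsBack = (now.getYear() - ym.getYear()) * 12L
                + (now.getMonthValue() - ym.getMonthValue());
        // Current month and the two before it each keep their own group;
        // anything older is lumped together and compacted as one unit.
        return monthsBack < 3 ? ym.toString() : "old";
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2012, 10, 3);
        System.out.println(groupFor(LocalDate.of(2012, 9, 15), today)); // 2012-09
        System.out.println(groupFor(LocalDate.of(2012, 3, 1), today));  // old
    }
}
```

The point of the split is that hot reads touch only the small recent files, while the large "old" file is major-compacted rarely and on your own schedule instead of whenever HBase decides.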