Hi,
I have a few questions about blocks per HFile and HFiles per region.

1. Can there be multiple row keys per block, and in turn per HFile? Or is a
block or HFile dedicated to a single row key?



I have a scenario where, within the same column family, some row keys have
very wide rows (say row key W) and some have very narrow rows (say row key N).
Puts for W and N are interleaved at a ratio of roughly 90 puts for W to 10 puts
for N. On the read side, my app gets data for a single row key at a time.
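
To make the write pattern concrete, here is a rough Python sketch of the
interleaving. It is only an illustration, not HBase client code: the function
names, the column qualifiers, and the fixed "every tenth put goes to N" pattern
are made up for the example.

```python
def interleaved_puts(total):
    """Yield (row_key, qualifier, value) tuples in arrival order.

    Hypothetical pattern: 9 puts for the wide row W, then 1 put for the
    narrow row N, repeated, which gives the 90:10 ratio described above.
    """
    for i in range(total):
        row_key = "N" if i % 10 == 9 else "W"
        yield (row_key, f"col{i}", f"val{i}")


def get_row(puts, row_key):
    """Mimic the read side: fetch every cell for one row key."""
    return [(qualifier, value) for key, qualifier, value in puts if key == row_key]


puts = list(interleaved_puts(100))
wide_cells = get_row(puts, "W")
narrow_cells = get_row(puts, "N")
```

In arrival order the cells for N are spread thinly among the cells for W, which
is what prompts the question below.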



Does that mean the entries for row key N will be scattered across regions on
the same region server, given the interleaved puts? Or is there a way to
enforce contiguous writes to a region/HFile reserved for row key N? That way I
could leverage the block cache and have all or most of row key N fit in it for
that session.



2. Is there a limit on the number of HFiles that can exist per region?
Basically, on what criteria does a row key's data get split across two regions
[on the same region server]? I am assuming there can be many regions per region
server, and that multiple regions of the same table can live on the same region
server.


3. Also, is there a limit on the number of blocks that are created per
HFile? What determines whether a split is required?



Thanks,
Abhishek
