Hi, I wanted to minimize the number of MapReduce tasks generated while processing a job, so I configured it to a larger value.
I don't think I have configured the HFile size in the cluster. I use Cloudera Manager to manage my cluster, and the only related configuration I can find is hfile.block.cache.size, which is set to 0.25. How do I change the HFile size?

On 13 May 2013 15:03, Amandeep Khurana <ama...@gmail.com> wrote:

> On Sun, May 12, 2013 at 11:40 PM, Praveen Bysani <praveen.ii...@gmail.com> wrote:
>
> > Hi,
> >
> > I have the dfs.block.size value set to 1 GB in my cluster configuration.
>
> Just out of curiosity - why do you have it set at 1 GB?
>
> > I have around 250 GB of data stored in HBase over this cluster. But when
> > I check the number of blocks, it doesn't correspond to the block size
> > value I set. From what I understand I should only have ~250 blocks. But
> > instead, when I ran fsck on /hbase/<table-name>, I got the following:
> >
> > Status: HEALTHY
> >  Total size:                    265727504820 B
> >  Total dirs:                    1682
> >  Total files:                   1459
> >  Total blocks (validated):      1459 (avg. block size 182129886 B)
> >  Minimally replicated blocks:   1459 (100.0 %)
> >  Over-replicated blocks:        0 (0.0 %)
> >  Under-replicated blocks:       0 (0.0 %)
> >  Mis-replicated blocks:         0 (0.0 %)
> >  Default replication factor:    3
> >  Average block replication:     3.0
> >  Corrupt blocks:                0
> >  Missing replicas:              0 (0.0 %)
> >  Number of data-nodes:          5
> >  Number of racks:               1
> >
> > Are there any other configuration parameters that need to be set?
>
> What is your HFile size set to? The HFiles that get persisted would be
> bound by that number. Thereafter each HFile would be split into blocks,
> the size of which you configure using the dfs.block.size configuration
> parameter.
>
> > --
> > Regards,
> > Praveen Bysani
> > http://www.praveenbysani.com

--
Regards,
Praveen Bysani
http://www.praveenbysani.com
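On the "How do I change the HFile size?" question: assuming what is meant is the maximum size a region's store files can reach before the region splits, HBase exposes this as hbase.hregion.max.filesize. A hedged hbase-site.xml sketch (the 10 GB value is illustrative, not from the thread):

```xml
<!-- hbase-site.xml: illustrative fragment, not taken from the thread. -->
<!-- hbase.hregion.max.filesize caps how large a region's store files may
     grow before the region is split, effectively bounding HFile size. -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>10737418240</value> <!-- 10 GiB, example value -->
</property>
```

Under Cloudera Manager, rather than editing hbase-site.xml by hand, one would typically search the HBase service configuration for this property (or add it via a configuration safety valve) and redeploy the client configuration.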
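For readers following the thread: the block count in the fsck output is explained by the numbers themselves. A back-of-the-envelope sketch, using the figures from the fsck report above and assuming the 1 GB dfs.block.size means 1 GiB:

```python
# Figures copied from the fsck output in the thread.
total_size = 265_727_504_820   # Total size in bytes (~250 GB)
total_files = 1_459            # Total files under /hbase/<table-name>
dfs_block_size = 1024 ** 3     # dfs.block.size = 1 GB (assumed binary GiB)

# Average HFile size, as fsck computes it (truncating division).
avg_file_size = total_size // total_files
print(avg_file_size)           # 182129886 B (~174 MB), matching fsck's "avg. block size"

# Each HBase HFile is a separate HDFS file, and a file smaller than
# dfs.block.size still occupies exactly one (partially filled) block.
assert avg_file_size < dfs_block_size

# So the block count equals the file count (1459), not the ~250 blocks
# you would get if the 250 GB were stored as one contiguous stream:
min_blocks_if_contiguous = -(-total_size // dfs_block_size)  # ceiling division
print(min_blocks_if_contiguous)  # 248
```

In other words, the ~250-block expectation would only hold if the data lived in a handful of very large files; with 1459 HFiles averaging ~174 MB each, HDFS allocates one block per file.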