The primary problem is namenode memory. The namenode keeps an in-memory entry for every file and every block, so setting the HDFS block size too small limits your scalability.
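As a rough back-of-envelope sketch of why (the ~150 bytes of namenode heap per block entry below is a ballpark assumption for illustration, not a measured figure), compare 64 MB blocks with 1 MB blocks for the same petabyte of data:

// Back-of-envelope sketch only: the bytes-per-entry figure is an assumption,
// not a measurement; the point is how block size scales the block count.
public class NamenodeHeapSketch {
    public static void main(String[] args) {
        long dataBytes = 1L << 50;                      // 1 PB of raw data
        long bytesPerBlockEntry = 150;                  // rough guess at namenode heap per block

        long blocksAt64m = dataBytes / (64L << 20);     // ~16 million blocks
        long blocksAt1m = dataBytes / (1L << 20);       // ~1 billion blocks

        System.out.printf("block entries @ 64MB blocks: ~%d MB of heap%n",
                (blocksAt64m * bytesPerBlockEntry) >> 20);
        System.out.printf("block entries @  1MB blocks: ~%d GB of heap%n",
                (blocksAt1m * bytesPerBlockEntry) >> 30);
    }
}

Shrinking the block size 64x multiplies the block count, and therefore the namenode heap needed to track it, by the same factor.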
There is nothing inherently wrong with in-file random reads; it's just that the HDFS client was written for a single reader reading most of a file. Thus, to achieve high performance you'd need to do tricks such as pipelining sockets and socket pool reuse. Right now, for random reads we open a new socket, read the data, then close it.

On Oct 18, 2010 8:22 PM, "William Kang" <weliam.cl...@gmail.com> wrote:
> Hi JG and Ryan,
> Thanks for the excellent answers.
>
> So, I am going to push everything to the extremes without considering
> the memory first. In theory, if in HBase every cell size equals the
> HBase block size, then there would not be any in-block traversal. In
> HDFS, if every HBase block size equals the HDFS block size, there
> would not be any in-file random access necessary. Would this provide
> the best performance?
>
> But the problem is that if the block in HBase is too large, the
> memory will run out since HBase loads blocks into memory; if the block
> in HDFS is too small, the DN will run out of memory since each HDFS
> file takes some memory. So it is a trade-off problem between memory
> and performance. Is that right?
>
> And would it make any difference between random reading the same-size
> file portion from a small HDFS block versus from a large HDFS block?
>
> Thanks.
>
>
> William
>
> On Mon, Oct 18, 2010 at 10:58 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>> On Mon, Oct 18, 2010 at 7:49 PM, William Kang <weliam.cl...@gmail.com> wrote:
>>> Hi,
>>> Recently I have spent some effort trying to understand the mechanisms
>>> of HBase to exploit possible performance tuning options. Many
>>> thanks to the folks who helped with my questions in this community; I
>>> have sent a report. But there are still a few questions left.
>>>
>>> 1. If an HFile block contains more than one key-value pair, will the
>>> block index in the HFile give the offset of every key-value pair in
>>> that block? Or will the block index just give the key ranges inside
>>> that block, so you have to traverse inside the block until you find
>>> the key you are looking for?
>>
>> The block index contains the first key of every block. It therefore
>> defines, in an [a,b) manner, the range of each block. Once a block has
>> been selected to read from, it is read into memory, then iterated over
>> until the key in question (or the closest match) has been found.
>>
>>> 2. When HBase reads a block to fetch data or traverse within it, is
>>> the whole block read into memory?
>>
>> Yes, the entire block is read in a single read operation.
>>
>>>
>>> 3. HBase blocks (64k, configurable) are inside HDFS blocks (64m,
>>> configurable); to read the HBase blocks, we have to randomly access
>>> the HDFS blocks. Even though HBase can use in(p, buf, 0, x) to read a
>>> small portion of the larger HDFS block, it is still a random access.
>>> Would this be slow?
>>
>> Random access reads are not necessarily slow; they require several things:
>> - disk seeks to the data in question
>> - disk seeks to the checksum files in question
>> - checksum computation and verification
>>
>> While not particularly slow, this could probably be optimized a bit.
>>
>> Most of the issue with random reads in HDFS is parallelizing the
>> reads and doing as much io-pushdown/scheduling as possible without
>> consuming an excess of sockets and threads. The actual speed can be
>> excellent, or not, depending on how busy the IO subsystem is.
>>
>>
>>>
>>> Many thanks. I would be grateful for your answers.
>>>
>>>
>>> William
>>>
>>
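For reference, the positioned read that question 3 alludes to is the pread path on FSDataInputStream. A minimal sketch against the Hadoop FileSystem API looks roughly like this; the path, offset, and buffer size are made-up illustration values:

// Minimal sketch of an HDFS positioned read (pread); the path, offset, and
// buffer size here are hypothetical, chosen only to illustrate the call shape.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PreadSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        byte[] buf = new byte[64 * 1024];                 // one HBase-block-sized chunk

        FSDataInputStream in = fs.open(new Path("/hbase/some-hfile"));
        try {
            // read(position, buffer, offset, length) reads from an arbitrary file
            // offset without moving the stream's current position; today each such
            // call sets up and tears down its own datanode connection, which is the
            // per-read cost discussed above.
            int n = in.read(128L * 1024 * 1024, buf, 0, buf.length);
            System.out.println("read " + n + " bytes");
        } finally {
            in.close();
        }
    }
}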