RE: HBase random access in HDFS and block indices

2010-11-02 Thread Michael Segel
Once you instantiate the HFile object, you should be able to do as many random get()s against the table as you want until you close the reference. > Date: Tue, 2 Nov 2010 16:07:29 +0800 > Subject: Re: HBase random access in HDFS and block indices > From: xietao.mail...@gmail.com > To: user@hbas

Re: HBase random access in HDFS and block indices

2010-11-02 Thread Tao Xie
I read the code, and my understanding is that when a RS starts, the StoreFiles of each Region will be instantiated. Then HFile.reader.loadFileInfo() will read the index and file info. So each StoreFile is opened only once and the block indexes are cached. The cache misses are for blocks. I mean for random Get eac
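Tao Xie's reading of the code (the block index is loaded once when the StoreFile is opened, so a random Get can only miss the cache on a data block) can be illustrated as a binary search over the first key of each block. This is a minimal Python sketch with made-up keys and offsets; `find_block` is a hypothetical helper, not part of HBase's HFile reader API.

```python
from bisect import bisect_right

# Hypothetical in-memory block index: the first row key of each data block
# (sorted) and the byte offset where that block starts in the file.
first_keys = ["apple", "grape", "mango", "peach"]
offsets = [0, 8192, 16384, 24576]

def find_block(row_key):
    """Return the offset of the only block that could contain row_key,
    or None if the key sorts before the first block."""
    i = bisect_right(first_keys, row_key) - 1
    if i < 0:
        return None
    return offsets[i]

print(find_block("banana"))    # 0, because "apple" <= "banana" < "grape"
print(find_block("mango"))     # 16384
print(find_block("zebra"))     # 24576
print(find_block("aardvark"))  # None
```

The index search itself never touches disk; only reading the block at the returned offset can miss the block cache.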

RE: HBase random access in HDFS and block indices

2010-11-01 Thread Michael Segel
> Date: Fri, 29 Oct 2010 10:01:24 -0700 > Subject: Re: HBase random access in HDFS and block indices > From: st...@duboce.net > To: user@hbase.apache.org > > On Fri, Oct 29, 2010 at 6:41 AM, Sean Bigdatafun > wrote: > > I have the same doubt here. Let's

Re: HBase random access in HDFS and block indices

2010-10-29 Thread Stack
On Fri, Oct 29, 2010 at 6:41 AM, Sean Bigdatafun wrote: > I have the same doubt here. Let's say I have a totally random read pattern > (uniformly distributed). > > Now let's assume my total data size stored in HBase is 100TB on 10 > machines (not a big deal considering nowadays' disks), and the tot

Re: HBase random access in HDFS and block indices

2010-10-29 Thread Stack
On Mon, Oct 18, 2010 at 9:30 PM, Matt Corgan wrote: > I was envisioning the HFiles being opened and closed more often, but it > sounds like they're held open for long periods and that the indexes are > permanently cached.  Is it roughly correct to say that after opening an > HFile and loading its

Re: HBase random access in HDFS and block indices

2010-10-29 Thread Alvin C.L Huang
@Sean Given an expected low hit ratio, the cache will not benefit your random gets. It would cause too many Java major GCs, which pause all threads and thus hurt performance. Try no cache at all. -- Alvin C.-L., Huang ATC, ICL, ITRI, Taiwan On 29 October 2010 21:41, Sean Bigdatafun wrote: > I ha
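Alvin's point can be checked with a quick simulation: under a uniformly random read pattern, a cache that holds a fraction f of the blocks serves only roughly a fraction f of the reads. The sketch below assumes an LRU cache and arbitrary block counts; `lru_hit_ratio` is a hypothetical helper, and it models only the hit ratio, not the GC cost he mentions.

```python
import random
from collections import OrderedDict

def lru_hit_ratio(num_blocks, cache_blocks, num_reads, seed=0):
    """Simulate an LRU block cache under uniformly random block reads
    and return the fraction of reads served from the cache."""
    rng = random.Random(seed)
    cache = OrderedDict()
    hits = 0
    for _ in range(num_reads):
        block = rng.randrange(num_blocks)
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / num_reads

# A cache holding 1% of the blocks serves roughly 1% of uniform random reads.
ratio = lru_hit_ratio(num_blocks=100_000, cache_blocks=1_000, num_reads=200_000)
print(ratio)
```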

Re: HBase random access in HDFS and block indices

2010-10-29 Thread Sean Bigdatafun
I have the same doubt here. Let's say I have a totally random read pattern (uniformly distributed). Now let's assume my total data size stored in HBase is 100TB on 10 machines (not a big deal considering nowadays' disks), and the total size of my RS' memory is 10 * 6G = 60 GB. That translates into a
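With Sean's figures, the best case for the block cache is easy to bound. The sketch assumes decimal units and, unrealistically, that all 60 GB of RegionServer memory is block cache; in practice only part of the heap is used for caching, so the real ratio would be lower still.

```python
# Figures from the post, in decimal units.
GB = 1000 ** 3
TB = 1000 ** 4

data_bytes = 100 * TB        # total data stored in HBase
cache_bytes = 10 * 6 * GB    # 10 RegionServers x 6 GB of memory each

# Upper bound on the hit ratio for uniformly random reads: assumes every
# byte of RegionServer memory is used as block cache.
best_case_hit_ratio = cache_bytes / data_bytes
print(f"best-case hit ratio: {best_case_hit_ratio:.2%}")  # 0.06%
```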

Re: HBase random access in HDFS and block indices

2010-10-18 Thread Matt Corgan
I was envisioning the HFiles being opened and closed more often, but it sounds like they're held open for long periods and that the indexes are permanently cached. Is it roughly correct to say that after opening an HFile and loading its checksum/metadata/index/etc then each random data block acces

Re: HBase random access in HDFS and block indices

2010-10-18 Thread Ryan Rawson
Hi, Since the file is write-once, no random writes, putting the index at the end is the only choice. The loading goes like this: - read fixed file trailer, ie: filelen.offset - - read location of additional variable length sections, eg: block index - read those indexes, including the variable le
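Ryan's loading sequence (fixed trailer first, then the location of the variable-length sections, then the indexes themselves) follows from the file being write-once: the index can only be written after the data, so a reader has to start from the end. The sketch below is a toy, hypothetical layout in Python to show the idea; the field sizes, the `TRLR` magic value and the `write_file`/`Reader` names are assumptions and do not match the real HFile format.

```python
import io
import struct
from bisect import bisect_right

# Hypothetical write-once layout: [data blocks][block index][fixed-size trailer].
# Trailer = index offset (u64) + number of index entries (u32) + 4-byte magic.
TRAILER = struct.Struct(">QI4s")
MAGIC = b"TRLR"

def write_file(blocks):
    """blocks: list of (first_key: bytes, payload: bytes), sorted by key."""
    out = io.BytesIO()
    index = []
    for first_key, payload in blocks:
        index.append((first_key, out.tell(), len(payload)))
        out.write(payload)
    index_offset = out.tell()
    for first_key, offset, size in index:
        out.write(struct.pack(">H", len(first_key)) + first_key)
        out.write(struct.pack(">QI", offset, size))
    # The trailer goes last because the file can only be appended to.
    out.write(TRAILER.pack(index_offset, len(index), MAGIC))
    return out.getvalue()

class Reader:
    def __init__(self, f):
        self.f = f
        # 1. Read the fixed-size trailer at the end of the file.
        f.seek(-TRAILER.size, io.SEEK_END)
        index_offset, count, magic = TRAILER.unpack(f.read(TRAILER.size))
        if magic != MAGIC:
            raise ValueError("unrecognised file")
        # 2. Follow the offset it holds and load the whole index once.
        f.seek(index_offset)
        self.keys, self.locations = [], []
        for _ in range(count):
            (klen,) = struct.unpack(">H", f.read(2))
            self.keys.append(f.read(klen))
            self.locations.append(struct.unpack(">QI", f.read(12)))

    def read_block_for(self, key):
        """Return the block that could contain key: an in-memory index
        search, then one seek and one read."""
        i = bisect_right(self.keys, key) - 1
        if i < 0:
            return None
        offset, size = self.locations[i]
        self.f.seek(offset)
        return self.f.read(size)

data = write_file([(b"a", b"block-a"), (b"m", b"block-m")])
reader = Reader(io.BytesIO(data))
print(reader.read_block_for(b"k"))  # b'block-a'
print(reader.read_block_for(b"z"))  # b'block-m'
```

Once the constructor has run, the trailer and index are never read again, which matches the "open once, then many random gets" pattern described elsewhere in the thread.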

RE: HBase random access in HDFS and block indices

2010-10-18 Thread Jonathan Gray
ginal Message- > From: Matt Corgan [mailto:mcor...@hotpads.com] > Sent: Monday, October 18, 2010 8:53 PM > To: user > Subject: Re: HBase random access in HDFS and block indices > > Do you guys ever worry about how big an HFile's index will be? For > example, > if you have

Re: HBase random access in HDFS and block indices

2010-10-18 Thread Matt Corgan
Do you guys ever worry about how big an HFile's index will be? For example, if you have a 512 MB HFile with an 8 KB block size, you will have 64,000 blocks. If each index entry is 50 bytes, then you have a 3.2 MB index, which is way out of line with your intention of having a small block size. I believe that
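Matt's arithmetic can be reproduced directly. His round numbers come out exactly with decimal units; the 50-byte entry size is his assumption, not a measured figure.

```python
KB, MB = 1000, 1000 ** 2   # decimal units reproduce the post's round numbers

hfile_size = 512 * MB
block_size = 8 * KB
entry_size = 50            # bytes per index entry, as assumed in the post

blocks = hfile_size // block_size
index_bytes = blocks * entry_size
print(blocks, index_bytes / MB)  # 64000 3.2
```

With binary units (512 MiB and 8 KiB) the file has 65,536 blocks and the index is about 3.3 MB, so the conclusion is unchanged.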

Re: HBase random access in HDFS and block indices

2010-10-18 Thread Ryan Rawson
The primary problem is the namenode memory. It contains entries for every file and block, so setting a small HDFS block size limits your scalability. There is nothing inherently wrong with in-file random reads; it's just that the HDFS client was written for a single reader to read most of a file. Thi
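Ryan's point is that NameNode memory grows with the number of HDFS blocks, so shrinking the HDFS block size multiplies it. The sketch below assumes the commonly quoted rule of thumb of roughly 150 bytes of NameNode heap per block object, decimal units, and Sean's 100 TB figure; it ignores replication, files and directories, so the numbers are orders of magnitude only. `namenode_block_memory` is a hypothetical helper.

```python
TB, MB, KB = 1000 ** 4, 1000 ** 2, 1000

# Assumed rule of thumb, not an exact figure for any Hadoop version.
BYTES_PER_BLOCK_OBJECT = 150

def namenode_block_memory(data_bytes, hdfs_block_size):
    """Estimate the number of HDFS blocks and the NameNode heap needed
    for their block objects alone."""
    blocks = -(-data_bytes // hdfs_block_size)  # ceiling division
    return blocks, blocks * BYTES_PER_BLOCK_OBJECT

for block_size in (64 * MB, 8 * KB):
    blocks, heap = namenode_block_memory(100 * TB, block_size)
    print(f"{block_size:>10} B blocks: {blocks:,} blocks, ~{heap / 1e9:.2f} GB heap")
```

Going from 64 MB to 8 KB blocks multiplies the block count, and hence this part of the NameNode heap, by 8,000.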

Re: HBase random access in HDFS and block indices

2010-10-18 Thread William Kang
Hi JG and Ryan, Thanks for the excellent answers. So, I am going to push everything to the extremes without considering the memory first. In theory, if in HBase every cell size equals the HBase block size, then there would not be any in-block traversal. In HDFS, every HBase block size equals each

Re: HBase random access in HDFS and block indices

2010-10-18 Thread Ryan Rawson
On Mon, Oct 18, 2010 at 7:49 PM, William Kang wrote: > Hi, > Recently I have spent some effort trying to understand the mechanisms > of HBase to exploit possible performance tuning options. And many > thanks to the folks who helped with my questions in this community, I > have sent a report. But

RE: HBase random access in HDFS and block indices

2010-10-18 Thread Jonathan Gray
Hi William. Answers inline. > -Original Message- > From: William Kang [mailto:weliam.cl...@gmail.com] > Sent: Monday, October 18, 2010 7:48 PM > To: hbase-user > Subject: HBase random access in HDFS and block indices > > Hi, > Recently I have spent some effort

Fwd: HBase random access in HDFS and block indices

2010-10-18 Thread William Kang
Hi, Recently I have spent some effort trying to understand the mechanisms of HBase to exploit possible performance tuning options. And many thanks to the folks who helped with my questions in this community, I have sent a report. But there are still a few questions left. 1. If an HFile block conta

HBase random access in HDFS and block indices

2010-10-18 Thread William Kang
Hi, Recently I have spent some effort trying to understand the mechanisms of HBase to exploit possible performance tuning options. And many thanks to the folks who helped with my questions in this community, I have sent a report. But there are still a few questions left. 1. If an HFile block conta