Cool. Now we have something on the record :-)

./Zahoor@iPad

On 15-Aug-2012, at 3:12 AM, Harsh J <ha...@cloudera.com> wrote:

> Not wanting this thread, too, to end up as a mystery result on the
> web, I did some tests. I loaded 10k rows (of 100 KB random chars each)
> into test tables on both 0.90 and 0.92, flushed them, major_compact'ed
> them (waited for completion and for the drop in IO write activity), and
> then measured them to find this:
> 
> 0.92 takes a total of 1049661190 bytes under its /hbase/test directory.
> 0.90 takes a total of 1049467570 bytes under its /hbase/test directory.
> 
> So… not much of a difference. It is still your data that counts. I
> believe what Anil saw may merely have been additional, un-compacted
> store files?
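> 
> For anyone replaying this later, the sequence was roughly as below (shell
> commands only; the loading of the 10k rows itself is elided, do that
> however you prefer):
> 
>   hbase> create 'test', 'col1'
>   ... load 10k rows of ~100 KB random chars each ...
>   hbase> flush 'test'
>   hbase> major_compact 'test'
>   hbase> exit
> 
>   $ hadoop fs -dus /hbase/test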
> 
> P.S. Note that my 'test' table was all defaults. That is, merely
> "create 'test', 'col1'", nothing else, so a block index entry was
> probably created for every row, as the block size is 64 KB by default
> while my rows are all 100 KB each.
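> 
> (If anyone wants a coarser block index for such fat rows, the block size
> can be raised per column family at create time, e.g. something along the
> lines of:
> 
>   hbase> create 'test', {NAME => 'col1', BLOCKSIZE => 131072}
> 
> I have not re-run the size numbers with that, though, so take it as a
> sketch only.)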
> 
> On Wed, Aug 15, 2012 at 2:25 AM, anil gupta <anilgupt...@gmail.com> wrote:
>> Hi Kevin,
>> 
>> If it's not possible to store a table in HFilev1 in HBase 0.92, then my last
>> option will be to store the data on a pseudo-distributed or standalone cluster
>> for the comparison.
>> The advantage of the current installation is that it's a fully distributed
>> cluster with around 33 million records in a table, so it would give me a
>> better estimate.
>> 
>> Thanks,
>> Anil Gupta
>> 
>> On Tue, Aug 14, 2012 at 1:48 PM, Kevin O'dell <kevin.od...@cloudera.com> wrote:
>> 
>>> Do you not have a pseudo cluster for testing anywhere?
>>> 
>>> On Tue, Aug 14, 2012 at 4:46 PM, anil gupta <anilgupt...@gmail.com> wrote:
>>> 
>>>> Hi Jerry,
>>>> 
>>>> I am willing to do that, but the problem is that I wiped off the HBase 0.90
>>>> cluster. Is there a way to store a table in HFilev1 in HBase 0.92? If I
>>>> can store a file in HFilev1 in 0.92, then I can do the comparison.
>>>> 
>>>> Thanks,
>>>> Anil Gupta
>>>> 
>>>> On Tue, Aug 14, 2012 at 1:28 PM, Jerry Lam <chiling...@gmail.com> wrote:
>>>> 
>>>>> Hi Anil:
>>>>> 
>>>>> Maybe you can try to compare the two HFile implementations directly? Let's
>>>>> say you write 1000 rows into HFile v1 format and then into HFile v2 format.
>>>>> You can then compare the sizes of the two directly.
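>>>>> 
>>>>> Even without the 0.90 cluster, something like the following should at
>>>>> least confirm what is sitting on disk for the 0.92 table (the region
>>>>> and store file names below are placeholders, not real paths):
>>>>> 
>>>>>   hadoop fs -du /hbase/TABLE_NAME/REGION/FAMILY
>>>>>   hbase org.apache.hadoop.hbase.io.hfile.HFile -m -f \
>>>>>       /hbase/TABLE_NAME/REGION/FAMILY/STOREFILE
>>>>> 
>>>>> If I remember right, the -m (meta) output prints the file trailer,
>>>>> which includes the HFile format version the file was written with.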
>>>>> 
>>>>> HTH,
>>>>> 
>>>>> Jerry
>>>>> 
>>>>> On Tue, Aug 14, 2012 at 3:36 PM, anil gupta <anilgupt...@gmail.com> wrote:
>>>>> 
>>>>>> Hi Zahoor,
>>>>>> 
>>>>>> Then it seems like I might have missed something when doing the HDFS
>>>>>> usage estimation of HBase. I usually do hadoop fs -dus /hbase/$TABLE_NAME
>>>>>> for getting the HDFS usage of a table. Is this the right way? Since I
>>>>>> wiped off the HBase 0.90 cluster, I can no longer look into its HDFS
>>>>>> usage. Is it possible to store a table in HFileV1 instead of HFileV2 in
>>>>>> HBase 0.92? In that way I can do a fair comparison.
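>>>>>> 
>>>>>> (That is, the number I compare is the single total from something like
>>>>>> 
>>>>>>   hadoop fs -dus /hbase/$TABLE_NAME
>>>>>> 
>>>>>> i.e. everything under the table's top-level directory, all regions and
>>>>>> their store files included.)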
>>>>>> 
>>>>>> Thanks,
>>>>>> Anil Gupta
>>>>>> 
>>>>>> On Tue, Aug 14, 2012 at 12:13 PM, jmozah <jmo...@gmail.com> wrote:
>>>>>> 
>>>>>>> Hi Anil,
>>>>>>> 
>>>>>>> I really doubt that there is a 50% drop in file sizes... As far as I
>>>>>>> know, there is no drastic space-conserving feature in V2. Just as an
>>>>>>> afterthought: do a major compact and check the sizes.
>>>>>>> 
>>>>>>> ./Zahoor
>>>>>>> http://blog.zahoor.in
>>>>>>> 
>>>>>>> 
>>>>>>> On 15-Aug-2012, at 12:31 AM, anil gupta <anilgupt...@gmail.com> wrote:
>>>>>>> 
>>>>>>>> l
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Thanks & Regards,
>>>>>> Anil Gupta
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Thanks & Regards,
>>>> Anil Gupta
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Kevin O'Dell
>>> Customer Operations Engineer, Cloudera
>>> 
>> 
>> 
>> 
>> --
>> Thanks & Regards,
>> Anil Gupta
> 
> 
> 
> -- 
> Harsh J
