Thanks! I understand what you mean, however I still have a little confusion.
Does it mean there are unused blocks sitting around? For example:

HFile1 with 3 blocks spread across 3 nodes: Node A: (b1), b2, b3; Node
B: b1, (b2), b3; and Node C: b1, b2, (b3).

HFile2 with 3 blocks spread across 3 nodes: Node A: (b1), b2, b3; Node
B: b1, (b2), b3; and Node C: b1, b2, (b3).

I have 2 questions:

1) When compaction occurs on Node A, would it also include b2 and b3,
which are actually redundant copies? My guess is yes.
2) Now compaction occurs and creates HFile3, which as you said is
replicated. But what happens to HFile1 and HFile2? I am assuming they
get deleted.

Thanks for everyone's patience!

On Thu, Jul 7, 2011 at 2:43 PM, Buttler, David <buttl...@llnl.gov> wrote:
> The nice part of using HDFS as the file system is that the replication is 
> taken care of by the file system.  So, when the compaction finishes, that 
> means the replication has already taken place.
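>
> If it helps to see it concretely, here is a minimal sketch of an HDFS write
> (plain HDFS client API, made-up path, not HBase's internal code). HDFS
> pipelines each block to the requested number of DataNodes while the bytes
> are being written, which is why a finished compaction implies finished
> replication:
>
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.fs.FSDataOutputStream;
>     import org.apache.hadoop.fs.FileSystem;
>     import org.apache.hadoop.fs.Path;
>
>     public class ReplicatedWrite {
>       public static void main(String[] args) throws Exception {
>         FileSystem fs = FileSystem.get(new Configuration());
>         // Illustrative path; a real compacted HFile lives under the HBase root dir.
>         Path out = new Path("/hbase/mytable/region/cf/compacted-hfile");
>         short replication = 3;                    // HDFS default
>         long blockSize = 64L * 1024 * 1024;       // 64 MB blocks
>         FSDataOutputStream stream = fs.create(out, true, 4096, replication, blockSize);
>         stream.write(new byte[]{1, 2, 3});        // stand-in for the HFile bytes
>         stream.close();  // once this returns, every block is already on 3 DataNodes
>       }
>     }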
>
> -----Original Message-----
> From: Mohit Anchlia [mailto:mohitanch...@gmail.com]
> Sent: Thursday, July 07, 2011 2:02 PM
> To: user@hbase.apache.org; Andrew Purtell
> Subject: Re: Hbase performance with HDFS
>
> Thanks Andrew. Really helpful. I think I have one more question right
> now :) Underneath, HDFS replicates blocks 3 times by default. I am not
> sure how that relates to HFiles and compactions. When compaction occurs,
> does it also happen on the replica blocks on other nodes? If not, then
> how does it work when one node fails?
>
> On Thu, Jul 7, 2011 at 1:53 PM, Andrew Purtell <apurt...@apache.org> wrote:
>>> You mentioned compactions; when do those occur and what triggers
>>> them?
>>
>> Compactions are triggered by an algorithm that monitors the number of
>> flush files in a store and their sizes, and it is configurable in several
>> dimensions.
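>>
>> As a rough illustration of the idea (a deliberately simplified sketch, not
>> HBase's actual selection logic; the property name is the knob I have in
>> mind, e.g. hbase.hstore.compactionThreshold):
>>
>>     import java.util.List;
>>
>>     class CompactionCheck {
>>       // roughly what hbase.hstore.compactionThreshold controls (default 3)
>>       private final int compactionThreshold;
>>
>>       CompactionCheck(int compactionThreshold) {
>>         this.compactionThreshold = compactionThreshold;
>>       }
>>
>>       // Once a store has accumulated enough flush files, queue a compaction.
>>       boolean needsCompaction(List<Long> storeFileSizes) {
>>         return storeFileSizes.size() >= compactionThreshold;
>>       }
>>     }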
>>
>>> Does it cause additional space usage when that happens?
>>
>> Yes.
>>
>>> If it does, it would mean you always need to have much more disk
>>> than you really need.
>>
>>
>> Not all regions are compacted at once. Each region by default is
>> constrained to 256 MB. Not all regions will hold the full amount of data.
>> The result is not a perfect copy (doubling) if some data has been deleted
>> or is associated with TTLs that have expired. The merge-sorted result is
>> moved into place and the old files are deleted as soon as the compaction
>> completes. So how much more is "much more"? You can't write to any kind
>> of data store on a (nearly) full volume anyway, whether it is HBase/HDFS,
>> MySQL, or...
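>>
>> As a rough upper bound (my own illustrative arithmetic): compacting one
>> full 256 MB region temporarily needs at most about another 256 MB of
>> logical space (times the HDFS replication factor in raw disk), and only
>> until the old files are deleted at the end of that compaction. Since not
>> all regions are compacted at once, the transient overhead stays a small
>> fraction of the total data size.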
>>
>>> Since HDFS is mostly write-once, how are updates/deletes handled?
>>
>>
>> Not mostly, only write once.
>>
>> From the BigTable paper, section 5.3: "A valid read operation is executed on 
>> a merged view of the sequence of SSTables and the memtable. Since the 
>> SSTables and the memtable are lexicographically sorted data structures, the 
>> merged view can be formed efficiently." So what this means is all the store 
>> files and the memstore serve effectively as change logs sorted in reverse 
>> chronological order.
>>
>> Deletes are just another write, but one that writes tombstones "covering" 
>> data with older timestamps.
>>
>> When serving queries, HBase searches store files back in time until it finds 
>> data at the coordinates requested or a tombstone.
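>>
>> A toy sketch of that read path (illustrative names, not HBase internals):
>> scan the memstore and store files newest-first and stop at the first value
>> or tombstone found for the requested coordinates.
>>
>>     import java.util.List;
>>     import java.util.NavigableMap;
>>
>>     class MergedRead {
>>       static final byte[] TOMBSTONE = new byte[0];   // marker left by a delete
>>
>>       // 'storesNewestFirst' is the memstore followed by store files, newest first.
>>       static byte[] get(String rowKey, List<NavigableMap<String, byte[]>> storesNewestFirst) {
>>         for (NavigableMap<String, byte[]> store : storesNewestFirst) {
>>           byte[] value = store.get(rowKey);
>>           if (value == null) {
>>             continue;                                // not here, look further back in time
>>           }
>>           return value == TOMBSTONE ? null : value;  // a tombstone hides older values
>>         }
>>         return null;                                 // never written
>>       }
>>     }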
>>
>> The process of compaction not only merge-sorts a bunch of accumulated
>> store files (from flushes) into fewer store files (or one) for read
>> efficiency, it also performs housekeeping, dropping data "covered" by the
>> delete tombstones. Incidentally, this is also how TTLs are supported:
>> expired values are dropped as well.
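>>
>> Continuing the same toy model (again illustrative, not HBase's real code),
>> compaction keeps only the newest entry per key and drops tombstones, the
>> older values they cover, and anything past its TTL:
>>
>>     import java.util.*;
>>
>>     class ToyCompaction {
>>       static final byte[] TOMBSTONE = new byte[0];   // marker left by a delete
>>
>>       static NavigableMap<String, byte[]> compact(
>>           List<NavigableMap<String, byte[]>> storesNewestFirst,
>>           Map<String, Long> writeTimeByKey,          // when the newest version was written
>>           long ttlMillis, long now) {
>>         NavigableMap<String, byte[]> merged = new TreeMap<>();
>>         Set<String> seen = new HashSet<>();
>>         for (NavigableMap<String, byte[]> store : storesNewestFirst) {
>>           for (Map.Entry<String, byte[]> e : store.entrySet()) {
>>             if (!seen.add(e.getKey())) continue;     // older version, already covered
>>             boolean deleted = (e.getValue() == TOMBSTONE);
>>             Long written = writeTimeByKey.get(e.getKey());
>>             boolean expired = (written != null && now - written > ttlMillis);
>>             if (!deleted && !expired) merged.put(e.getKey(), e.getValue());
>>           }
>>         }
>>         return merged;   // the merge-sorted result written out as the new store file
>>       }
>>     }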
>>
>> Best regards,
>>
>>    - Andy
>>
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein 
>> (via Tom White)
>>
>>
>>>________________________________
>>>From: Mohit Anchlia <mohitanch...@gmail.com>
>>>To: Andrew Purtell <apurt...@apache.org>
>>>Cc: "user@hbase.apache.org" <user@hbase.apache.org>
>>>Sent: Thursday, July 7, 2011 12:30 PM
>>>Subject: Re: Hbase performance with HDFS
>>>
>>>Thanks, that helps! Just a few more questions:
>>>
>>>You mentioned compactions; when do those occur and what triggers
>>>them? Does it cause additional space usage when that happens? If it
>>>does, it would mean you always need to have much more disk than you
>>>really need.
>>>
>>>Since HDFS is mostly write-once, how are updates/deletes handled?
>>>
>>>Is HBase also suitable for blobs?
>>>
>>>On Thu, Jul 7, 2011 at 12:11 PM, Andrew Purtell <apurt...@apache.org> wrote:
>>>> Some thoughts off the top of my head. Lars' architecture material
>>>> might/should cover this too. Pretty sure his book will.
>>>>
>>>> Regarding reads:
>>>>
>>>> One does not have to read a whole HDFS block. You can request arbitrary
>>>> byte ranges within the block, via positioned reads. (It is true also that
>>>> HDFS can be improved for better random reading performance in ways not
>>>> necessarily yet committed to trunk or especially a 0.20.x branch with
>>>> append support for HBase. See
>>>> https://issues.apache.org/jira/browse/HDFS-1323)
>>>>
>>>> HBase holds indexes to store files in HDFS in memory. We also open all
>>>> store files at the HDFS layer and stash those references. Additionally,
>>>> users can specify the use of bloom filters to improve query time
>>>> performance through wholesale skipping of HFile reads if they are known
>>>> not to contain data that satisfies the query. Bloom filters are held in
>>>> memory as well.
>>>>
>>>> So with indexes resident in memory, when handling Gets we know the byte
>>>> ranges within HDFS block(s) that contain the data of interest. With
>>>> positioned reads we retrieve only those bytes from a DataNode. With
>>>> optional bloom filters we avoid whole HFiles entirely.
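>>>>
>>>> The positioned read is just the HDFS client API; a minimal sketch
>>>> (made-up file name and offset; the offset would come from the in-memory
>>>> HFile index):
>>>>
>>>>     import org.apache.hadoop.conf.Configuration;
>>>>     import org.apache.hadoop.fs.FSDataInputStream;
>>>>     import org.apache.hadoop.fs.FileSystem;
>>>>     import org.apache.hadoop.fs.Path;
>>>>
>>>>     public class PositionedRead {
>>>>       public static void main(String[] args) throws Exception {
>>>>         FileSystem fs = FileSystem.get(new Configuration());
>>>>         FSDataInputStream in = fs.open(new Path("/hbase/mytable/region/cf/storefile"));
>>>>         byte[] buf = new byte[16 * 1024];     // just the HFile block we need
>>>>         long offsetFromIndex = 1234567L;      // looked up in the in-memory index
>>>>         in.readFully(offsetFromIndex, buf, 0, buf.length);  // positioned read
>>>>         in.close();                           // only these bytes crossed the wire
>>>>       }
>>>>     }
>>>>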
>>>> Regarding writes:
>>>>
>>>> I think you should consult the BigTable paper again if you are still
>>>> asking about the write path. The database is log structured. Writes are
>>>> accumulated in memory, and flushed all at once. Later flush files are
>>>> compacted as needed, because as you point out GFS and HDFS are optimized
>>>> for streaming sequential reads and writes.
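>>>>
>>>> A toy sketch of that write path (illustrative names, not HBase internals):
>>>> each write is appended to a log and buffered in a sorted in-memory map,
>>>> and the buffer is flushed as one sequential file when it gets big enough,
>>>> which is what keeps the I/O pattern friendly to HDFS.
>>>>
>>>>     import java.util.NavigableMap;
>>>>     import java.util.TreeMap;
>>>>
>>>>     class ToyWritePath {
>>>>       private static final long FLUSH_THRESHOLD = 64L * 1024 * 1024;  // e.g. 64 MB
>>>>       private final NavigableMap<String, byte[]> memstore = new TreeMap<>();
>>>>       private long memstoreBytes = 0;
>>>>
>>>>       void put(String rowKey, byte[] value) {
>>>>         appendToLog(rowKey, value);      // sequential log append for durability
>>>>         memstore.put(rowKey, value);     // sorted in-memory buffer
>>>>         memstoreBytes += rowKey.length() + value.length;
>>>>         if (memstoreBytes >= FLUSH_THRESHOLD) {
>>>>           flush();                       // one sequential write of a new store file
>>>>         }
>>>>       }
>>>>
>>>>       private void appendToLog(String rowKey, byte[] value) { /* append to WAL */ }
>>>>
>>>>       private void flush() {
>>>>         // Write the memstore's sorted contents as a new immutable store file,
>>>>         // then clear the buffer; compactions later merge these files.
>>>>         memstore.clear();
>>>>         memstoreBytes = 0;
>>>>       }
>>>>     }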
>>>>
>>>> Best regards,
>>>>
>>>>   - Andy
>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>>> (via Tom White)
>>>>
>>>> ________________________________
>>>> From: Mohit Anchlia <mohitanch...@gmail.com>
>>>> To: user@hbase.apache.org; Andrew Purtell <apurt...@apache.org>
>>>> Sent: Thursday, July 7, 2011 11:53 AM
>>>> Subject: Re: Hbase performance with HDFS
>>>>
>>>> I have looked at BigTable and its SSTables, etc. But my question is
>>>> directly related to how it's used with HDFS. HDFS recommends large
>>>> files, bigger blocks, write-once, and many sequential reads. But
>>>> accessing small rows and writing small rows is more random and
>>>> different from the inherent design of HDFS. How do these two go
>>>> together and still provide good performance?
>>>>
>>>> On Thu, Jul 7, 2011 at 11:22 AM, Andrew Purtell <apurt...@apache.org> wrote:
>>>>> Hi Mohit,
>>>>>
>>>>> Start here: http://labs.google.com/papers/bigtable.html
>>>>>
>>>>> Best regards,
>>>>>
>>>>>
>>>>>     - Andy
>>>>>
>>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>>>> (via Tom White)
>>>>>
>>>>>
>>>>>>________________________________
>>>>>>From: Mohit Anchlia <mohitanch...@gmail.com>
>>>>>>To: user@hbase.apache.org
>>>>>>Sent: Thursday, July 7, 2011 11:12 AM
>>>>>>Subject: Hbase performance with HDFS
>>>>>>
>>>>>>I've been trying to understand how HBase can provide good performance
>>>>>>using HDFS, when the purpose of HDFS is sequential access to large
>>>>>>blocks, which is inherently different from HBase, where access is more
>>>>>>random and row sizes might be very small.
>>>>>>
>>>>>>I am reading this, but it doesn't answer my question. It does say that
>>>>>>the HFile block size is different, but how it really works with HDFS is
>>>>>>what I am trying to understand.
>>>>>>
>>>>>>http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>
