In a nutshell:
- Puts are collected in memory (in a sorted data structure)
- When the collected data reaches a certain size it is flushed to a new file (which is sorted)
- Gets do a merge sort between the various files that have been created
- To contain the number of files, they are periodically compacted into fewer, larger files
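
(To make the mechanics concrete, here is a tiny Java sketch of that buffer-then-flush pattern. It is not HBase's MemStore/HFile code, just an illustration: the class and field names are made up, the threshold is a count rather than the byte size HBase actually uses, and a real HFile lives on disk rather than as a List in memory.)

import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LsmWriteSketch {
    // Tiny threshold so the demo flushes quickly.
    static final int FLUSH_THRESHOLD = 3;

    // Stand-in for the in-memory buffer: keys sort on insert,
    // whatever order they arrive in.
    final TreeMap<String, String> memstore = new TreeMap<>();

    // Stand-ins for flushed, sorted, immutable files.
    final List<List<Map.Entry<String, String>>> flushedFiles = new ArrayList<>();

    void put(String key, String value) {
        memstore.put(key, value);
        if (memstore.size() >= FLUSH_THRESHOLD) {
            flush();
        }
    }

    void flush() {
        // The in-memory map is already sorted, so the new "file" is written
        // in sorted order and never modified afterwards.
        List<Map.Entry<String, String>> file = new ArrayList<>();
        for (Map.Entry<String, String> e : memstore.entrySet()) {
            file.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        flushedFiles.add(file);
        memstore.clear();
    }

    public static void main(String[] args) {
        LsmWriteSketch store = new LsmWriteSketch();
        store.put("cq-zebra", "v1");   // qualifiers arrive in no particular order
        store.put("cq-apple", "v2");
        store.put("cq-mango", "v3");   // third put triggers a flush
        System.out.println(store.flushedFiles.get(0));  // prints entries in sorted key order
    }
}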


So the data files (HFiles) are immutable once written; changes are batched in memory first.
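
(And a matching sketch of the read/compaction side, again an illustration rather than HBase code. For simplicity a single-key get is shown as a newest-first lookup instead of the full scan-time merge sort described above; a compaction merges several sorted files into one new sorted file.)

import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class LsmReadSketch {
    // Newest data: the in-memory buffer.
    final TreeMap<String, String> memstore = new TreeMap<>();

    // Flushed, sorted, immutable files, oldest first.
    final List<TreeMap<String, String>> flushedFiles = new ArrayList<>();

    String get(String key) {
        String v = memstore.get(key);
        if (v != null) {
            return v;                          // in-memory value is the newest
        }
        for (int i = flushedFiles.size() - 1; i >= 0; i--) {
            v = flushedFiles.get(i).get(key);  // then newest file first
            if (v != null) {
                return v;
            }
        }
        return null;
    }

    void compact() {
        // Merge all sorted files into one new sorted file; entries from newer
        // files overwrite entries from older ones for the same key.
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : flushedFiles) {
            merged.putAll(file);               // iterating oldest to newest, so newer wins
        }
        flushedFiles.clear();
        flushedFiles.add(merged);
    }
}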

-- Lars



________________________________
 From: "Pamecha, Abhishek" <apame...@x.com>
To: "user@hbase.apache.org" <user@hbase.apache.org> 
Sent: Tuesday, August 21, 2012 4:00 PM
Subject: HBase Put
 
Hi

I had a question on the HBase Put call. In a scenario where data is inserted without any order to column qualifiers, how does HBase maintain sortedness with respect to column qualifiers in its store files/blocks?

I checked the code base and I can see checks <https://github.com/apache/hbase/blob/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java#L319> being made for lexicographic insertion of key-value pairs. But I can't seem to find out how the key offset is calculated in the first place?

Also, given that HDFS is append-only by nature, how do randomly ordered keys make their way into sorted order? Is it only during minor/major compactions that this sortedness gets applied, and is there a small window during which data is not sorted?


Thanks,
Abhishek
