That is correct.
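
For the archives, here is a minimal sketch in Java of that merge step, assuming each flushed file is just a sorted list of string keys merged through a priority queue. The class and method names are invented for illustration; HBase's real scanners operate on KeyValues and HFile blocks, but the principle is the same.

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.Iterator;
    import java.util.List;
    import java.util.PriorityQueue;

    public class MergeScanSketch {

        // One cursor per file: the current key plus the iterator behind it.
        private record Cursor(String key, Iterator<String> rest) {}

        // Merge any number of individually sorted "files" into one sorted view.
        static List<String> mergeScan(List<List<String>> sortedFiles) {
            PriorityQueue<Cursor> heap =
                    new PriorityQueue<>(Comparator.comparing(Cursor::key));
            for (List<String> file : sortedFiles) {
                Iterator<String> it = file.iterator();
                if (it.hasNext()) {
                    heap.add(new Cursor(it.next(), it));
                }
            }
            List<String> result = new ArrayList<>();
            while (!heap.isEmpty()) {
                Cursor c = heap.poll();      // smallest current key across all files
                result.add(c.key());
                if (c.rest().hasNext()) {    // advance only the file that key came from
                    heap.add(new Cursor(c.rest().next(), c.rest()));
                }
            }
            return result;
        }

        public static void main(String[] args) {
            List<String> file1 = List.of("A", "B", "J", "M");  // Day 1 flush
            List<String> file2 = List.of("C", "D", "K", "P");  // Day 2 flush
            // Prints [A, B, C, D, J, K, M, P]: a globally sorted view, no rewrite needed.
            System.out.println(mergeScan(List.of(file1, file2)));
        }
    }

Compaction uses the same merge, except the output is streamed into a new, larger file (the File3 in the example quoted below) instead of being returned to a client.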
________________________________
From: "Pamecha, Abhishek" <apame...@x.com>
To: "user@hbase.apache.org" <user@hbase.apache.org>; lars hofhansl <lhofha...@yahoo.com>
Sent: Tuesday, August 21, 2012 4:45 PM
Subject: RE: HBase Put

Hi Lars,

Thanks for the explanation. I still have one small doubt. Based on your description, given that gets do a merge sort, the data on disk is not kept sorted across files, only within each file. So if, on two separate days, these keys get inserted:

Day 1: File1: A B J M
Day 2: File2: C D K P

then each file is sorted within itself, but scanning both files will require HBase to use a merge sort to produce a sorted result. Right?

Also, File1 and File2 are immutable, and during compactions they are merge-sorted into a bigger File3. Is that correct too?

Thanks,
Abhishek

-----Original Message-----
From: lars hofhansl [mailto:lhofha...@yahoo.com]
Sent: Tuesday, August 21, 2012 4:07 PM
To: user@hbase.apache.org
Subject: Re: HBase Put

In a nutshell:
- Puts are collected in memory (in a sorted data structure)
- When the collected data reaches a certain size it is flushed to a new file (which is sorted)
- Gets do a merge sort between the various files that have been created
- To limit the number of files, they are periodically compacted into fewer, larger files

So the data files (HFiles) are immutable once written; changes are batched in memory first.

-- Lars

________________________________
From: "Pamecha, Abhishek" <apame...@x.com>
To: "user@hbase.apache.org" <user@hbase.apache.org>
Sent: Tuesday, August 21, 2012 4:00 PM
Subject: HBase Put

Hi,

I have a question about the HBase Put call. In the scenario where data is inserted without any order to the column qualifiers, how does HBase maintain sortedness with respect to column qualifiers in its store files/blocks?

I checked the code base and I can see checks <https://github.com/apache/hbase/blob/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java#L319> being made for lexicographic insertion of key-value pairs, but I can't seem to find out how the key offset is calculated in the first place.

Also, given that HDFS is append-only by nature, how do randomly ordered keys make their way into sorted order? Is it only during minor/major compactions that this sortedness gets applied, and is there a small window during which the data is not sorted?

Thanks,
Abhishek
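
To make the first bullet of Lars's reply concrete: the MemStore keeps incoming cells in a sorted, concurrent in-memory structure (a skip-list-based map), so by the time a flush writes an HFile the data is already in qualifier order. There is no window in which a flushed file is unsorted, and the lexicographic check in HFileWriterV2 linked above enforces that invariant rather than creating it. Below is a minimal sketch of the idea, assuming plain string qualifiers and an invented class name; it is not HBase's actual MemStore code.

    import java.util.Map;
    import java.util.concurrent.ConcurrentSkipListMap;

    public class SortedBufferSketch {

        // Keys stay lexicographically sorted no matter the insertion order.
        private final ConcurrentSkipListMap<String, byte[]> buffer =
                new ConcurrentSkipListMap<>();

        void put(String qualifier, byte[] value) {
            buffer.put(qualifier, value);  // sorted position found at insert time
        }

        // Stand-in for a flush: iterate in sorted order, as an HFile writer would.
        void flush() {
            for (Map.Entry<String, byte[]> e : buffer.entrySet()) {
                System.out.println(e.getKey());
            }
            buffer.clear();
        }

        public static void main(String[] args) {
            SortedBufferSketch memstore = new SortedBufferSketch();
            for (String q : new String[] {"J", "A", "M", "B"}) {  // arbitrary arrival order
                memstore.put(q, new byte[0]);
            }
            memstore.flush();  // prints A, B, J, M: sorted before anything reaches HDFS
        }
    }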