Thanks Eric. That helps. With regard to the repeating key piece, does this only apply to successive cells? In other words, if I use the same cf in every row of a table, is that cf written out again for each row, or does the handling of repeated cfs work across rows? I hope that makes sense.
Thanks,
Tejay

From: Eric Newton [mailto:[email protected]]
Sent: Monday, June 25, 2012 4:46 PM
To: [email protected]
Subject: EXTERNAL: Re: RFile details

Here's my high-level understanding. Let me know which aspect you would like to know more about.

RFile is built on top of BCFile, so you would need to dig up documentation on that. Most of the compression is performed at that layer. However, RFile uses a few bits of each key/value to encode any repeating row, cf, cq, cv information. This is helpful when a file contains just one row, or when most of the data has the same visibility. BTW, the "R" in RFile stands for "Relative Key."

Column families are grouped together into locality groups, and those families falling outside of any defined family group go in the "default" locality group. Column family -> locality group mappings are written to metadata at the end of the RFile. Locality groups are stored in successive sections of a file. Input is re-scanned multiple times during compactions to produce locality groups that match a table's family->group mapping at the time of the compaction.

In 1.3, index information is stored in one large block at the end of the file. In 1.4, the index blocks are hierarchical, to support incremental loading of the index.

-Eric

On Mon, Jun 25, 2012 at 1:11 PM, Cardon, Tejay E <[email protected]> wrote:

All,

Can anyone point me to a design paper or other source of some detail on how RFiles work? I'm curious about the compression under the covers as well as the layout on disk of column families, etc.

Thanks,
Tejay Cardon
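For readers following the "Relative Key" idea discussed above, here is a minimal Python sketch of how a few flag bits per key could mark fields that repeat from the previous key. It is only an illustration, not the actual RFile on-disk format: the flag values, the 2-byte length prefix, and the function names `encode` and `decode` are all assumptions made for this example. The sketch also assumes each field (row, cf, cq, cv) is compared independently against the immediately preceding key, so under that assumption a cf that repeats in successive keys would be elided even when the row changes.

```python
from typing import List, Optional, Tuple

# A key is (row, cf, cq, cv). Hypothetical layout, not the real RFile format.
Key = Tuple[bytes, bytes, bytes, bytes]

# One flag bit per field: set when the field equals the previous key's field.
ROW_SAME, CF_SAME, CQ_SAME, CV_SAME = 0x01, 0x02, 0x04, 0x08
FLAGS = (ROW_SAME, CF_SAME, CQ_SAME, CV_SAME)


def encode(keys: List[Key]) -> bytes:
    """Write one flag byte per key, then only the fields that changed."""
    out = bytearray()
    prev: Optional[Key] = None
    for key in keys:
        flags = 0
        if prev is not None:
            for bit, old, new in zip(FLAGS, prev, key):
                if old == new:
                    flags |= bit
        out.append(flags)
        for bit, field in zip(FLAGS, key):
            if not flags & bit:
                # Assumed 2-byte big-endian length prefix (fields < 64 KiB).
                out += len(field).to_bytes(2, "big") + field
        prev = key
    return bytes(out)


def decode(data: bytes) -> List[Key]:
    """Rebuild full keys by copying flagged fields from the previous key."""
    keys: List[Key] = []
    prev: Optional[Key] = None
    pos = 0
    while pos < len(data):
        flags = data[pos]
        pos += 1
        fields = []
        for i, bit in enumerate(FLAGS):
            if flags & bit:
                fields.append(prev[i])
            else:
                n = int.from_bytes(data[pos:pos + 2], "big")
                pos += 2
                fields.append(data[pos:pos + n])
                pos += n
        prev = (fields[0], fields[1], fields[2], fields[3])
        keys.append(prev)
    return keys


# Two rows sharing the same cf, cq, and cv: only the row is rewritten.
sample = [(b"row1", b"cf", b"a", b""), (b"row2", b"cf", b"a", b"")]
encoded = encode(sample)
```

In this sketch the second key costs one flag byte plus the new row, because the other three fields match the key written just before it. Whether the real format behaves the same way across rows is something to confirm against the RFile source.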
