Given, for r,cf,cq,cv:

a,b,c,d
a,b,c,e
a,b,q,f
a,b,x,d

Relative key encoding in RFile will result in the following symbolic encoding:

a,b,c,d
,,,e
,,q,f
,,x,d

This is not the optimal encoding, but it is fast and works well in practice,
especially in the tables that Accumulo supports well: those with millions of
columns in a row.

To answer your question, if you use only one cf, it will only be encoded once
per block.

-Eric

On Mon, Jun 25, 2012 at 7:53 PM, Cardon, Tejay E <[email protected]> wrote:

> Thanks Eric. That helps. With regards to the repeating key piece, does
> this only happen for successive cells? In other words, if I use the same cf
> in every row of a table, does that cf get repeated each time, or does this
> cf repetition work across rows? I hope that makes sense.
>
> Thanks,
>
> Tejay
>
> From: Eric Newton [mailto:[email protected]]
> Sent: Monday, June 25, 2012 4:46 PM
> To: [email protected]
> Subject: EXTERNAL: Re: RFile details
>
> Here's my high-level understanding. Let me know which aspect you would like
> to know more about.
>
> RFile is built on top of BCFile, so you would need to dig up documentation
> on that. Most of the compression is performed at that layer.
>
> However, RFile uses a few bits of each key/value to encode any repeating
> row, cf, cq, cv information. This is helpful when a file contains just one
> row, or when most of the data has the same visibility.
>
> BTW, the "R" in RFile stands for "Relative Key."
>
> Column families are grouped together into locality groups, and those
> families falling outside of any defined family group go in the "default"
> locality group. Column family -> locality group mappings are written to
> metadata at the end of the RFile. Locality groups are stored in successive
> sections of a file. Input is re-scanned multiple times during compactions
> to produce locality groups that match a table's family->group mapping at the
> time of the compaction.
>
> In 1.3, index information is stored in one large block at the end of the
> file. In 1.4, the index blocks are hierarchical, to support incremental
> loading of the index.
>
> -Eric
>
> On Mon, Jun 25, 2012 at 1:11 PM, Cardon, Tejay E <[email protected]>
> wrote:
>
> > All,
> >
> > Can anyone point me to a design paper or other source of some detail on
> > how RFiles work? I’m curious about the compression under the covers as
> > well as the layout on disk of column families, etc.
> >
> > Thanks,
> >
> > Tejay Cardon
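
For anyone who wants to play with the relative key idea from the top of this thread, here is a minimal sketch in Python. It is not the actual RFile code or on-disk format. It only models the description above: a few flag bits per key mark which of row, cf, cq, cv repeat the previous key, and only the fields that changed are stored. The function names, the bit assignments, and the tuple layout are all assumptions made for illustration, and the whole list is treated as a single block.

```python
# Hypothetical bit layout: bit i set means field i equals the previous key's
# field i. The real RFile flag values may differ.
FIELDS = ("row", "cf", "cq", "cv")


def encode(keys):
    """Encode each key relative to the one before it.

    Returns a list of (flags, changed_fields) pairs.
    """
    encoded = []
    prev = None
    for key in keys:
        flags = 0
        changed = []
        for i, field in enumerate(key):
            if prev is not None and field == prev[i]:
                flags |= 1 << i       # same as previous key: store only a bit
            else:
                changed.append(field)  # different: store the field itself
        encoded.append((flags, changed))
        prev = key
    return encoded


def decode(encoded):
    """Rebuild the full keys from the (flags, changed_fields) pairs."""
    keys = []
    prev = None
    for flags, changed in encoded:
        remaining = iter(changed)
        key = tuple(
            prev[i] if flags & (1 << i) else next(remaining)
            for i in range(len(FIELDS))
        )
        keys.append(key)
        prev = key
    return keys


def symbolic(flags, changed):
    """Render one encoded key in the 'a,b,c,d' / ',,,e' notation used above."""
    remaining = iter(changed)
    return ",".join(
        "" if flags & (1 << i) else next(remaining)
        for i in range(len(FIELDS))
    )


if __name__ == "__main__":
    keys = [
        ("a", "b", "c", "d"),
        ("a", "b", "c", "e"),
        ("a", "b", "q", "f"),
        ("a", "b", "x", "d"),
    ]
    for flags, changed in encode(keys):
        print(symbolic(flags, changed))
```

Run on the four keys from the example, this prints the same symbolic encoding shown above (a,b,c,d then ,,,e then ,,q,f then ,,x,d). Note that each key is compared only with its immediate predecessor, which is why the last key stores "d" again even though "d" appeared in the first key.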
