Thanks Eric and John. That makes sense, and clears up some things regarding design. I really appreciate both of you taking time to respond.
Tejay

-----Original Message-----
From: Eric Newton [mailto:[email protected]]
Sent: Monday, June 25, 2012 7:32 PM
To: [email protected]
Subject: Re: EXTERNAL: Re: RFile details

Given, for r,cf,cq,cv:

a,b,c,d
a,b,c,e
a,b,q,f
a,b,x,d

Relative key encoding in RFile will result in the following symbolic encoding:

a,b,c,d
,,,e
,,q,f
,,x,d

This is not the optimal encoding, but it is fast and works well in practice, especially in the tables that Accumulo supports well: those with millions of columns in a row.

To answer your question, if you use only one cf, it will only be encoded once per block.

-Eric

On Mon, Jun 25, 2012 at 7:53 PM, Cardon, Tejay E <[email protected]> wrote:

> Thanks Eric. That helps. With regards to the repeating key piece,
> does this only happen for successive cells? In other words, if I use
> the same cf in every row of a table, does that cf get repeated each
> time, or does this cf repetition work across rows? I hope that makes sense.
>
> Thanks,
>
> Tejay
>
> From: Eric Newton [mailto:[email protected]]
> Sent: Monday, June 25, 2012 4:46 PM
> To: [email protected]
> Subject: EXTERNAL: Re: RFile details
>
> Here's my high-level understanding. Let me know which aspect you
> would like to know more about.
>
> RFile is built on top of BCFile, so you would need to dig up
> documentation on that. Most of the compression is performed at that layer.
>
> However, RFile uses a few bits of each key/value to encode any
> repeating row, cf, cq, cv information. This is helpful when a file
> contains just one row, or when most of the data has the same visibility.
>
> BTW, "R" in RFile stands for "Relative Key."
>
> Column families are grouped together into locality groups, and those
> families falling outside of any defined family group go in the "default"
> locality group. Column family -> locality group mappings are written
> to metadata at the end of the RFile. Locality groups are stored in
> successive sections of a file. Input is re-scanned multiple times
> during compactions to produce locality groups that match a table's
> family->group mapping at the time of the compaction.
>
> In 1.3, index information is stored in one large block at the end of
> the file. In 1.4, the index blocks are hierarchical, to support
> incremental loading of the index.
>
> -Eric
>
> On Mon, Jun 25, 2012 at 1:11 PM, Cardon, Tejay E <[email protected]> wrote:
>
> All,
>
> Can anyone point me to a design paper or other source
> of some detail on how RFiles work? I'm curious about the compression
> under the covers as well as the layout on disk of column families, etc.
>
> Thanks,
>
> Tejay Cardon
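
For anyone reading this thread later, the symbolic relative key encoding in Eric's a,b,c,d example can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual RFile code: the function names (`relative_encode`, `relative_decode`, `render`) are made up for this sketch, and `None` stands in for the "same as previous key" flag bits that the thread says RFile keeps per key/value. Each field is compared only against the same field of the immediately preceding key.

```python
from typing import List, Optional, Tuple

# A key is (row, cf, cq, cv), matching the r,cf,cq,cv order used in the thread.
Key = Tuple[str, str, str, str]
# In an encoded key, None means "same as this field in the previous key".
RelKey = Tuple[Optional[str], ...]


def relative_encode(keys: List[Key]) -> List[RelKey]:
    """Replace each field that repeats the previous key's field with None."""
    encoded: List[RelKey] = []
    prev: Optional[Key] = None
    for key in keys:
        if prev is None:
            # The first key has nothing to be relative to, so store it in full.
            encoded.append(tuple(key))
        else:
            encoded.append(
                tuple(None if field == old else field
                      for field, old in zip(key, prev))
            )
        prev = key
    return encoded


def relative_decode(encoded: List[RelKey]) -> List[Key]:
    """Rebuild full keys by copying forward any field marked None."""
    keys: List[Key] = []
    prev: Optional[Key] = None
    for rel in encoded:
        if prev is None:
            key = tuple(rel)
        else:
            key = tuple(old if field is None else field
                        for field, old in zip(rel, prev))
        keys.append(key)
        prev = key
    return keys


def render(rel: RelKey) -> str:
    """Print an encoded key in the symbolic form used in the thread."""
    return ",".join("" if field is None else field for field in rel)


keys = [
    ("a", "b", "c", "d"),
    ("a", "b", "c", "e"),
    ("a", "b", "q", "f"),
    ("a", "b", "x", "d"),
]
encoded = relative_encode(keys)
for rel in encoded:
    print(render(rel))
```

Run on the four keys from the example, this prints the same four symbolic lines Eric lists. Note that the last key still writes out its visibility `d`, because only the directly preceding key (with visibility `f`) is consulted, which is one reason the encoding is fast but not optimal.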
