Re: interesting

2013-05-20 Thread Eric Newton
am == row >>>>>>> year == column family >>>>>>> count == column qualifier (prepended with zeros for sort) >>>>>>> book count == value >>>>>>> >>>>>>> I used ascii text fo
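A minimal sketch of the mapping described in this snippet, applied to the `ngram TAB year TAB match_count TAB volume_count` line format quoted elsewhere in the thread. It assumes the truncated first field is the ngram, that "count" means match_count, and that "book count" means volume_count. The `Entry` type, the function name, and the padding width of 12 are hypothetical illustrations, not taken from the thread or from the Accumulo API.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """Illustrative stand-in for the parts of an Accumulo key/value pair."""
    row: str
    column_family: str
    column_qualifier: str
    value: str


# Assumed width; the thread does not say how many digits were used.
COUNT_WIDTH = 12


def ngram_line_to_entry(line: str, width: int = COUNT_WIDTH) -> Entry:
    """Map one tab-separated ngram line onto row / family / qualifier / value."""
    ngram, year, match_count, volume_count = line.rstrip("\n").split("\t")
    return Entry(
        row=ngram,
        column_family=year,
        # Left-pad with zeros so lexicographic order matches numeric order.
        column_qualifier=match_count.zfill(width),
        value=volume_count,
    )


if __name__ == "__main__":
    print(ngram_line_to_entry("circumvallate\t1978\t335\t91\n"))
    # Unpadded ASCII counts sort lexicographically, not numerically.
    print(sorted(["9", "10"]), sorted(c.zfill(COUNT_WIDTH) for c in ["9", "10"]))
```

The zero padding matters because keys are compared as bytes: without it, the ASCII string "10" sorts before "9".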

Re: interesting

2013-05-19 Thread Jim Klucar
count == column qualifier (prepended with zeros for sort) >>>>>> book count == value >>>>>> >>>>>> I used ascii text for the counts, even. >>>>>> >>>>>> I'm not sure if the google entrie

Re: interesting

2013-05-15 Thread Josh Elser
Perhaps google used less aggressive settings for their compression. I'm more interested in 2-grams to test our partial-row compression in 1.5. -Eric On Fri, May 3, 2013 at 4:09 PM, Jared Winick <jaredwin...@gmail.com> wrote: That is very i

Re: interesting

2013-05-15 Thread Christopher
at does not repeat identical data from key to key, so >>>> in most cases, the row is not repeated. That gives gzip other >>>> things to work on. >>>> >>>> I'll have to do more analysis to figure out why RFile did so well. >>

Re: interesting

2013-05-15 Thread Eric Newton
>> in most cases, the row is not repeated. That gives gzip other >>> things to work on. >>> >>> I'll have to do more analysis to figure out why RFile did so well. >>> Perhaps google used less aggressive settings for their compression. >>>

Re: interesting

2013-05-15 Thread Eric Newton
>> in most cases, the row is not repeated. That gives gzip other >> things to work on. >> >> I'll have to do more analysis to figure out why RFile did so well. >> Perhaps google used less aggressive settings for their compression. >> >>

Re: interesting

2013-05-15 Thread Josh Elser
e interested in 2-grams to test our partial-row compression in 1.5. -Eric On Fri, May 3, 2013 at 4:09 PM, Jared Winick <jaredwin...@gmail.com> wrote: That is very interesting and sounds like a fun friday project! Could you please elaborate on

Re: interesting

2013-05-15 Thread Josh Elser
>> wrote: That is very interesting and sounds like a fun friday project! Could you please elaborate on how you mapped the original format of ngram TAB year TAB match_count TAB volume_count NEWLINE into Accumulo key/values? Could you briefly explain what featu

Re: interesting

2013-05-15 Thread Eric Newton
or their compression. > > I'm more interested in 2-grams to test our partial-row compression in 1.5. > > -Eric > > > On Fri, May 3, 2013 at 4:09 PM, Jared Winick wrote: > >> That is very interesting and sounds like a fun friday project! Could you >> please ela

Re: interesting

2013-05-03 Thread Eric Newton
compression in 1.5. -Eric On Fri, May 3, 2013 at 4:09 PM, Jared Winick wrote: > That is very interesting and sounds like a fun friday project! Could you > please elaborate on how you mapped the original format of > > ngram TAB year TAB match_count TAB volume_count NEWLINE > >

Re: interesting

2013-05-03 Thread Jared Winick
That is very interesting and sounds like a fun friday project! Could you please elaborate on how you mapped the original format of ngram TAB year TAB match_count TAB volume_count NEWLINE into Accumulo key/values? Could you briefly explain what feature in Accumulo is responsible for this

interesting

2013-05-03 Thread Eric Newton
I think David Medinets suggested some publicly available data sources that could be used to compare the storage requirements of different key/value stores. Today I tried it out. I took the Google 1-gram word lists and ingested them into Accumulo. http://storage.googleapis.com/books/ngrams/books/