>>>>>>> ngram == row
>>>>>>> year == column family
>>>>>>> count == column qualifier (prepended with zeros for sort)
>>>>>>> book count == value
>>>>>>>
>>>>>>> I used ascii text for the counts, even.
>>>>>>>
>>>>>>> I'm not sure if the google entrie
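The mapping quoted above can be sketched in a few lines. This is only an illustration, not the code used in the thread: it assumes "count" means match_count and "book count" means volume_count, the padding width of 12 is an arbitrary choice, and the function name `ngram_line_to_entry` and the sample line are made up for the example.

```python
def ngram_line_to_entry(line, width=12):
    """Map one 'ngram TAB year TAB match_count TAB volume_count' line
    to a (row, column family, column qualifier, value) tuple of ASCII text."""
    ngram, year, match_count, volume_count = line.rstrip("\n").split("\t")
    # Zero-pad the count so that lexicographic order matches numeric order.
    qualifier = match_count.zfill(width)
    return (ngram, year, qualifier, volume_count)


# Made-up sample line in the original ngram format.
entry = ngram_line_to_entry("example\t1999\t42\t7\n")
print(entry)
```

Without the zero padding, a qualifier of "9" would sort after "10", because keys are compared as byte strings rather than as numbers.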
at does not repeat identical data from key to key, so
>>>> in most cases, the row is not repeated. That gives gzip other
>>>> things to work on.
>>>>
>>>> I'll have to do more analysis to figure out why RFile did so well.
>>>> Perhaps google used less aggressive settings for their compression.
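The idea of not repeating identical data from key to key can be illustrated with a toy encoder. This is a sketch of the general technique only and not RFile's actual on-disk format: the "=" and "+" markers, the function names, and the sample keys are all invented here, and fields are assumed to contain no tabs or newlines.

```python
import zlib


def relative_encode(keys):
    """Encode sorted (row, family, qualifier) keys as text, writing "=" for
    any field that equals the same field in the previous key."""
    lines = []
    prev = (None, None, None)
    for key in keys:
        fields = ["=" if cur == old else "+" + cur for cur, old in zip(key, prev)]
        lines.append("\t".join(fields))
        prev = key
    return "\n".join(lines)


def relative_decode(text):
    """Invert relative_encode."""
    if not text:
        return []
    keys = []
    prev = (None, None, None)
    for line in text.split("\n"):
        key = tuple(old if f == "=" else f[1:]
                    for f, old in zip(line.split("\t"), prev))
        keys.append(key)
        prev = key
    return keys


# Made-up sorted keys: ngram as row, year as family, padded count as qualifier.
keys = [
    ("apple", "1990", "0000000012"),
    ("apple", "1991", "0000000015"),
    ("apple", "1991", "0000000020"),
    ("apply", "1990", "0000000003"),
]
raw = "\n".join("\t".join(k) for k in keys)
encoded = relative_encode(keys)
assert relative_decode(encoded) == keys

# Compare sizes before and after a general-purpose compressor.
for name, text in (("raw", raw), ("relative", encoded)):
    data = text.encode("ascii")
    print(name, len(data), len(zlib.compress(data)))
```

On a sample this small the printed sizes say little; a comparison in the spirit of the thread would need the real 1-gram files.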
I'm more interested in 2-grams to test our partial-row compression
in 1.5.
-Eric
On Fri, May 3, 2013 at 4:09 PM, Jared Winick <jaredwin...@gmail.com> wrote:
That is very interesting and sounds like a fun friday project! Could you
please elaborate on how you mapped the original format of
ngram TAB year TAB match_count TAB volume_count NEWLINE
into Accumulo key/values? Could you briefly explain what feature in
Accumulo is responsible for this
I think David Medinets suggested some publicly available data sources that
could be used to compare the storage requirements of different key/value
stores.
Today I tried it out.
I took the google 1-gram word lists and ingested them into accumulo.
http://storage.googleapis.com/books/ngrams/books/