Re: compressing values returned to scanner

Marc Parisi Mon, 01 Oct 2012 13:44:39 -0700

I'm sorry, I wasn't clear. Blame my sickness. When I typed "block compression"
I was referring to the blocks within the BCFile (block compressed), not
gz. But the point still stands. You couldn't return the stream through
Thrift (you could return the whole block), so you would need to
decompress the keys and values. You could delay decompression of the value,
but you need to decompress to find the size of the value after the relative
key, whereas double compression would get you what you want.


Hope that's clear.
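
For reference, the "double compression" approach (compress the value in the application on ingest, decompress it on the client after the scan) could look roughly like the sketch below. It uses only java.util.zip; the class name ValueCodec and its two methods are hypothetical helpers, not part of Accumulo, and gzip is just one possible codec choice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical application-level helper; nothing here is an Accumulo API.
final class ValueCodec {

    private ValueCodec() {}

    // Call on ingest: gzip the raw payload before storing it as the value.
    static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        // Closing the gzip stream flushes the trailer into buf.
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(raw);
        }
        return buf.toByteArray();
    }

    // Call on the client: gunzip the bytes received from the scanner.
    static byte[] decompress(byte[] packed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz =
                 new GZIPInputStream(new ByteArrayInputStream(packed))) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        }
        return out.toByteArray();
    }
}
```

On ingest you would wrap the result, e.g. new Value(ValueCodec.compress(bytes)), and on the client call ValueCodec.decompress(entry.getValue().get()). The tablet server then only ever handles the already-compressed bytes, so they stay compressed over the wire.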

On Mon, Oct 1, 2012 at 4:26 PM, Marc Parisi <[email protected]> wrote:

> Ameet, keys and values (relative keys) are extracted from a decompressor
> stream. In the case of block compression (i.e. gz), you would need to
> return a block so the receiver can decompress it. Therefore, using existing
> compression, as Slacum mentioned, then decompressing the value is likely
> the best method.
>
>
> On Mon, Oct 1, 2012 at 4:00 PM, William Slacum <
> [email protected]> wrote:
>
>> Someone can correct me if I'm wrong, but I believe the file compression
>> option you quoted is for the RFiles in HDFS. You can enable compression
>> there and will still see some benefit even if you compress the values on
>> ingest.
>>
>>
>> On Mon, Oct 1, 2012 at 12:40 PM, ameet kini <[email protected]> wrote:
>>
>>> That is exactly my use case (ingest once, serve often, no server-side
>>> iterators).
>>>
>>> And I'm doing pre-compression on ingest. I was just looking to do away
>>> with app-level compression code. Not a biggie.
>>>
>>> Ameet
>>>
>>>
>>> On Mon, Oct 1, 2012 at 3:32 PM, William Slacum <
>>> [email protected]> wrote:
>>>
>>>> If you aren't often looking at the data in the value on the tablet
>>>> server (like in an iterator), you can also pre-compress your values on
>>>> ingest.
>>>>
>>>>
>>>> On Mon, Oct 1, 2012 at 12:19 PM, Marc Parisi <[email protected]> wrote:
>>>>
>>>>> You could compress the data in the value, and decompress the data upon
>>>>> receipt by the scanner.
>>>>>
>>>>>
>>>>> On Mon, Oct 1, 2012 at 3:03 PM, ameet kini <[email protected]> wrote:
>>>>>
>>>>>>
>>>>>> My understanding of compression in Accumulo 1.4.1 is that it is on by
>>>>>> default and that data is decompressed by the tablet server, so data on
>>>>>> the wire between server/client is decompressed. Is there a way to shift
>>>>>> the decompression from happening on the server to the client? I have a
>>>>>> use case where each Value in my table is relatively large (~ 8MB) and I
>>>>>> can benefit from compression over the wire. I don't have any server side
>>>>>> iterators, so the values don't need to be decompressed by the tablet
>>>>>> server. Also, each scan returns a few rows, so client-side decompression
>>>>>> can be fast.
>>>>>>
>>>>>> The only way I can think of now is to disable compression on that
>>>>>> table, and handle compression/decompression in the application. But if
>>>>>> there is a way to do this in Accumulo, I'd prefer that.
>>>>>>
>>>>>> Thanks,
>>>>>> Ameet
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
