I have no experience storing blobs in Hive, but it seems to me that using an
array/list will be quite inefficient, in terms of both size and run time.


-----Original Message-----
From: Luke Forehand [mailto:luke.foreh...@networkedinsights.com] 
Sent: Tuesday, May 24, 2011 7:31 AM
To: user@hive.apache.org
Subject: Re: hive storing a byte array

Steven,

Thanks for your reply!  I have written it the way you mentioned, based on
an earlier post on this mailing list.  I'm concerned about having to
encode/decode the string in base64, and I'm wondering how much this will
impact my job's run time.
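A minimal sketch of the round trip Steven suggests: serialize the object, Base64-encode it into a string column, then decode and deserialize it in the downstream UDF. It assumes plain Java serialization and `java.util.Base64` (Java 8+, newer than the Hive of this thread; a library such as Commons Codec would fill the same role). `Payload` and the helper names are hypothetical. It also shows the size cost of the encoding: n bytes become 4 * ceil(n / 3) characters, roughly 33% larger.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

public class Base64FieldRoundTrip {

    // Hypothetical stand-in for the serialized object stored in the field.
    static class Payload implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int count;

        Payload(String name, int count) {
            this.name = name;
            this.count = count;
        }
    }

    // Java-serialize the object, then Base64-encode it so it fits a Hive STRING column.
    static String encode(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        // The basic encoder emits no line breaks, unlike getMimeEncoder().
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Reverse step, e.g. inside the downstream UDF: decode, then deserialize.
    static Object decode(String field) throws IOException, ClassNotFoundException {
        byte[] raw = Base64.getDecoder().decode(field);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        }
    }

    // Encoded length for n input bytes: 4 * ceil(n / 3).
    static long encodedLength(long n) {
        return 4 * ((n + 2) / 3);
    }

    public static void main(String[] args) throws Exception {
        String field = encode(new Payload("example", 42));
        Payload back = (Payload) decode(field);
        System.out.println(back.name + " " + back.count); // prints "example 42"
        System.out.println(encodedLength(1000));          // prints 1336
    }
}
```

Using the basic encoder rather than the MIME one matters here: the MIME encoder inserts CRLF every 76 characters, and a newline inside a value would be read as a row boundary in a text-format Hive table. The decode side is a single linear pass over the string per row.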

I have also written a UDF that emits a byte array, stored in a field of
type array<tinyint>.  When reading this field, the ObjectInspector is a
ListObjectInspector with primitiveJavaByte for the list elements.  Reading
this field in the UDF seems clunky because I have to iterate over the
list, reading each byte into a byte array, before I can use it.
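A minimal sketch of the copy loop Luke describes, in both directions. It assumes the array<tinyint> value has already been materialized as a `java.util.List<Byte>`; in a real UDF the elements would come through `ListObjectInspector.getListLength` / `getListElement`, with each element read via the element ObjectInspector, and lazily deserialized data may not be a plain `List<Byte>`. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class ByteListConversion {

    // Copy a list of boxed bytes (the Java-side form of an array<tinyint> value)
    // into a primitive byte[] that deserialization code can consume.
    static byte[] toByteArray(List<Byte> list) {
        byte[] out = new byte[list.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = list.get(i); // auto-unboxing; a null element throws NullPointerException
        }
        return out;
    }

    // Opposite direction, as a UDF that emits array<tinyint> would need.
    static List<Byte> toByteList(byte[] bytes) {
        Byte[] boxed = new Byte[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            boxed[i] = bytes[i]; // one boxed object per byte
        }
        return Arrays.asList(boxed);
    }

    public static void main(String[] args) {
        byte[] original = {1, -2, 127};
        List<Byte> asList = toByteList(original);
        System.out.println(asList);                                      // prints [1, -2, 127]
        System.out.println(Arrays.equals(original, toByteArray(asList))); // prints true
    }
}
```

Every element here is a separate boxed `Byte` and is copied one at a time, which illustrates the per-byte overhead raised at the top of this thread; the Base64 approach instead handles the whole payload as one string.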

Given both approaches, which one do you think has the least performance
overhead?

Thanks,
Luke



On 5/23/11 6:59 PM, "Steven Wong" <sw...@netflix.com> wrote:

>Hive does not support the blob data type. One option is to store your
>binary data encoded as a string (for example, using base64) and define
>the column in Hive as string.
>
>
>-----Original Message-----
>From: Luke Forehand [mailto:luke.foreh...@networkedinsights.com]
>Sent: Monday, May 23, 2011 1:21 PM
>To: user@hive.apache.org
>Subject: hive storing a byte array
>
>Hello,
>
>Can someone please provide an example of how I can store a serialized
>object in a field in Hive?  A field type of byte array, binary, or blob
>is really what I was looking for, but if something slightly less
>trivial is involved, some instruction would be much appreciated.  This
>object is used in a custom UDF later on in the processing pipeline.
>
>-Luke
>
>

