I ended up slogging through this and got it working. Here is the code for the
custom Writable and the corresponding custom SerDe, in case it helps others
trying to do the same thing: http://pastebin.com/xUy36Kxg

It reduced the average size per record from 30.5 bytes (with a CSV text string)
to 18.2 bytes, roughly a 40% reduction. Snappy block compression was enabled in
both cases, so this was worth the effort.
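
To see where the savings come from, here is a small sketch comparing the raw
value payload of a single record in both encodings: the binary form written
with the same calls as RatWritable.write() (4-byte int + 2-byte short + 4-byte
float = 10 bytes) versus a comma-separated string. The class name PayloadSize
and the sample values are hypothetical, and the sketch ignores the SequenceFile
key, per-record length headers, and Snappy compression, so it will not
reproduce the measured 30.5 and 18.2 averages; it only shows why fixed-width
binary fields are smaller.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper comparing one record's value payload in two encodings.
public class PayloadSize {

    // Bytes produced by the same sequence of calls RatWritable.write() makes.
    static int binarySize(int timestamp, short frequency, float convolution) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(timestamp);     // always 4 bytes
            out.writeShort(frequency);   // always 2 bytes
            out.writeFloat(convolution); // always 4 bytes
        } catch (IOException e) {
            // An in-memory stream never fails, but DataOutputStream declares IOException.
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Bytes of the same record as a UTF-8 CSV line, without the trailing newline.
    static int csvSize(int timestamp, short frequency, float convolution) {
        String line = timestamp + "," + frequency + "," + convolution;
        return line.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Made-up sample values, not taken from the original data set.
        int timestamp = 1337625000;
        short frequency = 250;
        float convolution = 0.123456f;
        System.out.println("binary: " + binarySize(timestamp, frequency, convolution) + " bytes");
        System.out.println("csv:    " + csvSize(timestamp, frequency, convolution) + " bytes");
    }
}
```

The binary payload stays at 10 bytes whatever the values are, whereas the CSV
length grows with the number of digits in each field plus the separators.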

It would be great if Hive could read the Writable directly and use it in place
of a SerDe, at least for simple cases like this one. Failing that, a tool that
generates the corresponding SerDe from a given Writable would help.

-- Brad

On May 21, 2012, at 2:41 PM, Rubin, Bradley S. wrote:

My Hadoop MapReduce job emits a value with three primitives via a custom
Writable (see below). How do I write a corresponding custom SerDe so that Hive
can read the output from HDFS? I can find complex SerDe examples (regex, JSON),
but nothing simple to model mine on.

I think my CREATE TABLE statement should look like this; is that correct?

CREATE EXTERNAL TABLE rats (time INT, frequency SMALLINT, convolution FLOAT)
ROW FORMAT SERDE 'neurohadoop.RatSerde'
STORED AS SEQUENCEFILE LOCATION '/neuro/output/rats';

----------

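import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// Value emitted by the MapReduce job: three primitives serialized in a fixed
// order that every reader (including the SerDe) must follow.
public class RatWritable implements Writable {
    int timestamp;      // 4 bytes
    short frequency;    // 2 bytes
    float convolution;  // 4 bytes

    // Read the fields back in exactly the order write() emitted them.
    @Override
    public void readFields(DataInput in) throws IOException {
        timestamp = in.readInt();
        frequency = in.readShort();
        convolution = in.readFloat();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(timestamp);
        out.writeShort(frequency);
        out.writeFloat(convolution);
    }
}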
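
As a rough sketch of what the SerDe's read path has to do (not the code from
the thread): Hive passes each record's value to the SerDe's
deserialize(Writable) method, and the SerDe returns the row as column values in
the same order as the CREATE TABLE columns. Assuming a standard struct
ObjectInspector built with ObjectInspectorFactory.getStandardStructObjectInspector
from PrimitiveObjectInspectorFactory.javaIntObjectInspector,
javaShortObjectInspector and javaFloatObjectInspector (check these against your
Hive version), a plain java.util.List of Integer, Short and Float can serve as
that row. The Hive wiring is omitted so the sketch runs on the JDK alone;
RatRowSketch is a hypothetical stand-in for RatWritable, and roundTrip() is a
hypothetical helper.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical JDK-only stand-in for RatWritable (no Hadoop dependency) with the
// same write/readFields logic, plus the row conversion a SerDe would perform.
public class RatRowSketch {
    int timestamp;
    short frequency;
    float convolution;

    void write(DataOutput out) throws IOException {
        out.writeInt(timestamp);
        out.writeShort(frequency);
        out.writeFloat(convolution);
    }

    void readFields(DataInput in) throws IOException {
        timestamp = in.readInt();
        frequency = in.readShort();
        convolution = in.readFloat();
    }

    // One value per column, in CREATE TABLE order (time, frequency, convolution),
    // boxed as Integer, Short and Float to match INT, SMALLINT and FLOAT.
    List<Object> toRow() {
        return Arrays.<Object>asList(timestamp, frequency, convolution);
    }

    // Hypothetical helper: serialize a record, read it back the way the
    // SequenceFile reader would, then convert it to a Hive-style row.
    static List<Object> roundTrip(int timestamp, short frequency, float convolution) {
        RatRowSketch original = new RatRowSketch();
        original.timestamp = timestamp;
        original.frequency = frequency;
        original.convolution = convolution;
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                original.write(out);
            }
            RatRowSketch copy = new RatRowSketch();
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                copy.readFields(in);
            }
            return copy.toRow();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(1337625000, (short) 250, 0.5f)); // prints [1337625000, 250, 0.5]
    }
}
```

In an actual SerDe, assuming RatWritable is the SequenceFile's value class,
deserialize() would receive the RatWritable itself, so the byte round trip
disappears and the row is built straight from its fields. Those fields are
package-private, so the SerDe would need getters or would have to live in the
same package.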

-- Brad
