Hi,

I'm certainly not an expert on this, but since you asked twice...
2. My understanding is that Writables are the standard mechanism for Map/Reduce jobs.

4. I don't think that switching away from Writables would benefit Map/Reduce. The point is that in the context of a Map/Reduce job you know the version of the serialization and the actual classes involved, so the serialized form contains just the data required to instantiate the objects and nothing else. Serialization formats like Avro, Protocol Buffers, etc. have mechanisms to handle different versions of the classes, typing of the fields, and so on. They are, to a certain extent, self-describing. This means that they must include overhead to tag fields and classes, and so, of necessity, they will be larger than the Writable classes. (Here I'm talking about the raw objects before compression.) I would have thought that this means the Writable classes will always be the correct choice for Map/Reduce. As others have pointed out, if you write your own custom Writable classes you can exploit your domain-specific knowledge of the data to provide even more efficient and compact representations.

Regards,

Peter Marron
Senior Developer, Trillium Software, A Harte Hanks Company
Theale Court, 1st Floor, 11-13 High Street
Theale RG7 5AH
+44 (0) 118 940 7609 office
+44 (0) 118 940 7699 fax
trilliumsoftware.com<http://www.trilliumsoftware.com/> / linkedin<http://www.linkedin.com/company/17710> / twitter<https://twitter.com/trilliumsw> / facebook<http://www.facebook.com/HarteHanks>

From: Radhe Radhe [mailto:radhe.krishna.ra...@live.com]
Sent: 30 March 2014 14:27
To: user@hadoop.apache.org
Subject: RE: Hadoop Serialization mechanisms

Second try. Please find some time to answer some of my queries.
Thanks,
-RR

________________________________
From: radhe.krishna.ra...@live.com<mailto:radhe.krishna.ra...@live.com>
To: user@hadoop.apache.org<mailto:user@hadoop.apache.org>
Subject: Hadoop Serialization mechanisms
Date: Thu, 27 Mar 2014 13:29:35 +0530

Hello All,

AFAIK Hadoop serialization comes into the picture in two areas:
1. putting data on the wire, i.e. for interprocess communication between nodes using RPC
2. putting data on disk, i.e. for persistent storage (say HDFS) in Map Reduce

I have a couple of questions regarding the serialization mechanisms used in Hadoop:
1. Does Hadoop provide a pluggable feature for serialization in both of the above cases?
2. Is Writable the default serialization mechanism in both of the above cases?
3. Were there any changes w.r.t. serialization from Hadoop 1.x to Hadoop 2.x?
4. Will there be a significant performance gain if the default serialization, i.e. Writables, is replaced with Avro, Protocol Buffers or Thrift in Map Reduce programming?

Thanks,
-RR
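On question 1: serialization of Map/Reduce keys and values is pluggable through the `io.serializations` configuration property, a comma-separated list of Serialization implementations that Hadoop consults for each key/value class; `WritableSerialization` is the default entry, which is why Writables "just work". A sketch of what this looks like in core-site.xml (the two Avro entries are, to my knowledge, the ones that ship with Hadoop, shown here as an illustration rather than a recommendation):

```xml
<!-- core-site.xml fragment: the list of serialization frameworks
     Hadoop will try, in order, for Map/Reduce keys and values. -->
<property>
  <name>io.serializations</name>
  <value>org.apache.hadoop.io.serializer.WritableSerialization,org.apache.hadoop.io.serializer.avro.AvroSpecificSerialization,org.apache.hadoop.io.serializer.avro.AvroReflectSerialization</value>
</property>
```

Note that, as I understand it, this property governs Map/Reduce key/value serialization only, not the RPC side; in Hadoop 2.x the RPC layer itself moved to Protocol Buffers, which touches on question 3.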
<<inline: image002.png>>
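To make the compactness argument above concrete, here is a minimal sketch of the custom-Writable idea. The local `Writable` interface below simply mirrors the two methods declared by the real `org.apache.hadoop.io.Writable` (`write(DataOutput)` and `readFields(DataInput)`) so the example runs with just the JDK; `PointWritable` is a hypothetical record, not anything from the Hadoop codebase. Because the writer and reader agree on the class, the serialized form is the raw field bytes with no tags, names, or schema:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Stand-in for org.apache.hadoop.io.Writable, which declares these same two methods.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical two-field record. Serialization writes the raw field
// values only -- no field tags, no class name, no version marker.
class PointWritable implements Writable {
    int x;
    int y;

    public void write(DataOutput out) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
    }

    public void readFields(DataInput in) throws IOException {
        x = in.readInt();
        y = in.readInt();
    }
}

public class WritableDemo {
    public static void main(String[] args) throws IOException {
        PointWritable p = new PointWritable();
        p.x = 3;
        p.y = 4;

        // Serialize: two ints at 4 bytes each, so exactly 8 bytes.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        p.write(new DataOutputStream(buf));
        System.out.println(buf.size());

        // Deserialize into a fresh instance; the reader relies on knowing
        // the class and field order, which a Map/Reduce job always does.
        PointWritable q = new PointWritable();
        q.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(q.x + "," + q.y);
    }
}
```

A self-describing format would spend extra bytes tagging each field, which is exactly the overhead the paragraph above is describing; a schema-free 8-byte payload is the floor for two ints.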