Hi,

I'm certainly not an expert on this, but since you asked twice...
2. My understanding is that Writables are the standard mechanism for Map/Reduce jobs.

4. I don't think that switching away from Writables would benefit Map/Reduce. The point is that in the context of a Map/Reduce job you know the version of the serialization and the actual classes involved, so the serialized form contains just the data required to instantiate the objects and nothing else. Serialization formats like Avro, Protocol Buffers, etc. have mechanisms to handle different versions of the classes, typing of the fields, and so on. They are, to a certain extent, self-describing. This means that they must include overhead to tag fields and classes, and so, of necessity, they will be larger than the Writable classes. (Here I'm talking about the raw objects before compression.) I would have thought that this means the Writable classes will always be the correct choice for Map/Reduce. As others have pointed out, if you write your own custom Writable classes you can exploit your domain-specific knowledge of the data to provide even more efficient and compact representations.

Regards,

Peter Marron
Senior Developer, Trillium Software, A Harte Hanks Company
Theale Court, 1st Floor, 11-13 High Street
Theale RG7 5AH
+44 (0) 118 940 7609 office
+44 (0) 118 940 7699 fax
trilliumsoftware.com<http://www.trilliumsoftware.com/> / linkedin<http://www.linkedin.com/company/17710> / twitter<https://twitter.com/trilliumsw> / facebook<http://www.facebook.com/HarteHanks>

From: Radhe Radhe [mailto:radhe.krishna.ra...@live.com]
Sent: 30 March 2014 14:27
To: user@hadoop.apache.org
Subject: RE: Hadoop Serialization mechanisms

Second try. Please find some time to answer some of my queries.
Thanks,
-RR

________________________________
From: radhe.krishna.ra...@live.com<mailto:radhe.krishna.ra...@live.com>
To: user@hadoop.apache.org<mailto:user@hadoop.apache.org>
Subject: Hadoop Serialization mechanisms
Date: Thu, 27 Mar 2014 13:29:35 +0530

Hello All,

AFAIK Hadoop serialization comes into the picture in two areas:
1. putting data on the wire, i.e. for interprocess communication between nodes using RPC
2. putting data on disk, i.e. for persistent storage (say HDFS) in Map Reduce

I have a couple of questions regarding the serialization mechanisms used in Hadoop:
1. Does Hadoop provide a pluggable feature for serialization in both of the above cases?
2. Is Writable the default serialization mechanism in both of the above cases?
3. Were there any changes w.r.t. serialization from Hadoop 1.x to Hadoop 2.x?
4. Will there be a significant performance gain if the default serialization, i.e. Writables, is replaced with Avro, Protocol Buffers or Thrift in Map Reduce programming?

Thanks,
-RR
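On question 1: serialization of Map/Reduce keys and values is pluggable through the `io.serializations` configuration property, a comma-separated list of Serialization implementations that Hadoop consults for each key/value class; `WritableSerialization` is the default entry, which is why Writables "just work". A sketch of what this looks like in core-site.xml (the two Avro entries are, to my knowledge, the ones that ship with Hadoop, shown here as an illustration rather than a recommendation):

```xml
<!-- core-site.xml fragment: the list of serialization frameworks
     Hadoop will try, in order, for Map/Reduce keys and values. -->
<property>
  <name>io.serializations</name>
  <value>org.apache.hadoop.io.serializer.WritableSerialization,org.apache.hadoop.io.serializer.avro.AvroSpecificSerialization,org.apache.hadoop.io.serializer.avro.AvroReflectSerialization</value>
</property>
```

Note that, as I understand it, this property governs Map/Reduce key/value serialization only, not the RPC side; in Hadoop 2.x the RPC layer itself moved to Protocol Buffers, which touches on question 3.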
<<inline: image002.png>>
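To make the compactness argument above concrete, here is a minimal sketch of the custom-Writable idea. The local `Writable` interface below simply mirrors the two methods declared by the real `org.apache.hadoop.io.Writable` (`write(DataOutput)` and `readFields(DataInput)`) so the example runs with just the JDK; `PointWritable` is a hypothetical record, not anything from the Hadoop codebase. Because the writer and reader agree on the class, the serialized form is the raw field bytes with no tags, names, or schema:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Stand-in for org.apache.hadoop.io.Writable, which declares these same two methods.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical two-field record. Serialization writes the raw field
// values only -- no field tags, no class name, no version marker.
class PointWritable implements Writable {
    int x;
    int y;

    public void write(DataOutput out) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
    }

    public void readFields(DataInput in) throws IOException {
        x = in.readInt();
        y = in.readInt();
    }
}

public class WritableDemo {
    public static void main(String[] args) throws IOException {
        PointWritable p = new PointWritable();
        p.x = 3;
        p.y = 4;

        // Serialize: two ints at 4 bytes each, so exactly 8 bytes.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        p.write(new DataOutputStream(buf));
        System.out.println(buf.size());

        // Deserialize into a fresh instance; the reader relies on knowing
        // the class and field order, which a Map/Reduce job always does.
        PointWritable q = new PointWritable();
        q.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(q.x + "," + q.y);
    }
}
```

A self-describing format would spend extra bytes tagging each field, which is exactly the overhead the paragraph above is describing; a schema-free 8-byte payload is the floor for two ints.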