Hello All,
AFAIK, Hadoop serialization comes into the picture in two areas:
- putting data on the wire, i.e. interprocess communication between nodes using RPC;
- putting data on disk, i.e. persistent storage (say, HDFS) when using MapReduce.
I have a couple of questions regarding the serialization mechanisms used in
Hadoop:
1. Does Hadoop provide a pluggable serialization feature for both of the above
   cases?
2. Is Writable the default serialization mechanism for both cases?
3. Were there any changes with respect to serialization from Hadoop 1.x to
   Hadoop 2.x?
4. Would there be a significant performance gain if the default serialization
   (i.e. Writables) were replaced with Avro, Protocol Buffers or Thrift in
   MapReduce programming?
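To make the Writable question concrete, here is a minimal sketch of what I understand the Writable contract to be: a type writes its own fields to a DataOutput and reads them back from a DataInput. To keep it runnable without Hadoop on the classpath, SimpleWritable is a local stand-in that only mirrors the shape of org.apache.hadoop.io.Writable, and PageView is a hypothetical record type for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class WritableSketch {

    // Local stand-in mirroring the shape of org.apache.hadoop.io.Writable
    // (write(DataOutput) / readFields(DataInput)); NOT the Hadoop class itself.
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical record type, used only for illustration.
    static class PageView implements SimpleWritable {
        String url;
        long count;

        // No-arg constructor: Hadoop creates Writable instances reflectively,
        // so real Writables need one too.
        PageView() {}

        PageView(String url, long count) {
            this.url = url;
            this.count = count;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeUTF(url);    // 2-byte length prefix + modified UTF-8 bytes
            out.writeLong(count); // 8 bytes, big-endian
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            // Fields must be read back in exactly the order they were written.
            url = in.readUTF();
            count = in.readLong();
        }
    }

    // Serialize any SimpleWritable to a byte array (what goes "on the wire" or "on disk").
    static byte[] serialize(SimpleWritable w) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        }
        return bytes.toByteArray();
    }

    // Populate an existing (possibly reused) instance from bytes.
    static <T extends SimpleWritable> T deserialize(byte[] data, T target) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            target.readFields(in);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = serialize(new PageView("/index.html", 42L));
        PageView copy = deserialize(data, new PageView());
        System.out.println(copy.url + " " + copy.count + " (" + data.length + " bytes)");
    }
}
```

Note that no schema or field tags are written, only the raw values, which is part of why I am asking how Writables compare with Avro, Protocol Buffers or Thrift.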
Thanks,
-RR