Builders have some inherent overhead.  This could be optimized further, but
it will likely always be faster to reuse a single instance when writing.

The deepCopy calls are probably copying the default values of each field
you're not setting.  If you're only setting a few fields, then you might use
a builder once to create a single instance so its defaults are populated,
then reuse that instance as you write, setting only the few fields that need
to differ from the defaults.  (This only works if you're setting the same
fields every time.  Otherwise you'd need to restore the default value.)
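
For example, here is a rough sketch of that pattern, assuming Avro-generated
SpecificRecord classes named Parent, LOGHDR and MSGHDR as in your example;
the input source and the per-record field values are hypothetical
placeholders:

    import java.io.File;
    import org.apache.avro.file.DataFileWriter;
    import org.apache.avro.io.DatumWriter;
    import org.apache.avro.specific.SpecificDatumWriter;

    // Build one instance up front so all of its defaults are populated.
    Parent template = Parent.newBuilder()
        .setHdr(LOGHDR.newBuilder().build())
        .setMsg(MSGHDR.newBuilder().build())
        .build();

    DatumWriter<Parent> datumWriter = new SpecificDatumWriter<>(Parent.class);
    try (DataFileWriter<Parent> fileWriter =
             new DataFileWriter<>(datumWriter)
                 .create(template.getSchema(), new File("out.avro"))) {
      for (InputRecord in : readInputRecords()) {   // hypothetical input source
        // Overwrite only the fields that differ from the defaults, then write.
        template.getHdr().setTimestamp(in.getTimestamp());  // hypothetical fields
        template.getMsg().setBody(in.getBody());
        fileWriter.append(template);
      }
    }

This avoids calling build() per record, so the per-record cost is just the
setters you actually need plus the write itself.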

An optimization for Avro here might be to inline default values for
immutable types when generating the build() method.

Doug

On Fri, Jan 26, 2018 at 9:04 AM, Nishanth S <nishanth.2...@gmail.com> wrote:

> Hello Everyone,
>
> We have a process that reads data from a local file share, serializes it,
> and writes it to HDFS in Avro format. I am just wondering if I am building
> the Avro objects correctly. For every record that is read from the binary
> file, we create an equivalent Avro object in the format below.
>
> Parent p = new Parent();
> LOGHDR hdr = LOGHDR.newBuilder().build();
> MSGHDR msg = MSGHDR.newBuilder().build();
> p.setHdr(hdr);
> p.setMsg(msg);
> // ... set the remaining fields on p ...
> datumFileWriter.write(p);
>
> This Avro schema has around 1800 fields, including 26 nested types. I did
> some load testing and found that serializing the same object to disk is
> about 6x faster than constructing a new object each time (p.build()). When
> a new Avro object is constructed every time using RecordBuilder.build(),
> much of the time is spent in GenericData.deepCopy(). Has anyone run into a
> similar problem? We are using Avro 1.8.2.
>
> Thanks,
> Nishanth
