Builders have some inherent overhead. That overhead could probably be reduced, but reusing a single instance when writing will likely always be faster.
The deepCopy's are probably of the default values of each field you're not setting. If you're only setting a few fields, you might use a builder once to create a single instance with its defaults populated, then reuse that instance as you write, setting only the few fields that need to differ from the defaults (see the sketch after the quoted message below). This only works if you set the same fields every time; otherwise you'd need to restore the default values between records.

An optimization for Avro here might be to inline default values for immutable types when generating the build() method.

Doug

On Fri, Jan 26, 2018 at 9:04 AM, Nishanth S <nishanth.2...@gmail.com> wrote:
> Hello Everyone,
>
> We have a process that reads data from a local file share, serializes it,
> and writes it to HDFS in Avro format. I am wondering whether I am building
> the Avro objects correctly. For every record read from the binary file,
> we create an equivalent Avro object in the format below:
>
>     Parent p = new Parent();
>     LOGHDR hdr = LOGHDR.newBuilder().build();
>     MSGHDR msg = MSGHDR.newBuilder().build();
>     p.setHdr(hdr);
>     p.setMsg(msg);
>     p..
>     p..set
>     datumFileWriter.write(p);
>
> This Avro schema has around 1800 fields, including 26 nested types. I did
> some load testing and found that serializing the same object to disk is
> about 6x faster than constructing a new object every time. When a new Avro
> object is constructed with RecordBuilder.build(), much of the time is
> spent in GenericData.deepCopy(). Has anyone run into a similar problem?
> We are using Avro 1.8.2.
>
> Thanks,
> Nishanth
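To make the reuse pattern above concrete, here is a minimal sketch. It assumes the Avro-generated classes Parent, LOGHDR and MSGHDR from the original post, an already-open DataFileWriter<Parent> named datumFileWriter, and a hypothetical input source; the setSeqNo/setTimestamp calls are hypothetical stand-ins for the few fields that actually vary per record.

    // Build one fully-defaulted instance up front. This pays the
    // deepCopy cost of all 1800 default values exactly once.
    Parent template = Parent.newBuilder()
        .setHdr(LOGHDR.newBuilder().build())
        .setMsg(MSGHDR.newBuilder().build())
        .build();

    // Reuse that instance for every record, overwriting only the
    // fields that differ from the defaults before each write.
    // Note: this is safe only if the same fields are set for every
    // record; otherwise stale values from the previous record would
    // leak into the next one.
    for (InputRecord input : readFromFileShare()) {   // hypothetical source
        template.setSeqNo(input.getSeqNo());          // hypothetical field
        template.setTimestamp(input.getTimestamp());  // hypothetical field
        datumFileWriter.write(template);              // serialize and append
    }

The write() call serializes the record's current state immediately, so mutating and rewriting the same instance is safe here; the 6x difference observed in the original post comes from skipping the per-record build() and its deepCopy of defaults.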