Hi folks, a high-level question. Say we have readers and writers in different projects. The writer project dumps data into some directory (or stores it in a common store, etc.), and the reader project picks that data up and deserializes it using its own reader schema plus the published writer schema (assume we have a way to ship writer schemas along with the dataset).
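For context, here is a minimal sketch of the kind of resolution we mean, using Avro's GenericRecord path: the bytes are written with the writer schema, and the reader supplies both the shipped writer schema and its own reader schema so Avro can resolve between them. The "User" record, its fields, and the default value are hypothetical, just to illustrate.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class SchemaResolutionSketch {
    public static void main(String[] args) throws IOException {
        // Writer schema: what the writer project serialized with (hypothetical).
        Schema writerSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"age\",\"type\":\"int\"}]}");

        // Reader schema: the reader project's view; it has dropped 'age'
        // and added 'email' with a default, so it can still read old data.
        Schema readerSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"email\",\"type\":\"string\",\"default\":\"unknown\"}]}");

        // Serialize a record with the writer schema.
        GenericRecord written = new GenericData.Record(writerSchema);
        written.put("name", "alice");
        written.put("age", 30);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(writerSchema).write(written, encoder);
        encoder.flush();

        // Deserialize: pass BOTH schemas; Avro resolves writer -> reader,
        // skipping removed fields and filling added fields from defaults.
        BinaryDecoder decoder = DecoderFactory.get()
            .binaryDecoder(new ByteArrayInputStream(out.toByteArray()), null);
        GenericRecord read = new GenericDatumReader<GenericRecord>(writerSchema, readerSchema)
            .read(null, decoder);

        System.out.println(read.get("name"));  // alice
        System.out.println(read.get("email")); // unknown (filled from the default)
    }
}
```

The SpecificRecord path uses the same two-schema resolution (SpecificDatumReader also accepts writer and reader schemas), which is what prompts the questions below.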
In that kind of setup, where reader and writer schemas evolve at their own rates in separate projects and the data ships over the wire, how would you compare using SpecificRecord vs. GenericRecord?

1. At what point would the reader project be forced to regenerate its SpecificRecord classes from the schemas? Every time the writer schema changes in any way? Only when a new field is added to the writer schema? When schema-evolution support is critical and multiple projects are writing and reading data over the wire, does the static typing provided by SpecificRecord become a bottleneck, or is that not a concern with either Generic or Specific records?

2. In terms of efficiency and performance, have you noticed one performing better than the other, either in serialized/deserialized storage space or in CPU utilization?

We are interested in using SpecificRecord because it offers static compile-time checks and ensures we are writing code against the correct field names and data types, but we would like to hear the community's thoughts on this.

Thanks!

-- Arvind Kalyan