Hey,

Since you don't control the third-party system, you can't guarantee that the data matches your expectations when it arrives. But you can build a data processing layer that cleans the raw data once it has arrived in the warehouse and stores it in Avro with your own schema definition. When you want to add a field that has started flowing in, delete a field, or rename one, you can update the schema in a backward-compatible manner (and update the processing layer to match) and be assured that none of the downstream systems will break.
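
To make that concrete, here is a rough sketch of what such a cleaning layer could look like in Python. Everything specific in it is an assumption for illustration: the `Transaction` schema, its field names, and the three negative-number spellings it handles are made up, not taken from your data. It only covers the cleaning step; the cleaned records would then be written with whatever Avro library you use. The `currency` field stands in for a newly added field, and it carries a default so that older records without it stay readable.

```python
# Hypothetical schema (Avro-style JSON as a dict). "currency" plays the role
# of a field added later; its default keeps the change backward compatible.
SCHEMA_V2 = {
    "type": "record",
    "name": "Transaction",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "amount", "type": "double"},
        {"name": "currency", "type": "string", "default": "USD"},
    ],
}


def parse_amount(raw):
    """Normalize a few assumed negative-number spellings to a float."""
    if isinstance(raw, (int, float)):
        return float(raw)
    text = str(raw).strip().replace(",", "")
    negative = False
    if text.startswith("(") and text.endswith(")"):  # accounting style: (12.50)
        negative, text = True, text[1:-1]
    elif text.endswith("-"):                          # trailing minus: 12.50-
        negative, text = True, text[:-1]
    elif text.startswith("-"):                        # leading minus: -12.50
        negative, text = True, text[1:]
    value = float(text)
    return -value if negative else value


def clean(raw, schema=SCHEMA_V2):
    """Map one raw record onto the schema's fields, applying defaults.

    Fields in the raw record that the schema does not know are dropped.
    """
    out = {}
    for field in schema["fields"]:
        name = field["name"]
        if name in raw:
            value = raw[name]
        elif "default" in field:
            value = field["default"]
        else:
            raise ValueError(f"missing required field: {name}")
        out[name] = parse_amount(value) if field["type"] == "double" else value
    return out
```

With this, `clean({"id": "t1", "amount": "(12.50)"})` gives `{"id": "t1", "amount": -12.5, "currency": "USD"}`. When the upstream format changes, only `parse_amount` and the schema need touching, and downstream readers keep seeing the same shape.
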

Thanks,
Akshay Aggarwal

On Tue, May 10, 2016 at 6:28 AM Sean Busbey <bus...@cloudera.com> wrote:

> On Mon, May 9, 2016 at 12:21 PM, Koert Kuipers <ko...@tresata.com> wrote:
> > you cannot use avro to ensure the data comes in the format you expect (the
> > negative numbers issue). you will have to parse these variations before
> > converting to avro.
>
> Unless, of course, you can get the folks sending you data to agree to
> send it in Avro. If you specifically get them to send the numbers
> coded as one of the number types in Avro (rather than i.e. a string),
> you'd be able to parse it the same way all of the time.
>
> --
> busbey