Hey,

Since you don't control the third-party system, you can't guarantee that
the data matches your expectations when it arrives. But you can build a
data processing layer that cleans the raw data once it has landed in the
warehouse and stores it in Avro with your own schema definition. When you
want to add a field that has started flowing in, delete a field, or rename
one, you can update the schema in a backward-compatible manner (and update
the processing layer to match) and be assured that none of the downstream
systems will break.
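
As a rough sketch of what I mean (not a definitive implementation): the
record name, the field names ("id", "amt", "amount", "currency"), the "USD"
default and the negative-number spellings handled below are all made up for
illustration, since I don't know what the third party actually sends. The
schema is shown as a plain dict; you would hand it to whichever Avro library
you use, and alias support can vary between implementations.

```python
import re

# Hypothetical version 2 of our own schema (all names are illustrative).
SCHEMA_V2 = {
    "type": "record",
    "name": "Transaction",
    "fields": [
        {"name": "id", "type": "string"},
        # Renamed from "amt": the alias lets a reader using this schema
        # resolve data that was written with the old field name.
        {"name": "amount", "type": "long", "aliases": ["amt"]},
        # Newly added field: the default lets a reader using this schema
        # resolve older data that does not contain the field.
        {"name": "currency", "type": "string", "default": "USD"},
    ],
}


def parse_amount(raw):
    """Normalize assumed spellings of an integer: '42', '-42', '42-', '(42)'."""
    text = str(raw).strip().replace(",", "")
    negative = False
    if text.startswith("(") and text.endswith(")"):
        negative, text = True, text[1:-1]
    elif text.endswith("-"):
        negative, text = True, text[:-1]
    elif text.startswith("-"):
        negative, text = True, text[1:]
    if not re.fullmatch(r"[0-9]+", text):
        raise ValueError(f"unrecognized number format: {raw!r}")
    value = int(text)
    return -value if negative else value


def clean_record(raw):
    """Turn one raw third-party record into a dict shaped like SCHEMA_V2."""
    raw_amount = raw["amount"] if "amount" in raw else raw["amt"]
    return {
        "id": str(raw["id"]),
        "amount": parse_amount(raw_amount),
        "currency": raw.get("currency", "USD"),
    }
```

The processing layer would call clean_record on every raw record before
writing it out, so anything that can't be parsed fails there instead of
further downstream.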

Thanks,
Akshay Aggarwal

On Tue, May 10, 2016 at 6:28 AM Sean Busbey <bus...@cloudera.com> wrote:

> On Mon, May 9, 2016 at 12:21 PM, Koert Kuipers <ko...@tresata.com> wrote:
> > you cannot use avro to ensure the data comes in the format you expect
> > (the negative numbers issue). you will have to parse these variations
> > before converting to avro.
>
> Unless, of course, you can get the folks sending you data to agree to
> send it in Avro. If you specifically get them to send the numbers
> coded as one of the number types in Avro (rather than, e.g., a string),
> you'd be able to parse it the same way all of the time.
>
>
>
>
> --
> busbey
>
