@Andy Adding these constraints at the schema level keeps bad data from
making it onto Kafka topics in the first place, preventing data pollution.
I'm not sure what you mean by "making it harder to write data using that
schema"--imposing and enforcing constraints is kind of the point.

> Why not just handle the empty case where you consume the data?

That's what we currently do, but we wouldn't need this extra test case if
we could impose the aforementioned constraint at the schema level.

Right now, we treat messages with an empty array as erroneous, and output a
corresponding message onto an error topic. If we reset our application and
consumed messages again, we'd be putting new messages onto the error topic,
*doubling* the unwanted data.
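
For reference, the consume-side check boils down to something like this (a
rough sketch, not our actual code; the Message type, its accessors, the
topic name, and process() are all stand-ins, with producer being a plain
KafkaProducer):

// Sketch: reroute records whose mandatory array is empty to an error topic.
// Message, getInfo(), getId(), process(), and "error-topic" are placeholders.
if (message.getInfo().isEmpty()) {
    producer.send(new ProducerRecord<>("error-topic", message.getId(), message));
} else {
    process(message);
}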

@Joseph That's an interesting approach. I know that Avro is extensible, but
we're relying on some third-party serde classes, and as @Andy mentions,
once you start getting into the weeds, all bets are off.
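
If I understand your suggestion, it's roughly the following (an untested
sketch against Avro 1.7.7; "nonEmpty" is just a prop name we'd pick by
convention--Avro itself attaches no meaning to it):

{"name": "info", "type": {"type": "array", "items": "Information"}, "nonEmpty": "true"}

import java.io.IOException;
import java.util.Collection;
import org.apache.avro.Schema;
import org.apache.avro.generic.IndexedRecord;
import org.apache.avro.io.Encoder;
import org.apache.avro.specific.SpecificDatumWriter;

// Untested sketch: enforce our hypothetical "nonEmpty" prop on array
// fields before the Avro binary is produced.
public class ValidatingDatumWriter<T> extends SpecificDatumWriter<T> {

    public ValidatingDatumWriter(Schema schema) {
        super(schema);
    }

    @Override
    protected void writeField(Object datum, Schema.Field f, Encoder out,
                              Object state) throws IOException {
        if ("true".equals(f.getProp("nonEmpty"))) {
            // Array fields of a SpecificRecord materialize as Collections.
            Object value = ((IndexedRecord) datum).get(f.pos());
            if (value instanceof Collection && ((Collection<?>) value).isEmpty()) {
                throw new IOException("Field '" + f.name() + "' must not be empty");
            }
        }
        super.writeField(datum, f, out, state);
    }
}

Since our serde classes come from a third party, though, there's no obvious
seam for us to plug a subclass like this into.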

On 11 May 2017 at 09:48, Andy Chambers <achambers.h...@gmail.com> wrote:

> I think the question you need to ask/answer is what there is to gain by
> adding this constraint. (This goes for any writer constraint.)
>
> Each constraint you add makes it harder to write data using that schema.
>
> Why not just handle the empty case where you consume the data?
>
> Once you start adding custom datum writers, all bets are off with respect
> to schema compatibility, so if you're using/trusting something like the
> Confluent Schema Registry, you're in trouble.
>
> On 11 May 2017 4:35 pm, "Joseph P." <joseph.pac...@gmail.com> wrote:
>
> Hi
>
> You can add custom props to your Avro schema.
>
> So here we have added our custom props and extra processing before
> generating the Avro binary to make sure these props are respected.
>
> Pro: very flexible (we have added max_length on strings, temporal_format,
> and so forth).
> Con: you must be sure your extra processing runs before generating the
> Avro binaries.
>
> For example, in your case you could add a prop "nonEmpty" with a default
> value of false.
>
> Then, before converting the Avro JSON/POJO to Avro binary, you use your
> own datum writer (extending SpecificDatumWriter), and in writeField you
> check for the presence of the prop and its value; if it is true, you check
> the field for non-emptiness.
>
> Cheers
>
>
> On Wed, May 10, 2017 at 10:41 AM, Tianxiang Xiong <
> tianxiang.xi...@fundingcircle.com> wrote:
>
>> Thanks Suraj, but that's not what I mean.
>>
>> For your second schema, it is still possible to pass in an empty array
>> `[]`, which is what I would like to prevent.
>>
>> On 8 May 2017 at 19:32, Suraj Acharya <su...@apache.org> wrote:
>>
>>> This is what I have done in my application:
>>>
>>> {"name": "clients", "type": [ {"type": "array", "items": "Client"}, "null" 
>>> ]}
>>>
>>> This allows me to pass null. What you can try is something like this:
>>>
>>> {"name": "info", "type": { "type": "array", "items": "Information" }
>>>
>>> In this example, info is something that needs to be passed for every
>>> client.
>>>
>>> Hope that helps.
>>>
>>>
>>> On Fri, May 5, 2017 at 9:51 PM, Tianxiang Xiong <
>>> tianxiang.xi...@fundingcircle.com> wrote:
>>>
>>>> In Avro 1.7.7, is there a way to specify a *non-empty* array, map,
>>>> etc.? There doesn't seem to be according to the spec
>>>> <https://avro.apache.org/docs/1.7.7/spec.html#Maps>.
>>>>
>>>> There are applications in which we mandate that a data format has a
>>>> non-empty array. It'd be nice if that could be expressed in the schema so
>>>> that data with *empty* arrays fails to serialize (and is thus never put
>>>> on a Kafka topic). Fail earlier > fail later.
>>>>
>>>> Thanks,
>>>>
>>>> TX
>>>>
>>>
>>>


-- 

*Tianxiang Xiong*

*tianxiang.xi...@fundingcircle.com <tianxiang.xi...@fundingcircle.com>*

747 Front Street, Floor 4 | San Francisco, CA 94111
