Not 100% on this, but I'm pretty sure the only other thing you need to
take into consideration is the schema.  In a container file the Avro
schema is stored in the header at the beginning of the file (it can also
be managed externally).  If your consumer code expects to find the schema
at the start of whatever it is handed and uses it to "introspect" an Avro
file, then handing it a split that does not include the header could be
problematic.  If you plan on splitting the file, then it's likely best to
manage the schema externally to the container.
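
In case it helps, here is a rough Python sketch (standard library only) of
what I mean by the schema living at the beginning of the container.  It
follows my reading of the Object Container Files section of the spec linked
in Sean's mail below: a 4-byte magic, a metadata map holding "avro.schema",
then a 16-byte sync marker that is repeated after every data block.  The
helper names (build_container, read_header, read_blocks_from) and the fixed
sync marker are made up for illustration, and it assumes the "null" codec;
real code should use the Avro library rather than this.

```python
import io
import json

MAGIC = b"Obj\x01"  # container files start with these 4 bytes


def write_long(n):
    """Encode an int as an Avro long: zigzag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def read_long(stream):
    """Decode one Avro long from a binary stream."""
    shift = result = 0
    while True:
        byte = stream.read(1)[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return (result >> 1) ^ -(result & 1)
        shift += 7


def build_container(schema, blocks, sync):
    """Hypothetical helper: build a tiny uncompressed container in memory."""
    meta = {"avro.schema": json.dumps(schema).encode(), "avro.codec": b"null"}
    out = bytearray(MAGIC)
    out += write_long(len(meta))
    for key, value in meta.items():
        kb = key.encode()
        out += write_long(len(kb)) + kb + write_long(len(value)) + value
    out += write_long(0)  # a zero count ends the metadata map
    out += sync
    for count, payload in blocks:
        out += write_long(count) + write_long(len(payload)) + payload + sync
    return bytes(out)


def read_header(stream):
    """Return (schema, sync marker); must be called at the start of the file."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:
            break
        if count < 0:  # negative count is followed by the block's byte size
            read_long(stream)
            count = -count
        for _ in range(count):
            key = stream.read(read_long(stream)).decode()
            meta[key] = stream.read(read_long(stream))
    return json.loads(meta["avro.schema"]), stream.read(16)


def read_blocks_from(data, sync, start):
    """Yield (count, payload) for blocks after the first sync at/after start.

    Simplified: reads to the end of the data instead of stopping at a
    split boundary.
    """
    pos = data.find(sync, start)
    if pos < 0:
        return
    stream = io.BytesIO(data)
    stream.seek(pos + 16)
    while stream.tell() < len(data):
        count = read_long(stream)
        payload = stream.read(read_long(stream))
        stream.read(16)  # sync marker that closes the block
        yield count, payload


schema = {"type": "long"}
sync = bytes(range(100, 116))  # real writers pick 16 random bytes
block1 = b"".join(write_long(v) for v in (1, 2, 3))
block2 = b"".join(write_long(v) for v in (4, 5))
data = build_container(schema, [(3, block1), (2, block2)], sync)

# A reader that starts mid-file still goes back to offset 0 for the header.
header_schema, header_sync = read_header(io.BytesIO(data))
mid_first_block = data.find(sync) + 16 + 1
tail = list(read_blocks_from(data, header_sync, mid_first_block))
print(header_schema, tail)
```

A reader handed the middle of the file cannot find the schema in its own
byte range, so it either goes back to the file header as above or gets the
schema from somewhere external.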

On Fri, Jun 26, 2015 at 10:11 AM, Sean Busbey <bus...@cloudera.com> wrote:

> Avro Container Files are always splittable[1]. They're the way you will
> commonly interact with Avro serialized data.
>
> Data serialized as Avro's binary encoding is not splittable by itself,
> because the encoding includes no markers[2]. This may be the source of the
> disconnect you're finding in online docs.
>
>
>
> [1]: http://avro.apache.org/docs/1.7.7/spec.html#Object+Container+Files
> [2]: http://avro.apache.org/docs/1.7.7/spec.html#Data+Serialization
>
> On Thu, Jun 25, 2015 at 12:54 AM, Ankur Jain <ankur.j...@yash.com> wrote:
>
>>  Hello,
>>
>>
>>
>> I am reading various forums and docs; some say that Avro is splittable
>> and others say it is non-splittable.
>>
>> So which one is right?
>>
>>
>>
>> Regards,
>>
>> Ankur
>>
>>
>
>
>
> --
> Sean
>
