I have used Avro as the storage format for network probe data at my
current company, Optus (a telco in Australia).
The probe data arrives in hexadecimal form; it is decoded and stored in
Avro with Snappy compression.
The solution was developed in 2014 and is still in use today.
We also used it to transfer files between various systems/platforms over
SMTP.
At that point in time, Avro was the perfect choice because:
1) Being a row-based format, it is best suited for transactional accuracy.
2) Great compression with Snappy, hence savings on storage.
3) It supported most of the complex data types (structures/records, enums,
arrays, etc.), so the stored structure was 100% aligned with the source yet
remained human-readable through Hive (via the AvroSerDe); see the sketch
below.
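
The sketch below is a minimal illustration in Python using the fastavro
library (the official avro package works similarly); the record and field
names are made up and are not the actual Optus schema:

from fastavro import parse_schema, writer

# Hypothetical probe record: a structure with an enum, an array, and a
# bytes field holding the payload decoded from its hexadecimal form.
schema = parse_schema({
    "type": "record",
    "name": "ProbeRecord",
    "fields": [
        {"name": "probe_id", "type": "string"},
        {"name": "status", "type": {
            "type": "enum", "name": "Status",
            "symbols": ["OK", "DEGRADED", "DOWN"]}},
        {"name": "samples", "type": {"type": "array", "items": "long"}},
        {"name": "payload", "type": "bytes"},
    ],
})

records = [{
    "probe_id": "probe-001",
    "status": "OK",
    "samples": [12, 7, 31],
    "payload": bytes.fromhex("deadbeef"),  # hex input decoded to raw bytes
}]

# Write a Snappy-compressed Avro container file (needs python-snappy).
with open("probes.avro", "wb") as out:
    writer(out, schema, records, codec="snappy")

A file like this can then be mapped to an external Hive table through the
AvroSerDe, which is what keeps it queryable in a human-readable form.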

Thanks,
Vikas

On Wed, Jan 27, 2021 at 8:27 PM Lee Hambley <lee.hamb...@gmail.com> wrote:

> Thanks for the extensive response! I think a lot of what you are saying
>> is very spot on!
>>
>
> I hope it's useful. If and when your paper is published, I'd love to give
> it a read.
>
>
>> I'm working on a paper surveying ~13 binary schema-less and
>> schema-driven serialization formats (including Avro) that can handle any
>> data structure that JSON can represent. Therefore, I was particularly
>> interested in why you wanted to convert JSON Schema to Avro IDL.
>>
>
> So GraphQL's IDL (interface definition language) isn't quite JSON Schema;
> the responses are often represented as JSON, but it is not JSON Schema
> per se.
>
> For us the use case is very different, even if Avro's and GraphQL's IDLs
> could _almost_ be losslessly interchanged at some level: they both have a
> decent type system and both allow the definition of RPC services, but one
> is a great candidate for public APIs (GraphQL has "directives", and really
> nice annotation and documentation generation tools), while Avro is ideal
> for our internal APIs. An Avro payload for us runs ~12-30 bytes, where JSON
> would be at least 2-3x the size (we send a lot of very small, very similar
> messages, so JSON re-serializing the keys every time would kill us). So
> Avro gives us something nothing JSON-oriented can. Also, we use Avro as our
> archival format via Object Container Files
> (https://avro.apache.org/docs/current/spec.html#Object+Container+Files),
> which I believe is also sort-of unique.
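
As an illustration of the size difference described above, here is a minimal
sketch in Python using the fastavro library, with a made-up record (not the
actual message schema). Avro's binary encoding carries no field names, so a
small record encodes to a handful of bytes, while the equivalent JSON repeats
every key:

import io
import json

from fastavro import parse_schema, schemaless_writer

# Hypothetical message schema, for illustration only.
schema = parse_schema({
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "user_id", "type": "long"},
        {"name": "action", "type": "string"},
        {"name": "ts", "type": "long"},
    ],
})

msg = {"user_id": 42, "action": "click", "ts": 1611760000}

# Schemaless (raw datum) encoding: just the values, no field names and no
# container header; this is what a single message on the wire looks like.
buf = io.BytesIO()
schemaless_writer(buf, schema, msg)

print(len(buf.getvalue()))   # roughly a dozen bytes
print(len(json.dumps(msg)))  # ~50 bytes: the keys ride along in every message

For the archival case, the same records would instead go through
fastavro.writer(), which produces an Object Container File with the schema
embedded in the file header.
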
>
>
>> Is JSON Schema the ubiquitous "contract" language that you are using in
>> your company, so you want to keep it as the source of truth while also
>> being able to work with Avro?
>>
>
> It's just public vs. private (or internal) APIs, and being deliberate
> about storing those IDL files in separate repositories and training teams
> to get into the habit of planning and co-designing changes to these
> ubiquitous contracts before they need to do implementation work, since
> changing the contract affects everyone (be that a "near realtime" RPC
> service in the hot path of customer requests on the web API, or offline
> processing by our BI teams, who run reports based on the archived data
> from the data warehouse).
>
> The company just went through explosive growth, so, whilst we
> adopted/inherited Avro as part of adopting JVM/Akka stuff for some parts of
> the infra, the pivot to nominate these IDLs as the point at which teams
> have to synchronize and coordinate is still something we are building out.
>
>
>> On Tue, Jan 26, 2021 at 04:38:18PM +0100, Lee Hambley wrote:
>> > I would say, in general, having been around the industry for 15 or so
>> > years now, that there has been a definite uptake in these binary
>> > protocols.
>> >
>> > If I had to speculate, I'd say that outside a few niches, ASN.1 and
>> > similar protocols never *really* took off beyond telecoms, which is
>> > regrettable because they are really fantastic protocols (they are used
>> > extensively in certificates; DER/PEM are in the ASN.1 family of things,
>> > and SSL certs are usually ASN.1-encoded, etc.)
>> >
>> > These days it seems like everyone has some "big data" pipe, and
>> > Hadoop/Spark/etc. has become the must-have thing in most SMEs, so you
>> > inherit some of these things by "accident".
>> >
>> > I personally come from the event-sourcing, CQRS, domain-driven-design
>> > circles, where having a ubiquitous-language "contract", preferably a
>> > bullet-proof one with good change-management tooling, is something you
>> > explicitly go looking for. In that sphere you come across msgpack,
>> > capnproto, protobufs, thrift, etc., which all offer insane performance
>> > and very compact payloads, but Avro is unique in offering something
>> > like a schema registry and concrete guarantees about coordinating
>> > rolling deploys between producers and consumers (note: I _think_
>> > protobufs have something like a schema registry now, but I never used
>> > it).
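
As a sketch of the compatibility guarantee mentioned above (hypothetical
schemas, again in Python with the fastavro library): a datum written with an
older writer schema can be read through a newer reader schema, with defaults
filled in, which is what lets producers and consumers be deployed on
independent schedules:

import io

from fastavro import schemaless_writer, schemaless_reader

# Version 1 of a hypothetical record.
v1 = {
    "type": "record", "name": "Order",
    "fields": [{"name": "id", "type": "long"}],
}

# Version 2 adds a field with a default, a backward-compatible change.
v2 = {
    "type": "record", "name": "Order",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "currency", "type": "string", "default": "EUR"},
    ],
}

buf = io.BytesIO()
schemaless_writer(buf, v1, {"id": 7})  # old producer writes with v1
buf.seek(0)

# New consumer reads the old datum with v2; the default fills the gap.
print(schemaless_reader(buf, v1, reader_schema=v2))
# {'id': 7, 'currency': 'EUR'}
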
>> >
>> > Another increasingly good option for this in the "SDL" (schema
>> > definition language) spec space is GraphQL, which isn't a _binary_
>> > packing format but does offer a standalone schema definition language
>> > for defining service contracts. Whilst Avro does account for RPC
>> > protocols
>> > <https://avro.apache.org/docs/current/spec.html#Protocol+Declaration>,
>> > I haven't really seen that used so much in the wild, but maybe that's
>> > just my "bubble" speaking. GraphQL doesn't *really* have the schema
>> > migration tools that Avro has, but at least when dealing with GraphQL
>> > payloads, most language implementations give you the underlying syntax
>> > tree for the payload, so it's a bit easier to see what clients are
>> > requesting and what fields need various levels of scrutiny before being
>> > changed.
>> >
>> > Anyway, probably none of this is really interesting to your paper, but
>> > I never miss a good opportunity to share unsolicited opinions :D
>> >
>> > Lee Hambley
>> > http://lee.hambley.name/
>> > +49 (0) 170 298 5667
>> >
>> >
>> > On Tue, 26 Jan 2021 at 16:27, Juan Cruz Viotti <j...@jviotti.com> wrote:
>> >
>> > > > I don't mean to make light of your question, just to point out
>> > > > that I don't think many companies are proudly announcing to the
>> > > > world that they use Avro... why would they?
>> > >
>> > > Indeed, I totally agree. I'm writing a research paper involving Apache
>> > > Avro and just wanted to enrich the historical sections a bit with some
>> > > industry usage information!
>> > >
>> > > On Mon, Jan 25, 2021 at 10:40:31PM +0100, Lee Hambley wrote:
>> > > > I work for two companies using Avro (contractor, I won't name
>> > > > them), but I don't know what good it serves anyone knowing that we
>> > > > use it. Would you ask the same question about JSON, or XML, or
>> > > > whether we use nginx or apache?
>> > > >
>> > > > Avro is one of about 5 components in the distributed messaging
>> > > > architectures, and aside from that it is very nicely designed (I
>> > > > believe the schema versioning and rigorously documented canonical
>> > > > forms are an almost unique point of attraction).
>> > > >
>> > > > I don't mean to make light of your question, just to point out
>> > > > that I don't think many companies are proudly announcing to the
>> > > > world that they use Avro... why would they?
>> > > >
>> > > > Lee Hambley
>> > > > http://lee.hambley.name/
>> > > > +49 (0) 170 298 5667
>> > > >
>> > > >
>> > > > On Mon, 25 Jan 2021 at 22:30, M. Manna <manme...@gmail.com> wrote:
>> > > >
>> > > > >
>> > > > > I believe Confluent and Imply are the two companies I know of.
>> > > > >
>> > > > >
>> > > > > On Mon, 25 Jan 2021 at 20:28, Juan Cruz Viotti <j...@jviotti.com>
>> > > > > wrote:
>> > > > >
>> > > > >> Hey there!
>> > > > >>
>> > > > >> Do you know where I can find a list of relatively well-known
>> > > > >> companies that make use of Apache Avro? I'm trying to collect a
>> > > > >> small list for research purposes and my search is not yielding
>> > > > >> many results apart from Facebook.
>> > > > >>
>> > > > >> Thanks in advance,
>> > > > >>
>> > > > >> --
>> > > > >> Juan Cruz Viotti
>> > > > >> Software Engineer
>> > > > >> https://www.jviotti.com
>> > > > >>
>> > > > >
>> > >
>> > > --
>> > > Juan Cruz Viotti
>> > > Software Engineer
>> > > https://www.jviotti.com
>> > >
>>
>> --
>> Juan Cruz Viotti
>> Software Engineer
>> https://www.jviotti.com
>>
>

-- 
Thanks and regards,
Vikas Saxena.
