Re: Avro JSON Encoding

2024-04-19 Thread Ryan Skraba
Hello! A bit tongue in cheek: the one advantage of the current Avro JSON encoding is that it drives users rapidly to prefer the binary encoding! In its current state, Avro isn't really a satisfactory toolkit for JSON interoperability, while it shines for binary interoperability. Using JSON with

CVE-2023-39410: Apache Avro Java SDK: Memory when deserializing untrusted data in Avro Java SDK

2023-09-29 Thread Ryan Skraba
Severity: low Affected versions: - Apache Avro Java SDK before 1.11.3 Description: When deserializing untrusted or corrupted data, it is possible for a reader to consume memory beyond the allowed constraints and thus lead to out of memory on the system. This issue affects Java applications

[ANNOUNCE] Apache Avro 1.11.3 released

2023-09-26 Thread Ryan Skraba
The Apache Avro community is pleased to announce the release of Avro 1.11.3! All signed release artifacts, signatures and verification instructions can be found here: https://avro.apache.org/releases.html This is a minor release, specifically addressing known issues with the 1.11.2 release, but

Re: EOS/EOL Date

2023-07-17 Thread Ryan Skraba
Hello! While Avro doesn't have an official "end-of-life" statement or policy, there is no active development on the 1.9 or 1.10 branch. Our current policy is to add major features to the next major release (1.12.0) while bug fixes, CVEs and minor features will be backported to the next minor

[ANNOUNCE] Apache Avro 1.11.2 released

2023-07-11 Thread Ryan Skraba
The Apache Avro community is pleased to announce the release of Avro 1.11.2! All signed release artifacts, signatures and verification instructions can be found here: https://avro.apache.org/releases.html This release addresses ~89 Avro JIRA, including some interesting highlights: C# -

Re: Modifying a field's schema property in Java

2022-11-23 Thread Ryan Skraba
Thanks Oscar! Julien (or anyone else) -- do you think it would be useful to have a category of "Schema" objects that are mutable for the Java SDK? Something like: MutableSchema ms = originalSchema.unlock(); ms.getField("quantity").setProperty("precision", 5);

[ANNOUNCE] Apache Avro 1.11.1 released

2022-08-08 Thread Ryan Skraba
The Apache Avro community is pleased to announce the release of Avro 1.11.1! All signed release artifacts, signatures and verification instructions can be found here: https://avro.apache.org/releases.html This release includes ~250 Jira issues, including some interesting features: Some

Re: Reflection Based Serializer on an Interface

2022-02-15 Thread Ryan Skraba
Hello! I was hoping someone had better news, but I'm afraid there are a couple of constraints in using interfaces with ReflectData. My recommendation would be to create a Schema from your actual concrete implementation, and drop it onto your interface with an @AvroSchema annotation. It's not

Re: Gigantic list of Avro spec issues

2022-02-14 Thread Ryan Skraba
Hello! Thanks, Dan, for your calm and measured response -- you've given some excellent advice on how someone can make a positive contribution to the project and the spec. Askar, your approach in presenting your specification review should have been more constructive: it isn't useful to

Re: Avro Maven Plugin for Shared Avro library

2022-01-24 Thread Ryan Skraba
Hello, As you note, it currently isn't possible to "pre-shade" Avro, but I completely understand why you might want to do it! Shading Avro is a common thing to do (see https://beam.apache.org/documentation/io/built-in/parquet/ for example). I guess we _might_ be able to fiddle with the Maven

Re: UUID Logical type not working in Java

2022-01-18 Thread Ryan Skraba
Hello! This is a known issue in 1.11.0 and before, and has recently been fixed. There's some discussion at AVRO-2498. The bad news is that if you're using avro-tools to generate your code, there isn't any released version yet that contains the fix. Martin sent the link to the SNAPSHOT

Re: Papers discussing Apache Avro

2022-01-14 Thread Ryan Skraba
This is really cool news -- it's always really interesting to see benchmark studies and the trade-offs we make while choosing different formats. Thanks for sharing! I'd love to see links to some curated articles and papers on the website! I created AVRO-3308 if you don't object :D Ryan On

Re: New website

2021-12-12 Thread Ryan Skraba
Hello! I realized that I haven't commented on this mailing list thread -- I made some comments on https://issues.apache.org/jira/browse/AVRO-2175 This looks amazing and we should merge it very soon :D It's not perfect, but it's really a great improvement and definitely not worse than the

[ANNOUNCE] Apache Avro 1.11.0 released

2021-10-31 Thread Ryan Skraba
The Apache Avro community is pleased to announce the release of Avro 1.11.0! All signed release artifacts, signatures and verification instructions can be found here: https://avro.apache.org/releases.html This release includes 120 Jira issues, including some interesting features: Specification:

Re: android support in avro java libraries

2021-08-12 Thread Ryan Skraba
mockito/mockito/issues/2341 The regression test could be >> run on emulators with different versions (or maybe robolectrics) and so >> errors could be caught. >> >> Hope this answers roughly the questions. >> >> Thanks, >> >> >> >> On Wed, Aug 11,

Re: Issue with ReflectDatumWriter With Enums

2021-06-30 Thread Ryan Skraba
Hello! I'm pretty sure that I've used enums with implementations and ReflectData successfully, even with old versions of Avro. It seems to work with 1.9.x+ with the following ReflectDatumWriter (where datum is an instance of the TestEnum): Encoder encoder =

Re: Setting a null value to field with default value

2021-03-24 Thread Ryan Skraba
Hello! I can reproduce it in Avro 1.10.2, and I think this is a bug. I raised https://issues.apache.org/jira/browse/AVRO-3091 to track it. Thanks so much for the full example! It looks like the workaround is to validate the defId in your own code before building. Until this is fixed, the

[ANNOUNCE] Apache Avro 1.10.2 released

2021-03-17 Thread Ryan Skraba
The Apache Avro community is pleased to announce the release of Avro 1.10.2! All signed release artifacts, signatures and verification instructions can be found here: https://avro.apache.org/releases.html This release includes 31 Jira issues, including some interesting features: C#: AVRO-3005

Re: Avro Java - Validation of GenericRecord question

2021-01-05 Thread Ryan Skraba
Hello! As you noticed, the validate method deliberately ignores the actual schema of a record datum, and validates the field values by position. It's answering a slightly different question --> whether the datum (and its contents) could fit in the given schema. For your use case, you might

[ANNOUNCE] Apache Avro 1.10.1 released

2020-12-04 Thread Ryan Skraba
Please note: I mistakenly sent this same message earlier today with the wrong subject! It is, in fact, 1.10.1 that was released. My apologies! The Apache Avro community is pleased to announce the release of Avro 1.10.1! All signed release artifacts, signatures and verification instructions can

Re: Failure in writing BigDecimal as decimal logical type

2020-10-07 Thread Ryan Skraba
the full > context but this certainly was not trivial to add DecimalConversion > manually. > > Thanks again for your help. > > -B > > On Tue, Oct 6, 2020 at 10:56 AM Ryan Skraba wrote: > >> Hello! This is a frequent stumbling block for logical types. >> >> Y

Re: Failure in writing BigDecimal as decimal logical type

2020-10-06 Thread Ryan Skraba
Hello! This is a frequent stumbling block for logical types. You should explicitly add the Decimal logical type conversion to the data model that interprets the Java datum being serialized to your file, like this: GenericData model = new GenericData(); model.addLogicalTypeConversion(new
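One way to see why the conversion has to be registered: a decimal logical type annotates a bytes (or fixed) schema, so a BigDecimal must first be turned into the two's-complement, big-endian bytes of its unscaled value. The following is only a minimal sketch of that byte representation using the Java standard library. It is not Avro's own DecimalConversion class, and the class and method names (DecimalBytesSketch, toBytes, fromBytes) are hypothetical.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical illustration only: mimics the byte layout of a decimal value,
// it does not use or replace Avro's DecimalConversion.
final class DecimalBytesSketch {

    // Align the value to the schema's scale, then emit the unscaled integer
    // as big-endian two's-complement bytes.
    static byte[] toBytes(BigDecimal value, int scale) {
        // setScale(int) throws ArithmeticException if rounding would be required.
        return value.setScale(scale).unscaledValue().toByteArray();
    }

    // Rebuild the BigDecimal from the bytes and the scale stored in the schema.
    static BigDecimal fromBytes(byte[] bytes, int scale) {
        return new BigDecimal(new BigInteger(bytes), scale);
    }

    public static void main(String[] args) {
        byte[] encoded = toBytes(new BigDecimal("123.45"), 2);
        System.out.println("encoded length: " + encoded.length);
        System.out.println("round trip: " + fromBytes(encoded, 2));
    }
}
```

Because the scale lives in the schema and not in the bytes, a plain data model has no way to guess how to map a BigDecimal onto bytes, which is consistent with the advice to add the conversion explicitly.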

Re: working with Avro records and schemas, programmatically

2020-09-18 Thread Ryan Skraba
Hello Colin, you've hit one bit of fussiness with the Java SDK... you can't reuse a Schema.Field object in two Records, because a field knows its own position in the record[1]. If a field were to belong to two records at different positions, this method would have an ambiguous response. As a

Re: AvroTypeException: Attempt to process a double when a string was expected

2020-08-03 Thread Ryan Skraba
Hello! Thanks for the MCVE -- I could reproduce your symptoms easily! Even when you're using JSON encoding, you should use a GenericDatumReader<> to read generic datum. The Json.ObjectReader sounds correct but is actually for a different JSON use case (storing any arbitrary JSON snippet in a

Re: Counting bytes read

2020-07-29 Thread Ryan Skraba
Hi, You've got it right: the DataFileReader and DataFileStream read a block at a time, and "fileReader.tell()" sits at the sync marker between blocks while records are being read from the current block. You're probably aware that DataFileReader is only seekable to block boundaries. The entire

Re: Logo

2020-05-12 Thread Ryan Skraba
Hello! There's a policy for trademarks, service marks, and graphic logos at https://www.apache.org/foundation/marks/ It sounds like you're using the Apache Avro logo to refer to the Apache Avro project (see "nominative use" at the first link above), which is usually OK. There are some additional

Re: Decimal type, limitation on scale

2020-03-03 Thread Ryan Skraba
It looks like the "scale must be less than precision" rule comes from Hive requirements[1] (although while searching, this is called into question elsewhere in Hive[2]). From the design document, the requirement was specifically to avoid variable (per-row scale): > For instance, applications
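To make the rule concrete, here is a small sketch using java.math.BigDecimal, whose notions of precision (digits in the unscaled value) and scale are assumed here to line up with the ones in the discussion. The helper name scaleFitsPrecision is hypothetical, and the check assumes the bound is inclusive (scale no greater than precision).

```java
import java.math.BigDecimal;

// Hypothetical helper for illustration; not part of Avro or Hive.
final class ScaleVersusPrecision {

    // Assumes the rule is "scale <= precision" (inclusive bound).
    static boolean scaleFitsPrecision(BigDecimal value) {
        return value.scale() <= value.precision();
    }

    public static void main(String[] args) {
        BigDecimal ordinary = new BigDecimal("123.45"); // unscaled 12345, scale 2
        BigDecimal tiny = new BigDecimal("0.001");      // unscaled 1, scale 3

        System.out.println(ordinary + " fits: " + scaleFitsPrecision(ordinary));
        System.out.println(tiny + " fits: " + scaleFitsPrecision(tiny));
    }
}
```

A value such as 0.001 is a perfectly valid BigDecimal whose scale exceeds its precision, which may be one reason the rule is questioned for general-purpose use.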

[ANNOUNCE] Apache Avro 1.9.2 released

2020-02-13 Thread Ryan Skraba
/gems/avro/versions/1.9.2 Thanks to everyone for contributing! Ryan Skraba

Re: How to serialize & deserialize contiguous block of GenericRecords

2020-01-30 Thread Ryan Skraba
> <https://www.facebook.com/Feedzai/>[image: > Follow Feedzai on Twitter!] <https://twitter.com/feedzai>[image: Connect > with Feedzai on LinkedIn!] <https://www.linkedin.com/company/feedzai/> > <https://feedzai.com/>[image: Feedzai in Forbes Fintech 50!] >

Re: How to serialize & deserialize contiguous block of GenericRecords

2020-01-29 Thread Ryan Skraba
Hello! It's a bit difficult to discover what's going wrong -- I'm not sure that the code in the image corresponds to the exception you are encountering! Notably, there's no reference to DataFileStream... Typically, it would be easier with code as TXT than as PNG! It is definitely possible to

Re: avro-tools illegal reflective access warnings

2020-01-17 Thread Ryan Skraba
On Fri, Jan 17, 2020 at 2:22 PM roger peppe wrote: > > On Thu, 16 Jan 2020 at 17:21, Ryan Skraba wrote: >> >> didn't find anything currently in the avro-tools that uses both >> reader and writer schemas while deserializing data... It should be a >> pretty ea

Re: avro-tools illegal reflective access warnings

2020-01-16 Thread Ryan Skraba
>> >> >> On Thu, 16 Jan 2020 at 13:57, Ryan Skraba wrote: >>> >>> Hello! Is it because you are using brew to install avro-tools? I'm >>> not entirely familiar with how it packages the command, but using a >>> direct bash-like solution ins

Re: avro-tools illegal reflective access warnings

2020-01-16 Thread Ryan Skraba
Hello! Is it because you are using brew to install avro-tools? I'm not entirely familiar with how it packages the command, but using a direct bash-like solution instead might solve this problem of mixing stdout and stderr. This could be the simplest (and right) solution for piping. alias

Re: name-agnostic schema resolution (a.k.a. structural subtyping?)

2019-12-19 Thread Ryan Skraba
Hello! You might be interested in this short discussion on the dev@ mailing list: https://lists.apache.org/x/thread.html/dd7a23c303ef045c124050d7eac13356b20551a6a663a79cb8807f41@%3Cdev.avro.apache.org%3E In short, it appears that the record name is already ignored in record-to-record matching

Re: New Committer: Ryan Skraba

2019-12-17 Thread Ryan Skraba
> > Austin > > On Tue, Dec 17, 2019 at 7:13 AM Michael Burr wrote: >> >> unsubscribe >> >> On Tue, Dec 17, 2019 at 4:43 AM Driesprong, Fokko >> wrote: >>> >>> Folks, >>> >>> The Project Management Committee (PMC) for Apache

Re: records with without fields?

2019-12-17 Thread Ryan Skraba
Related to the earlier question in the thread: there's one good starting point for a language-agnostic set of test schemas here: https://github.com/apache/avro/blob/master/share/test/data/schema-tests.txt#L24 There's a LOT of other schemas scattered throughout the project and languages, of

Re: records with without fields?

2019-12-13 Thread Ryan Skraba
I think the spec is OK with it. We've even used it in the Java API (to refer to a table that had been created but had no columns yet). It's not *extremely* useful even as a starting point to add schema evolutions, but maybe as a technique for forcing different Parsing Canonical Forms for otherwise

Re: Resolving a possible specification inconsistency pertaining to the doc attribute

2019-12-10 Thread Ryan Skraba
finitely not a strict as it could be, but I've found it > useful, and it has lots of room for improvement. > > cheers, > rog. > > > > On Fri, 6 Dec 2019 at 17:43, Jonah H. Harris wrote: >> >> On Fri, Dec 6, 2019 at 12:16 PM Ryan Skraba wrote: >>> >

Re: defaults for complex types (was Re: recursive types)

2019-12-06 Thread Ryan Skraba
Hello! I had a Java unit test ready to go (looking at default values for complex types for AVRO-2636), so just reporting back (the easy work!): 1. In Java, the schema above is parsed without error, but when attempting to use the default value, it fails with a NullPointerException (trying to

Re: Avro schema having Map of Records

2019-08-06 Thread Ryan Skraba
arquet that Avro seems like a shift from time to time :) > > On Tue, Aug 6, 2019 at 12:01, Ryan Skraba () wrote: >> >> Hello -- Avro supports a map type: >> https://avro.apache.org/docs/1.9.0/spec.html#Maps >> >> Generating an Avro schema from a JSON example can

Re: Avro schema having Map of Records

2019-08-06 Thread Ryan Skraba
Hello -- Avro supports a map type: https://avro.apache.org/docs/1.9.0/spec.html#Maps Generating an Avro schema from a JSON example can be ambiguous since a JSON object can either be converted to a record or a map. You're probably looking for something like this: { "type" : "record", "name"

Re: AVRO schema evolution: adding optional column with default fails deserialization

2019-08-02 Thread Ryan Skraba
t uses int32 as schema Id. This is >>>>> prepended (+a magic byte) to the binary avro. Thus using the schema >>>>> registry again you can get the writer schema. >>>>> >>>>> /Svante >>>>> >>>>> On T
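The framing described in this snippet (a magic byte, then an int32 schema id, then the binary Avro payload) can be sketched with the Java standard library alone. This is an illustration under stated assumptions and not any registry client's actual code: the magic byte value of 0 and the big-endian byte order are assumptions based on the commonly documented registry wire format, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch of "magic byte + int32 schema id + Avro binary" framing.
final class RegistryHeaderSketch {

    static final byte MAGIC = 0x0;   // assumed magic byte value
    static final int HEADER_SIZE = 5; // 1 magic byte + 4-byte schema id

    // Prepend the header to an already-encoded Avro binary payload.
    static byte[] frame(int schemaId, byte[] avroBinary) {
        return ByteBuffer.allocate(HEADER_SIZE + avroBinary.length)
                .put(MAGIC)
                .putInt(schemaId)     // ByteBuffer writes big-endian by default
                .put(avroBinary)
                .array();
    }

    // Read the schema id, which a reader would use to look up the writer schema.
    static int schemaId(byte[] message) {
        ByteBuffer buffer = ByteBuffer.wrap(message);
        if (buffer.get() != MAGIC) {
            throw new IllegalArgumentException("unexpected magic byte");
        }
        return buffer.getInt();
    }

    // Everything after the header is the binary Avro datum.
    static byte[] payload(byte[] message) {
        return Arrays.copyOfRange(message, HEADER_SIZE, message.length);
    }

    public static void main(String[] args) {
        byte[] framed = frame(42, new byte[] {0x02, 0x61});
        System.out.println("schema id: " + schemaId(framed));
        System.out.println("payload length: " + payload(framed).length);
    }
}
```

With the schema id recovered, the reader can fetch the writer schema and resolve it against its own reader schema, which is the step schema evolution depends on.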

Re: AVRO schema evolution: adding optional column with default fails deserialization

2019-07-30 Thread Ryan Skraba
oder(byteArrayOutputStream, null) > : EncoderFactory.get().jsonEncoder(schema, > byteArrayOutputStream, pretty); > > DatumWriter datumWriter = new > GenericDatumWriter<>(schema); > datumWriter.write(data, binaryEncoder); > > binaryEnc

Re: AVRO schema evolution: adding optional column with default fails deserialization

2019-07-30 Thread Ryan Skraba
Hello! Schema evolution relies on both the writer and reader schemas being available. It looks like the allegro tool you are using is using the GenericDatumReader that assumes the reader and writer schema are the same:

Re: Reg: Avrojob schema validation option.

2019-07-30 Thread Ryan Skraba
Hello! I'm not sure I understand your question. Some names are *required* with a specific format in the Avro specification (http://avro.apache.org/docs/1.8.2/spec.html#names) What are you looking to accomplish? I can think of two scenarios that we've seen in the past: (1) anonymous records

Re: Should a Schema be serializable in Java?

2019-07-18 Thread Ryan Skraba
> there. > > > On Tue, Jul 16, 2019 at 11:16 AM Ryan Skraba wrote: > > > > Hello! Thanks for the reference to AVRO-1852. It's exactly what I was > looking for. > > > > I agree that Java serialization shouldn't be used for anything > cross-platform, or (in

Re: Should a Schema be serializable in Java?

2019-07-16 Thread Ryan Skraba
Mon, Jul 15, 2019 at 23:03, Doug Cutting wrote: > > > > > I can't think of a reason Schema should not implement Serializable. > > > > > > There's actually already an issue & patch for this: > > > > > > https://issues.apache.org/jira/browse/AVRO-1852

Should a Schema be serializable in Java?

2019-07-15 Thread Ryan Skraba
Hello! I'm looking for any discussion or reference on why the Schema object isn't serializable -- I'm pretty sure this must have already been discussed (but the keywords +avro +serializable +schema have MANY results in all the searches I did: JIRA, stack overflow, mailing list, web). In particular,