Seems like the right time to share some Parquet vs. Avro knowledge, haha :)

My god, it was exactly what you said! There was an untyped List within a POJO; fixing that solved the problem. BTW, the code was using ReflectData.getSchema().
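For anyone landing on this thread later, here is a minimal sketch of why a raw List can trip up reflection-based schema generation. It uses only java.lang.reflect, not Avro itself, and the names RawCollectionCheck, RawSample, TypedSample and elementTypeOf are made up for illustration. It is not Avro's implementation; it only shows that a raw List field carries no element type for any reflective tool to read, which is consistent with the "Can't find element type of Collection" message.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Collection;
import java.util.List;

class RawCollectionCheck {

    // Hypothetical POJO with a raw collection: no element type is recorded.
    static class RawSample {
        String field1;
        @SuppressWarnings("rawtypes")
        List field2;
    }

    // Same POJO with the element type declared.
    static class TypedSample {
        String field1;
        List<String> field2;
    }

    /**
     * Returns the declared element type of a collection field,
     * or null when the field is a raw collection.
     */
    static Type elementTypeOf(Class<?> owner, String fieldName) {
        Field field;
        try {
            field = owner.getDeclaredField(fieldName);
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("No such field: " + fieldName, e);
        }
        if (!Collection.class.isAssignableFrom(field.getType())) {
            throw new IllegalArgumentException(fieldName + " is not a Collection");
        }
        Type generic = field.getGenericType();
        if (generic instanceof ParameterizedType) {
            // List<String> keeps its type argument in the class file.
            return ((ParameterizedType) generic).getActualTypeArguments()[0];
        }
        // Raw List: nothing to derive an array item schema from.
        return null;
    }

    public static void main(String[] args) {
        System.out.println("raw List element type:     "
                + elementTypeOf(RawSample.class, "field2"));
        System.out.println("List<String> element type: "
                + elementTypeOf(TypedSample.class, "field2"));
    }
}
```

Declaring the field as List<String> (or List<MyValue>) is the change that makes the element type visible.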
Thanks a lot Ryan! Really appreciated!

On Tue, Aug 6, 2019 at 17:35, Ryan Skraba (<r...@skraba.com>) wrote:

> Funny, I'm familiar with Avro, but I'm currently looking closely at
> Parquet!
>
> Interestingly enough, I just ran across the conversion utilities in
> Spark that could have answered your original question[1].
>
> It looks like you're using ReflectData to get the schema. Is the
> exception occurring during ReflectData.getSchema() or .induce()?
> Can you share the full stack trace or, better yet, the POJO that
> reproduces the error?
>
> I _think_ I may have run across something similar when getting a
> schema via reflection, but the class had a raw collection field (List
> instead of List<MyValue>). I can't clearly recall, but that might be
> a useful hint.
>
> [1]:
> https://github.com/apache/spark/blob/master/external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala#L136
>
> On Tue, Aug 6, 2019 at 2:39 PM Edgar H <kaotix...@gmail.com> wrote:
> >
> > Thanks a lot for the quick reply Ryan! That was exactly what I was
> > looking for :)
> >
> > I've been trying to include the changes in my code, and currently it's
> > throwing the following exception:
> >
> > Caused by: org.apache.avro.AvroRuntimeException: Can't find element type of Collection
> >
> > I'm thinking that it could be the POJO not containing the classes for
> > the inner record fields (I just have a getter and setter for the
> > one_level field, but the rest are types of that one)? Or how should
> > it be represented within the parent POJO?
> >
> > Sorry if the questions sound too simple, but I'm so used to working
> > with Parquet that Avro seems like a shift from time to time :)
> >
> > On Tue, Aug 6, 2019 at 12:01, Ryan Skraba (<r...@skraba.com>) wrote:
> >>
> >> Hello -- Avro supports a map type:
> >> https://avro.apache.org/docs/1.9.0/spec.html#Maps
> >>
> >> Generating an Avro schema from a JSON example can be ambiguous since a
> >> JSON object can either be converted to a record or a map. You're
> >> probably looking for something like this:
> >>
> >> {
> >>   "type" : "record",
> >>   "name" : "MyClass",
> >>   "namespace" : "com.acme.avro",
> >>   "fields" : [ {
> >>     "name" : "one_level",
> >>     "type" : {
> >>       "type" : "record",
> >>       "name" : "one_level",
> >>       "fields" : [ {
> >>         "name" : "inner_level",
> >>         "type" : {
> >>           "type" : "map",
> >>           "values" : {
> >>             "type" : "record",
> >>             "name" : "sample",
> >>             "fields" : [ {
> >>               "name" : "sample1",
> >>               "type" : "string"
> >>             }, {
> >>               "name" : "sample2",
> >>               "type" : "string"
> >>             } ]
> >>           }
> >>         }
> >>       } ]
> >>     }
> >>   } ]
> >> }
> >>
> >> On Tue, Aug 6, 2019 at 10:47 AM Edgar H <kaotix...@gmail.com> wrote:
> >> >
> >> > I'm trying to translate a schema that I have in Spark, which is
> >> > defined for Parquet, and I would like to use it with Avro too.
> >> >
> >> > StructField("one_level", StructType(List(StructField(
> >> >   "inner_level",
> >> >   MapType(
> >> >     StringType,
> >> >     StructType(
> >> >       List(
> >> >         StructField("field1", StringType),
> >> >         StructField("field2", ArrayType(StringType))
> >> >       )
> >> >     )
> >> >   )
> >> > )
> >> > )), nullable = false)
> >> >
> >> > However, in Avro I haven't seen any examples of Maps containing
> >> > Record type objects...
> >> >
> >> > I tried an online Avro schema generator with this sample input:
> >> >
> >> > {
> >> >   "one_level": {
> >> >     "inner_level": {
> >> >       "sample1": {
> >> >         "field1": "sample",
> >> >         "field2": ["a", "b"]
> >> >       },
> >> >       "sample2": {
> >> >         "field1": "sample2",
> >> >         "field2": ["a", "b"]
> >> >       }
> >> >     }
> >> >   }
> >> > }
> >> >
> >> > It produces this output:
> >> >
> >> > {
> >> >   "name": "MyClass",
> >> >   "type": "record",
> >> >   "namespace": "com.acme.avro",
> >> >   "fields": [
> >> >     {
> >> >       "name": "one_level",
> >> >       "type": {
> >> >         "name": "one_level",
> >> >         "type": "record",
> >> >         "fields": [
> >> >           {
> >> >             "name": "inner_level",
> >> >             "type": {
> >> >               "name": "inner_level",
> >> >               "type": "record",
> >> >               "fields": [
> >> >                 {
> >> >                   "name": "sample1",
> >> >                   "type": {
> >> >                     "name": "sample1",
> >> >                     "type": "record",
> >> >                     "fields": [
> >> >                       {
> >> >                         "name": "field1",
> >> >                         "type": "string"
> >> >                       },
> >> >                       {
> >> >                         "name": "field2",
> >> >                         "type": {
> >> >                           "type": "array",
> >> >                           "items": "string"
> >> >                         }
> >> >                       }
> >> >                     ]
> >> >                   }
> >> >                 },
> >> >                 {
> >> >                   "name": "sample2",
> >> >                   "type": {
> >> >                     "name": "sample2",
> >> >                     "type": "record",
> >> >                     "fields": [
> >> >                       {
> >> >                         "name": "field1",
> >> >                         "type": "string"
> >> >                       },
> >> >                       {
> >> >                         "name": "field2",
> >> >                         "type": {
> >> >                           "type": "array",
> >> >                           "items": "string"
> >> >                         }
> >> >                       }
> >> >                     ]
> >> >                   }
> >> >                 }
> >> >               ]
> >> >             }
> >> >           }
> >> >         ]
> >> >       }
> >> >     }
> >> >   ]
> >> > }
> >> >
> >> > Which isn't really what I'm looking for. Is it possible to define
> >> > such a schema in Avro?
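
On the question raised in the thread of how the nested types could be represented in the parent POJO, the following is a rough sketch. It assumes a reflection-based mapping in which a Map<String, V> field corresponds to an Avro map and a List<E> field to an Avro array. The class names (NestedMapPojo, MyClass, OneLevel, Sample) are hypothetical; the field names mirror the schema names from the thread. The code only builds the sample input in plain Java and does not call Avro, so treat it as a layout suggestion and not as a verified ReflectData example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NestedMapPojo {

    // Map value type: mirrors the inner struct (field1: string, field2: array of string).
    static class Sample {
        String field1;
        List<String> field2; // parameterized, so the element type is visible to reflection

        Sample() {} // no-arg constructor, commonly needed by reflection-based readers

        Sample(String field1, List<String> field2) {
            this.field1 = field1;
            this.field2 = field2;
        }
    }

    // Mirrors the "one_level" record: a single map keyed by string.
    static class OneLevel {
        Map<String, Sample> inner_level = new LinkedHashMap<>();
    }

    // Mirrors the top-level record.
    static class MyClass {
        OneLevel one_level = new OneLevel();
    }

    /** Builds the sample input quoted in the thread. */
    static MyClass sampleInput() {
        MyClass root = new MyClass();
        root.one_level.inner_level.put("sample1", new Sample("sample", Arrays.asList("a", "b")));
        root.one_level.inner_level.put("sample2", new Sample("sample2", Arrays.asList("a", "b")));
        return root;
    }

    public static void main(String[] args) {
        MyClass root = sampleInput();
        // "sample1" and "sample2" are map keys here, not record field names.
        for (Map.Entry<String, Sample> e : root.one_level.inner_level.entrySet()) {
            System.out.println(e.getKey() + " -> field1=" + e.getValue().field1
                    + ", field2=" + e.getValue().field2);
        }
    }
}
```

The design point is that the keys of inner_level are data, so they belong in a Map and not in separately named fields, which is the difference between the map schema suggested in the thread and the generator's all-records output.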