Funny, I'm familiar with Avro, but I'm currently looking closely at Parquet!
Interestingly enough, I just ran across the conversion utilities in Spark that could have answered your original question [1].

It looks like you're using ReflectData to get the schema. Is the exception occurring during ReflectData.getSchema() or .induce()? Can you share the full stack trace or, better yet, the POJO that reproduces the error?

I _think_ I may have run across something similar when getting a schema via reflection, but in that case the class had a raw collection field (List instead of List<MyValue>). I can't clearly recall, but that might be a useful hint.

[1]: https://github.com/apache/spark/blob/master/external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala#L136

On Tue, Aug 6, 2019 at 2:39 PM Edgar H <kaotix...@gmail.com> wrote:
>
> Thanks a lot for the quick reply Ryan! That was exactly what I was looking
> for :)
>
> Been trying including the changes within my code and currently it's throwing
> the following exception... Caused by: org.apache.avro.AvroRuntimeException:
> Can't find element type of Collection
>
> I'm thinking that it could be the POJO not containing the classes for the
> inner record fields (I just have a getter and setter for the one_level field
> but the rest are types of that one)? Or how should it be represented within
> the parent POJO?
>
> Sorry if the questions sound too simple, but I'm too used to work with
> Parquet that Avro seems like a shift from time to time :)
>
> On Tue, Aug 6, 2019 at 12:01, Ryan Skraba (<r...@skraba.com>) wrote:
>>
>> Hello -- Avro supports a map type:
>> https://avro.apache.org/docs/1.9.0/spec.html#Maps
>>
>> Generating an Avro schema from a JSON example can be ambiguous since a
>> JSON object can either be converted to a record or a map.
>> You're probably looking for something like this:
>>
>> {
>>   "type" : "record",
>>   "name" : "MyClass",
>>   "namespace" : "com.acme.avro",
>>   "fields" : [ {
>>     "name" : "one_level",
>>     "type" : {
>>       "type" : "record",
>>       "name" : "one_level",
>>       "fields" : [ {
>>         "name" : "inner_level",
>>         "type" : {
>>           "type" : "map",
>>           "values" : {
>>             "type" : "record",
>>             "name" : "sample",
>>             "fields" : [ {
>>               "name" : "sample1",
>>               "type" : "string"
>>             }, {
>>               "name" : "sample2",
>>               "type" : "string"
>>             } ]
>>           }
>>         }
>>       } ]
>>     }
>>   } ]
>> }
>>
>> On Tue, Aug 6, 2019 at 10:47 AM Edgar H <kaotix...@gmail.com> wrote:
>> >
>> > I'm trying to translate a schema that I have in Spark which is defined for
>> > Parquet, and I would like to use it within Avro too.
>> >
>> > StructField("one_level", StructType(List(StructField(
>> >   "inner_level",
>> >   MapType(
>> >     StringType,
>> >     StructType(
>> >       List(
>> >         StructField("field1", StringType),
>> >         StructField("field2", ArrayType(StringType))
>> >       )
>> >     )
>> >   )
>> > )
>> > )), nullable = false)
>> >
>> > However, in Avro I haven't seen any examples of Maps containing Record
>> > type objects...
>> >
>> > Tried a sample input with an online Avro schema generator, taking this
>> > input.
>> >
>> > {
>> >   "one_level": {
>> >     "inner_level": {
>> >       "sample1": {
>> >         "field1": "sample",
>> >         "field2": ["a", "b"]
>> >       },
>> >       "sample2": {
>> >         "field1": "sample2",
>> >         "field2": ["a", "b"]
>> >       }
>> >     }
>> >   }
>> >
>> > }
>> >
>> > It prompts this output.
>> >
>> > {
>> >   "name": "MyClass",
>> >   "type": "record",
>> >   "namespace": "com.acme.avro",
>> >   "fields": [
>> >     {
>> >       "name": "one_level",
>> >       "type": {
>> >         "name": "one_level",
>> >         "type": "record",
>> >         "fields": [
>> >           {
>> >             "name": "inner_level",
>> >             "type": {
>> >               "name": "inner_level",
>> >               "type": "record",
>> >               "fields": [
>> >                 {
>> >                   "name": "sample1",
>> >                   "type": {
>> >                     "name": "sample1",
>> >                     "type": "record",
>> >                     "fields": [
>> >                       {
>> >                         "name": "field1",
>> >                         "type": "string"
>> >                       },
>> >                       {
>> >                         "name": "field2",
>> >                         "type": {
>> >                           "type": "array",
>> >                           "items": "string"
>> >                         }
>> >                       }
>> >                     ]
>> >                   }
>> >                 },
>> >                 {
>> >                   "name": "sample2",
>> >                   "type": {
>> >                     "name": "sample2",
>> >                     "type": "record",
>> >                     "fields": [
>> >                       {
>> >                         "name": "field1",
>> >                         "type": "string"
>> >                       },
>> >                       {
>> >                         "name": "field2",
>> >                         "type": {
>> >                           "type": "array",
>> >                           "items": "string"
>> >                         }
>> >                       }
>> >                     ]
>> >                   }
>> >                 }
>> >               ]
>> >             }
>> >           }
>> >         ]
>> >       }
>> >     }
>> >   ]
>> > }
>> >
>> > Which isn't absolutely what I'm looking for. Is it possible to define such
>> > schema in Avro?
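
P.S. To make the raw collection hint at the top of this message a bit more concrete, here is a minimal sketch using only java.lang.reflect (no Avro on the classpath). It is not Avro's code, and the class names (Sample, OneLevel, RawSample, TypeArgs) are hypothetical stand-ins, not your actual POJO. My assumption is that reflection-based schema generation depends on the same generic signature information: a field declared as List<String> or Map<String, Sample> carries its element/value type, while a raw List carries nothing to build an array schema from.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;
import java.util.Map;

// Hypothetical POJOs shaped like the schema discussed in this thread.
class Sample {
    String field1;
    List<String> field2;              // parameterized: element type is recorded
}

class OneLevel {
    Map<String, Sample> inner_level;  // string keys, record-like values
}

class RawSample {
    String field1;
    @SuppressWarnings("rawtypes")
    List field2;                      // raw: no element type is recorded
}

class TypeArgs {
    /** Type arguments declared on a field, or "" for a raw or non-generic field. */
    static String of(Class<?> owner, String fieldName) throws NoSuchFieldException {
        Field field = owner.getDeclaredField(fieldName);
        Type type = field.getGenericType();
        if (!(type instanceof ParameterizedType)) {
            return "";                // nothing for a reflection-based tool to work with
        }
        StringBuilder names = new StringBuilder();
        for (Type arg : ((ParameterizedType) type).getActualTypeArguments()) {
            if (names.length() > 0) {
                names.append(", ");
            }
            names.append(arg.getTypeName());
        }
        return names.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Sample.field2        -> [" + of(Sample.class, "field2") + "]");
        System.out.println("OneLevel.inner_level -> [" + of(OneLevel.class, "inner_level") + "]");
        System.out.println("RawSample.field2     -> [" + of(RawSample.class, "field2") + "]");
    }
}
```

If your POJO has any field that looks like RawSample.field2, giving it an explicit type parameter would be the first thing I'd try.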