Funny, I'm familiar with Avro, but I'm currently looking closely at Parquet!

Interestingly enough, I just ran across the conversion utilities in
Spark that could have answered your original question[1].

It looks like you're using ReflectData to get the schema.  Is the
exception thrown from ReflectData.getSchema() or from .induce()?
Can you share the full stack trace or, better yet, the POJO that
reproduces the error?

I _think_ I may have run across something similar when getting a
schema via reflection, in a case where the class had a raw collection
field (List instead of List<MyValue>).  I can't clearly recall, but
that might be a useful hint.
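
To illustrate that hint: as far as I recall, schema reflection depends
on the generic type information Java keeps on field declarations, and a
raw List carries none.  The sketch below uses only java.lang.reflect
(no Avro), so it does not reproduce the exception itself.  The classes
Sample, OneLevel and BadSample and the helper rawCollectionFields are
hypothetical stand-ins for your POJOs, with field names taken from the
schema in this thread.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

public class RawCollectionCheck {

    // Hypothetical POJO for the map values: both fields expose their types.
    static class Sample {
        String field1;
        List<String> field2;   // parameterized: element type is visible
    }

    // Hypothetical POJO for "one_level": a map from String to Sample.
    static class OneLevel {
        Map<String, Sample> inner_level;
    }

    // Same shape as Sample, but with a raw List (assumed to be the culprit).
    static class BadSample {
        String field1;
        @SuppressWarnings("rawtypes")
        List field2;           // raw: no element type for reflection to read
    }

    /** Names of declared Collection or Map fields that have no type arguments. */
    static List<String> rawCollectionFields(Class<?> type) {
        List<String> raw = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (f.isSynthetic()) {
                continue; // skip compiler-generated fields
            }
            boolean container = Collection.class.isAssignableFrom(f.getType())
                    || Map.class.isAssignableFrom(f.getType());
            // A parameterized declaration yields a ParameterizedType;
            // a raw declaration yields just the Class object.
            if (container && !(f.getGenericType() instanceof ParameterizedType)) {
                raw.add(f.getName());
            }
        }
        return raw;
    }

    public static void main(String[] args) {
        System.out.println(rawCollectionFields(Sample.class));    // []
        System.out.println(rawCollectionFields(OneLevel.class));  // []
        System.out.println(rawCollectionFields(BadSample.class)); // [field2]
    }
}
```

If this reports any field on one of your classes, giving that field
explicit type arguments would be the first thing I'd try.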

[1]: https://github.com/apache/spark/blob/master/external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala#L136

On Tue, Aug 6, 2019 at 2:39 PM Edgar H <kaotix...@gmail.com> wrote:
>
> Thanks a lot for the quick reply Ryan! That was exactly what I was looking 
> for :)
>
> I've been trying to include the changes in my code, and it's currently
> throwing the following exception: Caused by:
> org.apache.avro.AvroRuntimeException: Can't find element type of Collection
>
> I'm thinking that it could be the POJO not containing the classes for the 
> inner record fields (I just have a getter and setter for the one_level field 
> but the rest are types of that one)? Or how should it be represented within 
> the parent POJO?
>
> Sorry if the questions sound too simple, but I'm so used to working with 
> Parquet that Avro feels like a shift from time to time :)
>
> On Tue, Aug 6, 2019 at 12:01 PM Ryan Skraba <r...@skraba.com> wrote:
>>
>> Hello -- Avro supports a map type:
>> https://avro.apache.org/docs/1.9.0/spec.html#Maps
>>
>> Generating an Avro schema from a JSON example can be ambiguous since a
>> JSON object can either be converted to a record or a map.  You're
>> probably looking for something like this:
>>
>> {
>>   "type" : "record",
>>   "name" : "MyClass",
>>   "namespace" : "com.acme.avro",
>>   "fields" : [ {
>>     "name" : "one_level",
>>     "type" : {
>>       "type" : "record",
>>       "name" : "one_level",
>>       "fields" : [ {
>>         "name" : "inner_level",
>>         "type" : {
>>           "type" : "map",
>>           "values" : {
>>             "type" : "record",
>>             "name" : "sample",
>>             "fields" : [ {
>>               "name" : "field1",
>>               "type" : "string"
>>             }, {
>>               "name" : "field2",
>>               "type" : {
>>                 "type" : "array",
>>                 "items" : "string"
>>               }
>>             } ]
>>           }
>>         }
>>       } ]
>>     }
>>   } ]
>> }
>>
>> On Tue, Aug 6, 2019 at 10:47 AM Edgar H <kaotix...@gmail.com> wrote:
>> >
>> > I'm trying to translate a schema that I have in Spark which is defined for 
>> > Parquet, and I would like to use it within Avro too.
>> >
>> >   StructField("one_level", StructType(List(StructField(
>> >     "inner_level",
>> >     MapType(
>> >       StringType,
>> >       StructType(
>> >         List(
>> >           StructField("field1", StringType),
>> >           StructField("field2", ArrayType(StringType))
>> >         )
>> >       )
>> >     )
>> >   )
>> > )), nullable = false)
>> >
>> > However, in Avro I haven't seen any examples of Maps containing Record 
>> > type objects...
>> >
>> > Tried a sample input with an online Avro schema generator, taking this 
>> > input.
>> >
>> > {
>> > "one_level": {
>> >     "inner_level": {
>> >         "sample1": {
>> >             "field1": "sample",
>> >             "field2": ["a", "b"]
>> >         },
>> >         "sample2": {
>> >             "field1": "sample2",
>> >             "field2": ["a", "b"]
>> >         }
>> >     }
>> > }
>> >
>> > }
>> >
>> > It produces this output.
>> >
>> > {
>> >   "name": "MyClass",
>> >   "type": "record",
>> >   "namespace": "com.acme.avro",
>> >   "fields": [
>> >     {
>> >       "name": "one_level",
>> >       "type": {
>> >         "name": "one_level",
>> >         "type": "record",
>> >         "fields": [
>> >           {
>> >             "name": "inner_level",
>> >             "type": {
>> >               "name": "inner_level",
>> >               "type": "record",
>> >               "fields": [
>> >                 {
>> >                   "name": "sample1",
>> >                   "type": {
>> >                     "name": "sample1",
>> >                     "type": "record",
>> >                     "fields": [
>> >                       {
>> >                         "name": "field1",
>> >                         "type": "string"
>> >                       },
>> >                       {
>> >                         "name": "field2",
>> >                         "type": {
>> >                           "type": "array",
>> >                           "items": "string"
>> >                         }
>> >                       }
>> >                     ]
>> >                   }
>> >                 },
>> >                 {
>> >                   "name": "sample2",
>> >                   "type": {
>> >                     "name": "sample2",
>> >                     "type": "record",
>> >                     "fields": [
>> >                       {
>> >                         "name": "field1",
>> >                         "type": "string"
>> >                       },
>> >                       {
>> >                         "name": "field2",
>> >                         "type": {
>> >                           "type": "array",
>> >                           "items": "string"
>> >                         }
>> >                       }
>> >                     ]
>> >                   }
>> >                 }
>> >               ]
>> >             }
>> >           }
>> >         ]
>> >       }
>> >     }
>> >   ]
>> > }
>> >
>> > Which isn't at all what I'm looking for. Is it possible to define such 
>> > a schema in Avro?
