Seems like the right time to share some Parquet vs Avro knowledge haha :)

My god, exactly what you said! Untyped List within a POJO, problem solved.
BTW, it was using ReflectData.getSchema().
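
For anyone who hits the same "Can't find element type of Collection" error, here is a minimal sketch of why a raw List is a problem for reflection-based schema generation. It uses only plain Java reflection and is not Avro's actual code; RawPojo, TypedPojo and elementTypeOf are made-up names for illustration.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

class RawListCheck {
    // Hypothetical POJOs, not the real classes from the thread.
    static class RawPojo {
        @SuppressWarnings("rawtypes")
        List values;            // raw: no element type is recorded for this field
    }

    static class TypedPojo {
        List<String> values;    // parameterized: the element type String is recorded
    }

    // Returns the declared element type name of a collection field,
    // or null when the field is declared with a raw type.
    static String elementTypeOf(Class<?> pojo, String fieldName) {
        Type type;
        try {
            type = pojo.getDeclaredField(fieldName).getGenericType();
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("No such field: " + fieldName, e);
        }
        if (type instanceof ParameterizedType) {
            return ((ParameterizedType) type).getActualTypeArguments()[0].getTypeName();
        }
        return null;            // raw type: nothing to inspect
    }

    public static void main(String[] args) {
        System.out.println("raw:   " + elementTypeOf(RawPojo.class, "values"));
        System.out.println("typed: " + elementTypeOf(TypedPojo.class, "values"));
    }
}
```

With the raw field there is simply no type argument to read, which is presumably what the schema generator was complaining about; declaring the field as List<MyValue> gives it something to work with.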

Thanks a lot Ryan! Really appreciated!
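
On the earlier question of how the inner fields should be represented in the parent POJO, a rough sketch of a shape that mirrors the Spark schema from the original question. The field names come from that schema; the class names (MyClass, OneLevel, Sample) and the example() helper are assumptions. As far as I know, Avro's reflection maps a Map with String keys to an Avro map and a List<T> to an array, but that is worth checking against the ReflectData documentation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical POJOs: field names follow the Spark schema, class names are made up.
class MyClass {
    static class Sample {
        String field1;
        List<String> field2;          // typed list, not a raw List

        Sample() { }                  // no-arg constructor so the class stays a plain bean

        Sample(String field1, List<String> field2) {
            this.field1 = field1;
            this.field2 = field2;
        }
    }

    static class OneLevel {
        // String keys with record values, i.e. MapType(StringType, StructType(...))
        Map<String, Sample> inner_level = new HashMap<>();
    }

    OneLevel one_level = new OneLevel();

    // Builds the sample input from the original question.
    static MyClass example() {
        MyClass root = new MyClass();
        root.one_level.inner_level.put("sample1", new Sample("sample", Arrays.asList("a", "b")));
        root.one_level.inner_level.put("sample2", new Sample("sample2", Arrays.asList("a", "b")));
        return root;
    }

    public static void main(String[] args) {
        MyClass root = example();
        System.out.println(root.one_level.inner_level.keySet());
        System.out.println(root.one_level.inner_level.get("sample1").field2);
    }
}
```

The point of the shape is that "sample1" and "sample2" are map keys, not field names, so new keys can appear without changing the schema.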

On Tue, Aug 6, 2019 at 5:35 PM Ryan Skraba (<r...@skraba.com>) wrote:

> Funny, I'm familiar with Avro, but I'm currently looking closely at
> Parquet!
>
> Interestingly enough, I just ran across the conversion utilities in
> Spark that could have answered your original question[1].
>
> It looks like you're using ReflectData to get the schema.  Is the
> exception occurring during ReflectData.getSchema() or .induce()?
> Can you share the full stack trace or better yet, the POJO that
> reproduces the error?
>
> I _think_ I may have run across something similar when getting a
> schema via reflection, where the class had a raw collection field (List
> instead of List<MyValue>).  I can't clearly recall, but that might be
> a useful hint.
>
> [1]:
> https://github.com/apache/spark/blob/master/external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala#L136
>
> On Tue, Aug 6, 2019 at 2:39 PM Edgar H <kaotix...@gmail.com> wrote:
> >
> > Thanks a lot for the quick reply Ryan! That was exactly what I was
> > looking for :)
> >
> > I've been trying to include the changes in my code, and it's currently
> > throwing the following exception... Caused by:
> > org.apache.avro.AvroRuntimeException: Can't find element type of Collection
> >
> > I'm thinking it could be that the POJO doesn't contain the classes for
> > the inner record fields (I just have a getter and setter for the one_level
> > field, but the rest are types of that one)? Or how should it be represented
> > within the parent POJO?
> >
> > Sorry if the questions sound too simple, but I'm so used to working with
> > Parquet that Avro feels like quite a shift at times :)
> >
> > On Tue, Aug 6, 2019 at 12:01 PM Ryan Skraba (<r...@skraba.com>) wrote:
> >>
> >> Hello -- Avro supports a map type:
> >> https://avro.apache.org/docs/1.9.0/spec.html#Maps
> >>
> >> Generating an Avro schema from a JSON example can be ambiguous since a
> >> JSON object can either be converted to a record or a map.  You're
> >> probably looking for something like this:
> >>
> >> {
> >>   "type" : "record",
> >>   "name" : "MyClass",
> >>   "namespace" : "com.acme.avro",
> >>   "fields" : [ {
> >>     "name" : "one_level",
> >>     "type" : {
> >>       "type" : "record",
> >>       "name" : "one_level",
> >>       "fields" : [ {
> >>         "name" : "inner_level",
> >>         "type" : {
> >>           "type" : "map",
> >>           "values" : {
> >>             "type" : "record",
> >>             "name" : "sample",
> >>             "fields" : [ {
> >>               "name" : "sample1",
> >>               "type" : "string"
> >>             }, {
> >>               "name" : "sample2",
> >>               "type" : "string"
> >>             } ]
> >>           }
> >>         }
> >>       } ]
> >>     }
> >>   } ]
> >> }
> >>
> >> On Tue, Aug 6, 2019 at 10:47 AM Edgar H <kaotix...@gmail.com> wrote:
> >> >
> >> > I'm trying to translate a schema that I have in Spark, which is
> >> > defined for Parquet, and I would like to use it within Avro too.
> >> >
> >> >   StructField("one_level", StructType(List(StructField(
> >> >     "inner_level",
> >> >     MapType(
> >> >       StringType,
> >> >       StructType(
> >> >         List(
> >> >           StructField("field1", StringType),
> >> >           StructField("field2", ArrayType(StringType))
> >> >         )
> >> >       )
> >> >     )
> >> >   )
> >> > )), nullable = false)
> >> >
> >> > However, in Avro I haven't seen any examples of maps containing
> >> > record-type values...
> >> >
> >> > I tried an online Avro schema generator with this sample input:
> >> >
> >> > {
> >> > "one_level": {
> >> >     "inner_level": {
> >> >         "sample1": {
> >> >             "field1": "sample",
> >> >             "field2": ["a", "b"]
> >> >         },
> >> >         "sample2": {
> >> >             "field1": "sample2",
> >> >             "field2": ["a", "b"]
> >> >         }
> >> >     }
> >> > }
> >> >
> >> > }
> >> >
> >> > It produces this output:
> >> >
> >> > {
> >> >   "name": "MyClass",
> >> >   "type": "record",
> >> >   "namespace": "com.acme.avro",
> >> >   "fields": [
> >> >     {
> >> >       "name": "one_level",
> >> >       "type": {
> >> >         "name": "one_level",
> >> >         "type": "record",
> >> >         "fields": [
> >> >           {
> >> >             "name": "inner_level",
> >> >             "type": {
> >> >               "name": "inner_level",
> >> >               "type": "record",
> >> >               "fields": [
> >> >                 {
> >> >                   "name": "sample1",
> >> >                   "type": {
> >> >                     "name": "sample1",
> >> >                     "type": "record",
> >> >                     "fields": [
> >> >                       {
> >> >                         "name": "field1",
> >> >                         "type": "string"
> >> >                       },
> >> >                       {
> >> >                         "name": "field2",
> >> >                         "type": {
> >> >                           "type": "array",
> >> >                           "items": "string"
> >> >                         }
> >> >                       }
> >> >                     ]
> >> >                   }
> >> >                 },
> >> >                 {
> >> >                   "name": "sample2",
> >> >                   "type": {
> >> >                     "name": "sample2",
> >> >                     "type": "record",
> >> >                     "fields": [
> >> >                       {
> >> >                         "name": "field1",
> >> >                         "type": "string"
> >> >                       },
> >> >                       {
> >> >                         "name": "field2",
> >> >                         "type": {
> >> >                           "type": "array",
> >> >                           "items": "string"
> >> >                         }
> >> >                       }
> >> >                     ]
> >> >                   }
> >> >                 }
> >> >               ]
> >> >             }
> >> >           }
> >> >         ]
> >> >       }
> >> >     }
> >> >   ]
> >> > }
> >> >
> >> > Which isn't at all what I'm looking for. Is it possible to define
> >> > such a schema in Avro?
>
