I have now extended the AvroFormatTest suite with two unit tests that show that

* Flattening of arrays of primitives works as expected
* Flattening of arrays of records does not work properly
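
To make the expectation behind the second test concrete, here is a minimal Python model of what flatten should produce for the sample data quoted further down in this thread. It is only a sketch of the semantics, not Drill code; the helper name `flatten_rows` is made up for illustration.

```python
import json


def flatten_rows(records, column):
    """Model of FLATTEN: emit one output row per element of the array in `column`."""
    return [element for record in records for element in record[column]]


# Two identical sample records, each holding ten nested records (field1 = 0..9),
# mirroring the tojson output quoted later in this thread.
line = json.dumps({"elements": [{"field1": i} for i in range(10)]})
records = [json.loads(line), json.loads(line)]

rows = flatten_rows(records, "elements")
print(len(rows))  # 20 rows expected; the Drill query in this thread returned only 2
print(rows[:3])
```

Under this model every nested record shows up as its own row, whereas the query result in the thread contains only the last element of each array.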

I spent some time trying to find the cause, but this is my first contact with
the Drill codebase.

Is attaching a patch to an issue still the recommended way of making these
unit tests available, or is a pull request also an option?

In the context of the recent Avro maturity discussion, I would love to fix
this error myself, but I would need some hints about what goes wrong
internally.

Johannes

On Fri, Mar 25, 2016 at 10:50 PM, Johannes Schulte <
[email protected]> wrote:

> Hi Stefan, hi Jacques, thanks for going after this - I almost gave up,
> but now I think it was because I accessed the data over JDBC with SQuirreL
> and got confused by the unknown type column there. Nonetheless, if the
> schema looks like this:
>
>
> {
>   "type" : "record",
>   "name" : "MainRecord",
>   "namespace" : "drizz.WriteAvroTestFileForDrill$",
>   "fields" : [ {
>     "name" : "elements",
>     "type" : {
>       "type" : "array",
>       "items" : {
>         "type" : "record",
>         "name" : "NestedRecord",
>         "fields" : [ {
>           "name" : "field1",
>           "type" : "int"
>         } ]
>       },
>       "java-class" : "java.util.List"
>     }
>   } ]
> }
>
> and the contents look like this (according to the Avro tojson command-line
> utility)
>
>
> {"elements":[{"field1":0},{"field1":1},{"field1":2},{"field1":3},{"field1":4},{"field1":5},{"field1":6},{"field1":7},{"field1":8},{"field1":9}]}
>
> {"elements":[{"field1":0},{"field1":1},{"field1":2},{"field1":3},{"field1":4},{"field1":5},{"field1":6},{"field1":7},{"field1":8},{"field1":9}]}
>
> a query like
>
> select flatten(elements) from
> dfs.`/Users/j.schulte/data/avro-drill/no-union/`;
>
> yields exactly two rows:
> +---------------+
> |    EXPR$0     |
> +---------------+
> | {"field1":9}  |
> | {"field1":9}  |
> +---------------+
>
> as if only the last element of each array survived.
>
> Thanks for your help so far..
>
> On Fri, Mar 25, 2016 at 5:45 PM, Stefán Baxter <[email protected]>
> wrote:
>
>> Johannes, Jacques is right.
>>
>> I only tested the flattening of maps and not the flattening of
>> list-of-maps.
>>
>> -Stefan
>>
>> On Fri, Mar 25, 2016 at 4:12 PM, Jacques Nadeau <[email protected]>
>> wrote:
>>
>> > I think there is some incorrect information and confusion in this thread.
>> > Could you please share a piece of sample data and a specific query? The
>> > error message shown in your original email suggests that you were
>> > trying to flatten a map rather than an array of maps. Flatten is for
>> > arrays only. The arrays can have scalars or complex objects in them.
>> >
>> > --
>> > Jacques Nadeau
>> > CTO and Co-Founder, Dremio
>> >
>> > On Fri, Mar 25, 2016 at 2:00 AM, Johannes Schulte <
>> > [email protected]> wrote:
>> >
>> > > Hi Stefan,
>> > >
>> > > thanks for this information - so it seems that there is currently no
>> > > way of accessing nested rich objects with Drill; I somehow got that
>> > > wrong from the documentation...
>> > >
>> > > Cheers,
>> > > Johannes
>> > >
>> > > On Thu, Mar 24, 2016 at 2:14 PM, Stefán Baxter <
>> > [email protected]>
>> > > wrote:
>> > >
>> > > > FYI: flattening of embedded structures is not supported in Parquet either.
>> > > >
>> > > > Regards,
>> > > >  -Stefan
>> > > >
>> > > > On Wed, Mar 23, 2016 at 8:51 PM, Johannes Schulte <
>> > > > [email protected]> wrote:
>> > > >
>> > > > > Hi Stefan,
>> > > > >
>> > > > > thanks for your response and the link to your UDF repository, it's a
>> > > > > good reference. I tried Drill 1.6; the data is an array of complex
>> > > > > objects, though. I will try to set up a Drill dev environment and see
>> > > > > if I can modify the tests to fail.
>> > > > >
>> > > > > Johannes
>> > > > >
>> > > > > On Wed, Mar 23, 2016 at 8:13 PM, Stefán Baxter <
>> > > > [email protected]>
>> > > > > wrote:
>> > > > >
>> > > > > > FYI, this seems to be working in 1.6, at least on the Avro data that we
>> > > > > > have.
>> > > > > >
>> > > > > > On Wed, Mar 23, 2016 at 6:59 PM, Stefán Baxter <
>> > > > > [email protected]>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > Hi again,
>> > > > > > >
>> > > > > > > What version of Drill are you using?
>> > > > > > >
>> > > > > > > Regards,
>> > > > > > > - Stefán
>> > > > > > >
>> > > > > > > On Wed, Mar 23, 2016 at 4:49 PM, Stefán Baxter <
>> > > > > > [email protected]>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > >> Hi Johannes,
>> > > > > > >>
>> > > > > > >> As great as Drill is, the Avro plugin has been a source of
>> > > > > > >> frustration for us @activitystream.
>> > > > > > >>
>> > > > > > >> We have a small UDF library [1] (Apache licensed) which contains a
>> > > > > > >> function that can return an array (List<String>) from Avro as a CSV
>> > > > > > >> list.
>> > > > > > >>
>> > > > > > >> You could use that to roll your own, or provide me with a small
>> > > > > > >> sample and I can create a custom flatten function for you.
>> > > > > > >>
>> > > > > > >> The best would be to wait for a fix, but this can potentially get you
>> > > > > > >> out of a rough spot.
>> > > > > > >>
>> > > > > > >> [1] https://github.com/activitystream/asdrill
>> > > > > > >>
>> > > > > > >> Regards,
>> > > > > > >>  -Stefán
>> > > > > > >>
>> > > > > > >> On Wed, Mar 23, 2016 at 9:05 AM, Johannes Schulte <
>> > > > > > >> [email protected]> wrote:
>> > > > > > >>
>> > > > > > >>> Hi,
>> > > > > > >>>
>> > > > > > >>> when trying to read simple Avro arrays with select flatten(array)
>> > > > > > >>> from dfs... I get the exception
>> > > > > > >>>
>> > > > > > >>> SQL Query Error: SYSTEM ERROR: ClassCastException: Cannot cast
>> > > > > > >>> org.apache.drill.exec.vector.complex.MapVector to
>> > > > > > >>> org.apache.drill.exec.vector.complex.RepeatedValueVector
>> > > > > > >>> ^
>> > > > > > >>>
>> > > > > > >>> The type of the array is said to be <UnknownType (2,002)>
>> > > > > > >>>
>> > > > > > >>> Is this the expected behaviour? The documentation mostly talks about
>> > > > > > >>> JSON and Parquet complex types, and I wonder if the Avro storage
>> > > > > > >>> plugin behaves differently.
>> > > > > > >>>
>> > > > > > >>> Thanks,
>> > > > > > >>>
>> > > > > > >>> Johannes
>> > > > > > >>>
>> > > > > > >>
>> > > > > > >>
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
