Hi,

Should this be posted on the dev list to get the deserved attention?

- Stefán



On Tue, Apr 12, 2016 at 9:33 PM, Johannes Schulte <
[email protected]> wrote:

> After some evenings of digging into the code, I more or less had a lucky
> moment and was able to fix the problem. For me it was a blocker to Drill
> adoption, and I am really surprised nobody else has run into this issue
> until now. I hope that somebody with more knowledge of the codebase can
> review this and integrate it soon.
>
>
> On Sun, Apr 3, 2016 at 11:29 AM, Johannes Schulte <
> [email protected]> wrote:
>
> > Alright, thanks! I created a pull request and am very open to any input:
> >
> > https://github.com/apache/drill/pull/459
> >
> > Cheers,
> >
> > Johannes
> >
> > On Sun, Apr 3, 2016 at 9:10 AM, Abdel Hakim Deneche <[email protected]>
> > wrote:
> >
> >> Pull requests are fine. You still need a JIRA though.
> >>
> >> On Sun, Apr 3, 2016 at 8:03 AM, Johannes Schulte <[email protected]>
> >> wrote:
> >>
> >> > I have now extended the AvroFormatTest suite with two unit tests that
> >> > show that
> >> >
> >> > * Flattening of primitive arrays works as expected
> >> > * Flattening of arrays of records does not work properly
> >> >
> >> > I spent some time trying to find the reason, but this is my first
> >> > contact with the Drill codebase.
> >> >
> >> > Is attaching a patch to an issue still the recommended way of making
> >> > these unit tests available, or is a pull request also an option?
> >> >
> >> > In the context of the recent Avro maturity discussion I would love to
> >> > fix this error myself, but I would need some hints about what goes
> >> > wrong internally.
> >> >
> >> > Johannes
> >> >
> >> > On Fri, Mar 25, 2016 at 10:50 PM, Johannes Schulte <
> >> > [email protected]> wrote:
> >> >
> >> > > Hi Stefan, hi Jacques, thanks for following up on this. I had almost
> >> > > given up, but now I think it was because I accessed the data over
> >> > > JDBC with SQuirreL and got confused by the unknown-type column
> >> > > there. Nonetheless, if the schema looks like this:
> >> > >
> >> > >
> >> > > {
> >> > >   "type" : "record",
> >> > >   "name" : "MainRecord",
> >> > >   "namespace" : "drizz.WriteAvroTestFileForDrill$",
> >> > >   "fields" : [ {
> >> > >     "name" : "elements",
> >> > >     "type" : {
> >> > >       "type" : "array",
> >> > >       "items" : {
> >> > >         "type" : "record",
> >> > >         "name" : "NestedRecord",
> >> > >         "fields" : [ {
> >> > >           "name" : "field1",
> >> > >           "type" : "int"
> >> > >         } ]
> >> > >       },
> >> > >       "java-class" : "java.util.List"
> >> > >     }
> >> > >   } ]
> >> > > }
> >> > >
> >> > > and the contents look like this (according to the Avro tojson
> >> > > command-line utility):
> >> > >
> >> > > {"elements":[{"field1":0},{"field1":1},{"field1":2},{"field1":3},{"field1":4},{"field1":5},{"field1":6},{"field1":7},{"field1":8},{"field1":9}]}
> >> > > {"elements":[{"field1":0},{"field1":1},{"field1":2},{"field1":3},{"field1":4},{"field1":5},{"field1":6},{"field1":7},{"field1":8},{"field1":9}]}
> >> > >
> >> > > a query like
> >> > >
> >> > > select flatten(elements) from
> >> > > dfs.`/Users/j.schulte/data/avro-drill/no-union/`;
> >> > >
> >> > > yields exactly two rows:
> >> > > +---------------+
> >> > > |    EXPR$0     |
> >> > > +---------------+
> >> > > | {"field1":9}  |
> >> > > | {"field1":9}  |
> >> > > +---------------+
> >> > >
> >> > > as if only the last element of each array survived.
> >> > >
> >> > > Thanks for your help so far..
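
For readers following the example above: FLATTEN is expected to emit one row per array element, so the two sample records should produce 20 rows rather than 2. Below is a minimal Python sketch of that expected result. It is not Drill code; the `flatten` helper and the in-memory `records` list are assumptions made purely for illustration.

```python
import json

# The two records from the sample file, mirroring the tojson output above.
records = [
    {"elements": [{"field1": i} for i in range(10)]},
    {"elements": [{"field1": i} for i in range(10)]},
]


def flatten(rows, column):
    """Yield one output row per element of the repeated column.

    Hypothetical helper that only models the expected result of
    `select flatten(elements)`; it says nothing about Drill internals.
    """
    for row in rows:
        for element in row[column]:
            yield element


expected = list(flatten(records, "elements"))

print(len(expected))  # 20
for row in expected[:3]:
    print(json.dumps(row))
```

Comparing such a list against the actual query result makes the discrepancy easy to state in a unit test: 20 rows expected, 2 observed.
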
> >> > >
> >> > > On Fri, Mar 25, 2016 at 5:45 PM, Stefán Baxter <[email protected]>
> >> > > wrote:
> >> > >
> >> > >> Johannes, Jacques is right.
> >> > >>
> >> > >> I only tested the flattening of maps and not the flattening of
> >> > >> list-of-maps.
> >> > >>
> >> > >> -Stefan
> >> > >>
> >> > >> On Fri, Mar 25, 2016 at 4:12 PM, Jacques Nadeau <[email protected]>
> >> > >> wrote:
> >> > >>
> >> > >> > I think there is some incorrect information and confusion in this
> >> > >> > thread. Could you please share a piece of sample data and a
> >> > >> > specific query? The error message shown in your original email
> >> > >> > suggests that you were trying to flatten a map rather than an
> >> > >> > array of maps. Flatten is for arrays only. The arrays can have
> >> > >> > scalars or complex objects in them.
> >> > >> >
> >> > >> > --
> >> > >> > Jacques Nadeau
> >> > >> > CTO and Co-Founder, Dremio
> >> > >> >
> >> > >> > On Fri, Mar 25, 2016 at 2:00 AM, Johannes Schulte <
> >> > >> > [email protected]> wrote:
> >> > >> >
> >> > >> > > Hi Stefan,
> >> > >> > >
> >> > >> > > thanks for this information. So it seems that there is currently
> >> > >> > > no way of accessing nested rich objects with Drill; I somehow
> >> > >> > > got that wrong from the documentation...
> >> > >> > >
> >> > >> > > Cheers,
> >> > >> > > Johannes
> >> > >> > >
> >> > >> > > On Thu, Mar 24, 2016 at 2:14 PM, Stefán Baxter <[email protected]>
> >> > >> > > wrote:
> >> > >> > >
> >> > >> > > > FYI: flattening of embedded structures is not supported in
> >> > >> > > > Parquet either.
> >> > >> > > >
> >> > >> > > > Regards,
> >> > >> > > >  -Stefan
> >> > >> > > >
> >> > >> > > > On Wed, Mar 23, 2016 at 8:51 PM, Johannes Schulte <
> >> > >> > > > [email protected]> wrote:
> >> > >> > > >
> >> > >> > > > > Hi Stefan,
> >> > >> > > > >
> >> > >> > > > > thanks for your response and the link to your UDF repository;
> >> > >> > > > > it's a good reference. I tried Drill 1.6; the data is an array
> >> > >> > > > > of complex objects, though. I will try to set up a Drill dev
> >> > >> > > > > environment and see if I can modify the tests to fail.
> >> > >> > > > >
> >> > >> > > > > Johannes
> >> > >> > > > >
> >> > >> > > > > On Wed, Mar 23, 2016 at 8:13 PM, Stefán Baxter <[email protected]>
> >> > >> > > > > wrote:
> >> > >> > > > >
> >> > >> > > > > > FYI: this seems to be working in 1.6, at least on the Avro
> >> > >> > > > > > data that we have.
> >> > >> > > > > >
> >> > >> > > > > > On Wed, Mar 23, 2016 at 6:59 PM, Stefán Baxter <[email protected]>
> >> > >> > > > > > wrote:
> >> > >> > > > > >
> >> > >> > > > > > > Hi again,
> >> > >> > > > > > >
> >> > >> > > > > > > What version of Drill are you using?
> >> > >> > > > > > >
> >> > >> > > > > > > Regards,
> >> > >> > > > > > > - Stefán
> >> > >> > > > > > >
> >> > >> > > > > > > On Wed, Mar 23, 2016 at 4:49 PM, Stefán Baxter <[email protected]>
> >> > >> > > > > > > wrote:
> >> > >> > > > > > >
> >> > >> > > > > > >> Hi Johannes,
> >> > >> > > > > > >>
> >> > >> > > > > > >> As great as Drill is, the Avro plugin has been a source of
> >> > >> > > > > > >> frustration for us @activitystream.
> >> > >> > > > > > >>
> >> > >> > > > > > >> We have a small UDF library [1] (Apache licensed) which
> >> > >> > > > > > >> contains a function that can return an array (List<String>)
> >> > >> > > > > > >> from Avro as a CSV list.
> >> > >> > > > > > >>
> >> > >> > > > > > >> You could use that to roll your own, or provide me with a
> >> > >> > > > > > >> small sample and I can create a custom flatten function for
> >> > >> > > > > > >> you.
> >> > >> > > > > > >>
> >> > >> > > > > > >> The best would be to wait for a fix, but this can potentially
> >> > >> > > > > > >> get you out of a rough spot.
> >> > >> > > > > > >>
> >> > >> > > > > > >> [1] https://github.com/activitystream/asdrill
> >> > >> > > > > > >>
> >> > >> > > > > > >> Regards,
> >> > >> > > > > > >>  -Stefán
> >> > >> > > > > > >>
> >> > >> > > > > > >> On Wed, Mar 23, 2016 at 9:05 AM, Johannes Schulte <
> >> > >> > > > > > >> [email protected]> wrote:
> >> > >> > > > > > >>
> >> > >> > > > > > >>> Hi,
> >> > >> > > > > > >>>
> >> > >> > > > > > >>> when trying to read simple Avro arrays with select
> >> > >> > > > > > >>> flatten(array) from dfs... I get the exception
> >> > >> > > > > > >>>
> >> > >> > > > > > >>> SQL Query Error: SYSTEM ERROR: ClassCastException: Cannot cast
> >> > >> > > > > > >>> org.apache.drill.exec.vector.complex.MapVector to
> >> > >> > > > > > >>> org.apache.drill.exec.vector.complex.RepeatedValueVector
> >> > >> > > > > > >>> ^
> >> > >> > > > > > >>>
> >> > >> > > > > > >>> The type of the array is said to be <UnknownType (2,002)>
> >> > >> > > > > > >>>
> >> > >> > > > > > >>> Is this the expected behaviour? The documentation mostly talks
> >> > >> > > > > > >>> about JSON and Parquet complex types, and I wonder if the Avro
> >> > >> > > > > > >>> storage plugin behaves differently.
> >> > >> > > > > > >>>
> >> > >> > > > > > >>> Thanks,
> >> > >> > > > > > >>>
> >> > >> > > > > > >>> Johannes
> >> > >> > > > > > >>>
> >> > >> > > > > > >>
> >> > >> > > > > > >>
> >> > >> > > > > > >
> >> > >> > > > > >
> >> > >> > > > >
> >> > >> > > >
> >> > >> > >
> >> > >> >
> >> > >>
> >> > >
> >> > >
> >> >
> >>
> >>
> >>
> >> --
> >>
> >> Abdelhakim Deneche
> >>
> >> Software Engineer
> >>
> >
> >
>
