Hi Stefan, hi Jacques, thanks for looking into this. I had almost given up,
but now I think the confusion arose because I accessed the data over JDBC with
SQuirreL and was thrown off by the unknown column type shown there. Nonetheless,
if the schema looks like this:
{
  "type" : "record",
  "name" : "MainRecord",
  "namespace" : "drizz.WriteAvroTestFileForDrill$",
  "fields" : [ {
    "name" : "elements",
    "type" : {
      "type" : "array",
      "items" : {
        "type" : "record",
        "name" : "NestedRecord",
        "fields" : [ {
          "name" : "field1",
          "type" : "int"
        } ]
      },
      "java-class" : "java.util.List"
    }
  } ]
}
and the contents look like this (according to the Avro tojson command-line
utility):
{"elements":[{"field1":0},{"field1":1},{"field1":2},{"field1":3},{"field1":4},{"field1":5},{"field1":6},{"field1":7},{"field1":8},{"field1":9}]}
{"elements":[{"field1":0},{"field1":1},{"field1":2},{"field1":3},{"field1":4},{"field1":5},{"field1":6},{"field1":7},{"field1":8},{"field1":9}]}
a query like
select flatten(elements) from
dfs.`/Users/j.schulte/data/avro-drill/no-union/`;
yields exactly two rows:
+---------------+
|    EXPR$0     |
+---------------+
| {"field1":9}  |
| {"field1":9}  |
+---------------+
as if only the last element of each array survived.
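
For comparison, here is a minimal Python sketch of the result I would expect,
assuming flatten emits one output row per array element. It only models that
row-per-element behaviour on the two records shown above and is not Drill's
implementation; the `flatten` helper and the `records` list are made up for
illustration.

```python
import json

# The two records from the Avro file, as printed by tojson above.
records = [
    {"elements": [{"field1": i} for i in range(10)]},
    {"elements": [{"field1": i} for i in range(10)]},
]


def flatten(rows, column):
    """Hypothetical helper: yield one output row per element of the array in `column`."""
    for row in rows:
        for element in row[column]:
            yield element


# Assumption: this is the row-per-element result a flatten over `elements` should give.
expected = list(flatten(records, "elements"))

print(len(expected))
for row in expected:
    print(json.dumps(row))
```

Under that assumption the query should return 20 rows (field1 running from 0
to 9, twice) instead of the two rows shown above.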
Thanks for your help so far.
On Fri, Mar 25, 2016 at 5:45 PM, Stefán Baxter <[email protected]>
wrote:
> Johannes, Jacques is right.
>
> I only tested the flattening of maps and not the flattening of
> list-of-maps.
>
> -Stefan
>
> On Fri, Mar 25, 2016 at 4:12 PM, Jacques Nadeau <[email protected]>
> wrote:
>
> > I think there is some incorrect information and confusion in this thread.
> > Could you please share a piece of sample data and a specific query? The
> > error message shown in your original email is suggesting that you were
> > trying to flatten a map rather than an array of maps. Flatten is for arrays
> > only. The arrays can have scalars or complex objects in them.
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Fri, Mar 25, 2016 at 2:00 AM, Johannes Schulte <
> > [email protected]> wrote:
> >
> > > Hi Stefan,
> > >
> > > thanks for this information - so it seems that there is currently no way of
> > > accessing nested rich objects with drill; I somehow got that wrong from the
> > > documentation...
> > >
> > > Cheers,
> > > Johannes
> > >
> > > On Thu, Mar 24, 2016 at 2:14 PM, Stefán Baxter <
> > [email protected]>
> > > wrote:
> > >
> > > > FYI: flattening of embedded structures is not supported in Parquet either.
> > > >
> > > > Regards,
> > > > -Stefan
> > > >
> > > > On Wed, Mar 23, 2016 at 8:51 PM, Johannes Schulte <
> > > > [email protected]> wrote:
> > > >
> > > > > Hi Stefan,
> > > > >
> > > > > thanks for your response and the link to your UDF repository, it's a good
> > > > > reference. I tried Drill 1.6, the data is an array of complex objects
> > > > > though. I will try to set up a Drill dev environment and see if I can modify
> > > > > the tests to fail.
> > > > >
> > > > > Johannes
> > > > >
> > > > > On Wed, Mar 23, 2016 at 8:13 PM, Stefán Baxter <
> > > > [email protected]>
> > > > > wrote:
> > > > >
> > > > > > FYI. this seems to be working in 1.6, at least on the Avro data that we
> > > > > > have.
> > > > > >
> > > > > > On Wed, Mar 23, 2016 at 6:59 PM, Stefán Baxter <
> > > > > [email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi again,
> > > > > > >
> > > > > > > What version of Drill are you using?
> > > > > > >
> > > > > > > Regards,
> > > > > > > - Stefán
> > > > > > >
> > > > > > > On Wed, Mar 23, 2016 at 4:49 PM, Stefán Baxter <
> > > > > > [email protected]>
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Hi Johannes,
> > > > > > >>
> > > > > > >> As great as Drill is the Avro plugin has been a source of frustration for
> > > > > > >> us @activitystream.
> > > > > > >>
> > > > > > >> We have a small UDF library [1] (Apache licensed) which contains a
> > > > > > >> function that can return an array (List<String>) from Avro as a CSV list.
> > > > > > >>
> > > > > > >> You could use that to roll your own or provide me with a small sample and
> > > > > > >> I can create a custom flatten function for you.
> > > > > > >>
> > > > > > >> The best would be to wait for a fix but this can potentially get you out
> > > > > > >> of a rough spot.
> > > > > > >>
> > > > > > >> [1] https://github.com/activitystream/asdrill
> > > > > > >>
> > > > > > >> Regards,
> > > > > > >> -Stefán
> > > > > > >>
> > > > > > >> On Wed, Mar 23, 2016 at 9:05 AM, Johannes Schulte <
> > > > > > >> [email protected]> wrote:
> > > > > > >>
> > > > > > >>> Hi,
> > > > > > >>>
> > > > > > >>> when trying to read simple avro arrays with select flatten(array) from
> > > > > > >>> dfs... i get the exception
> > > > > > >>>
> > > > > > >>> SQL Query Error: SYSTEM ERROR: ClassCastException: Cannot cast
> > > > > > >>> org.apache.drill.exec.vector.complex.MapVector to
> > > > > > >>> org.apache.drill.exec.vector.complex.RepeatedValueVector
> > > > > > >>> ^
> > > > > > >>>
> > > > > > >>> The type of the array is said to be <UnknownType (2,002)>
> > > > > > >>>
> > > > > > >>> Is this the expected behaviour? The documentation mostly talks about json
> > > > > > >>> and parquet complex types and i wonder if the avro storage plugin behaves
> > > > > > >>> differently.
> > > > > > >>>
> > > > > > >>> Thanks,
> > > > > > >>>
> > > > > > >>> Johannes
> > > > > > >>>
> > > > > > >>
> > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>