Hi, Should this be posted on the dev list to get the deserved attention?
- Stefán

On Tue, Apr 12, 2016 at 9:33 PM, Johannes Schulte <[email protected]> wrote:

> After some evenings of digging into the code I more or less had a lucky
> moment and was able to fix the problem. I wonder why nobody else ran into
> this problem until now - for me it was a blocker to Drill adoption and I am
> really surprised nobody else ever encountered this issue. I hope that
> somebody with more knowledge of the codebase can review this and integrate
> it soon.
>
> On Sun, Apr 3, 2016 at 11:29 AM, Johannes Schulte <[email protected]> wrote:
>
> > Alright, thanks! I created a pull request and am very open to any input.
> >
> > https://github.com/apache/drill/pull/459
> >
> > Cheers,
> >
> > Johannes
> >
> > On Sun, Apr 3, 2016 at 9:10 AM, Abdel Hakim Deneche <[email protected]> wrote:
> >
> >> Pull requests are fine. You still need a JIRA though.
> >>
> >> On Sun, Apr 3, 2016 at 8:03 AM, Johannes Schulte <[email protected]> wrote:
> >>
> >> > I have now extended the AvroFormatTest suite by two unit tests that show that
> >> >
> >> > * Flattening of primitive arrays works as expected
> >> > * Flattening of arrays of records does not work properly
> >> >
> >> > I spent some time trying to find the reason but it's my first contact with
> >> > the Drill codebase.
> >> >
> >> > Is the recommended way of making this unit test available still to attach a
> >> > patch to an issue, or is a pull request also an option?
> >> >
> >> > In the context of the recent Avro maturity discussion I would love to fix
> >> > this error myself, but I would need some hints about what goes wrong there
> >> > internally.
> >> >
> >> > Johannes
> >> >
> >> > On Fri, Mar 25, 2016 at 10:50 PM, Johannes Schulte <[email protected]> wrote:
> >> >
> >> > > Hi Stefan, hi Jacques, thanks for going after this - I had almost given
> >> > > up, but now I think it was because I accessed the data over JDBC with
> >> > > SQuirreL and got confused by the unknown-type column there. Nonetheless,
> >> > > if the schema looks like this:
> >> > >
> >> > > {
> >> > >   "type" : "record",
> >> > >   "name" : "MainRecord",
> >> > >   "namespace" : "drizz.WriteAvroTestFileForDrill$",
> >> > >   "fields" : [ {
> >> > >     "name" : "elements",
> >> > >     "type" : {
> >> > >       "type" : "array",
> >> > >       "items" : {
> >> > >         "type" : "record",
> >> > >         "name" : "NestedRecord",
> >> > >         "fields" : [ {
> >> > >           "name" : "field1",
> >> > >           "type" : "int"
> >> > >         } ]
> >> > >       },
> >> > >       "java-class" : "java.util.List"
> >> > >     }
> >> > >   } ]
> >> > > }
> >> > >
> >> > > and the contents look like this (according to the avro tojson command
> >> > > line utility)
> >> > >
> >> > > {"elements":[{"field1":0},{"field1":1},{"field1":2},{"field1":3},{"field1":4},{"field1":5},{"field1":6},{"field1":7},{"field1":8},{"field1":9}]}
> >> > > {"elements":[{"field1":0},{"field1":1},{"field1":2},{"field1":3},{"field1":4},{"field1":5},{"field1":6},{"field1":7},{"field1":8},{"field1":9}]}
> >> > >
> >> > > a query like
> >> > >
> >> > > select flatten(elements) from
> >> > > dfs.`/Users/j.schulte/data/avro-drill/no-union/`;
> >> > >
> >> > > yields exactly two rows:
> >> > >
> >> > > +---------------+
> >> > > | EXPR$0        |
> >> > > +---------------+
> >> > > | {"field1":9}  |
> >> > > | {"field1":9}  |
> >> > > +---------------+
> >> > >
> >> > > as if only the last element in each array survived.
> >> > >
> >> > > Thanks for your help so far.
> >> > >
> >> > > On Fri, Mar 25, 2016 at 5:45 PM, Stefán Baxter <[email protected]> wrote:
> >> > >
> >> > >> Johannes, Jacques is right.
> >> > >>
> >> > >> I only tested the flattening of maps and not the flattening of
> >> > >> list-of-maps.
> >> > >>
> >> > >> -Stefan
> >> > >>
> >> > >> On Fri, Mar 25, 2016 at 4:12 PM, Jacques Nadeau <[email protected]> wrote:
> >> > >>
> >> > >> > I think there is some incorrect information and confusion in this
> >> > >> > thread. Could you please share a piece of sample data and a specific
> >> > >> > query? The error message shown in your original email is suggesting
> >> > >> > that you were trying to flatten a map rather than an array of maps.
> >> > >> > Flatten is for arrays only. The arrays can have scalars or complex
> >> > >> > objects in them.
> >> > >> >
> >> > >> > --
> >> > >> > Jacques Nadeau
> >> > >> > CTO and Co-Founder, Dremio
> >> > >> >
> >> > >> > On Fri, Mar 25, 2016 at 2:00 AM, Johannes Schulte <[email protected]> wrote:
> >> > >> >
> >> > >> > > Hi Stefan,
> >> > >> > >
> >> > >> > > thanks for this information - so it seems that there is currently no
> >> > >> > > way of accessing nested rich objects with drill; I somehow got that
> >> > >> > > wrong from the documentation...
> >> > >> > >
> >> > >> > > Cheers,
> >> > >> > > Johannes
> >> > >> > >
> >> > >> > > On Thu, Mar 24, 2016 at 2:14 PM, Stefán Baxter <[email protected]> wrote:
> >> > >> > >
> >> > >> > > > FYI: flattening of embedded structures is not supported in Parquet
> >> > >> > > > either.
> >> > >> > > >
> >> > >> > > > Regards,
> >> > >> > > > -Stefan
> >> > >> > > >
> >> > >> > > > On Wed, Mar 23, 2016 at 8:51 PM, Johannes Schulte <[email protected]> wrote:
> >> > >> > > >
> >> > >> > > > > Hi Stefan,
> >> > >> > > > >
> >> > >> > > > > thanks for your response and the link to your UDF repository, it's
> >> > >> > > > > a good reference. I tried Drill 1.6; the data is an array of complex
> >> > >> > > > > objects though. I will try to set up a Drill dev environment and see
> >> > >> > > > > if I can modify the tests to fail.
> >> > >> > > > >
> >> > >> > > > > Johannes
> >> > >> > > > >
> >> > >> > > > > On Wed, Mar 23, 2016 at 8:13 PM, Stefán Baxter <[email protected]> wrote:
> >> > >> > > > >
> >> > >> > > > > > FYI. this seems to be working in 1.6, at least on the Avro data
> >> > >> > > > > > that we have.
> >> > >> > > > > >
> >> > >> > > > > > On Wed, Mar 23, 2016 at 6:59 PM, Stefán Baxter <[email protected]> wrote:
> >> > >> > > > > >
> >> > >> > > > > > > Hi again,
> >> > >> > > > > > >
> >> > >> > > > > > > What version of Drill are you using?
> >> > >> > > > > > >
> >> > >> > > > > > > Regards,
> >> > >> > > > > > > - Stefán
> >> > >> > > > > > >
> >> > >> > > > > > > On Wed, Mar 23, 2016 at 4:49 PM, Stefán Baxter <[email protected]> wrote:
> >> > >> > > > > > >
> >> > >> > > > > > >> Hi Johannes,
> >> > >> > > > > > >>
> >> > >> > > > > > >> As great as Drill is, the Avro plugin has been a source of
> >> > >> > > > > > >> frustration for us @activitystream.
> >> > >> > > > > > >>
> >> > >> > > > > > >> We have a small UDF library [1] (Apache licensed) which contains a
> >> > >> > > > > > >> function that can return an array (List<String>) from Avro as a CSV
> >> > >> > > > > > >> list.
> >> > >> > > > > > >>
> >> > >> > > > > > >> You could use that to roll your own, or provide me with a small
> >> > >> > > > > > >> sample and I can create a custom flatten function for you.
> >> > >> > > > > > >>
> >> > >> > > > > > >> The best would be to wait for a fix, but this can potentially get
> >> > >> > > > > > >> you out of a rough spot.
> >> > >> > > > > > >>
> >> > >> > > > > > >> [1] https://github.com/activitystream/asdrill
> >> > >> > > > > > >>
> >> > >> > > > > > >> Regards,
> >> > >> > > > > > >> -Stefán
> >> > >> > > > > > >>
> >> > >> > > > > > >> On Wed, Mar 23, 2016 at 9:05 AM, Johannes Schulte <[email protected]> wrote:
> >> > >> > > > > > >>
> >> > >> > > > > > >>> Hi,
> >> > >> > > > > > >>>
> >> > >> > > > > > >>> when trying to read simple avro arrays with select flatten(array)
> >> > >> > > > > > >>> from dfs... I get the exception
> >> > >> > > > > > >>>
> >> > >> > > > > > >>> SQL Query Error: SYSTEM ERROR: ClassCastException: Cannot cast
> >> > >> > > > > > >>> org.apache.drill.exec.vector.complex.MapVector to
> >> > >> > > > > > >>> org.apache.drill.exec.vector.complex.RepeatedValueVector
> >> > >> > > > > > >>> ^
> >> > >> > > > > > >>>
> >> > >> > > > > > >>> The type of the array is said to be <UnknownType (2,002)>
> >> > >> > > > > > >>>
> >> > >> > > > > > >>> Is this the expected behaviour?
> >> > >> > > > > > >>> The documentation mostly talks about JSON and Parquet complex
> >> > >> > > > > > >>> types and I wonder if the Avro storage plugin behaves differently.
> >> > >> > > > > > >>>
> >> > >> > > > > > >>> Thanks,
> >> > >> > > > > > >>>
> >> > >> > > > > > >>> Johannes
> >>
> >> --
> >> Abdelhakim Deneche
> >> Software Engineer
> >> <http://www.mapr.com/>
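
For reference, the mismatch reported in the thread can be stated concretely outside Drill. The sketch below is plain Python and not Drill's implementation; the `flatten` helper is a hypothetical stand-in that only mimics the documented idea of FLATTEN (one output row per array element). It rebuilds the two sample records quoted in the thread and counts the rows a flatten over `elements` would be expected to produce, for comparison with the two rows actually returned.

```python
import json

# One sample record as shown in the thread: an "elements" array holding
# ten nested records with field1 = 0..9. The file contains two such records.
SAMPLE_LINE = (
    '{"elements":['
    + ",".join('{"field1":%d}' % i for i in range(10))
    + "]}"
)
records = [json.loads(SAMPLE_LINE), json.loads(SAMPLE_LINE)]


def flatten(rows, column):
    """Hypothetical helper (assumption, not a Drill API):
    emit one output row per element of the array stored in `column`."""
    return [element for row in rows for element in row[column]]


flattened = flatten(records, "elements")

# Two input records with ten elements each give twenty output rows,
# rather than the two {"field1":9} rows reported in the thread.
print(len(flattened))
print(flattened[0], flattened[9], flattened[10])
```

If the Avro reader behaved like this sketch, the query would return twenty rows running from `{"field1":0}` to `{"field1":9}` twice over.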
