The apache mailing lists don't support image attachments, it would be better to attach text anyway since it's searchable.
Chris Matta cma...@mapr.com 215-701-3146 On Wed, Jan 13, 2016 at 9:55 AM, Paolo Spanevello <paolosp...@gmail.com> wrote: > Hi, > > thanks again Jason for your support. > > I'm trying as u suggest, to apply the kvgen function. Below you can find > the query and the error. > > *select tt.flat_intervals from (select > flatten(kvgen(t.flat_rides.INTERVALS)) as flat_intervals from (select > flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t ) as tt* > > [image: Immagine incorporata 1] > > > Could u pls suggest how to fix this drill? > > Best regards, > Paolo > > > > 2016-01-12 16:31 GMT+01:00 Jason Altekruse <altekruseja...@gmail.com>: > >> This is a poor error messages that is produced when you try to flatten a >> field that is not an array, for these fields you can just use the dot >> notation to access their inner members (i.e. >> flattened_array_of_maps.member_field_in_map). If you have a field where the >> keys in a map are "unknown" or you want to do analysis on the keys, please >> refer to the KVGEN docs [1]. I have assigned the JIRA that reported this >> issue a while ago to myself and will work to improve the message [2]. >> >> >> [1] - https://drill.apache.org/docs/kvgen/ >> [2] - https://issues.apache.org/jira/browse/DRILL-2182 >> >> On Tue, Jan 12, 2016 at 1:20 AM, Paolo Spanevello <paolosp...@gmail.com> >> wrote: >> >>> Hi All, >>> >>> Jason,I used your suggests and it works, thanks a lot! >>> >>> As u wrote i used a subquery to have the all list of INTERVALS as I show >>> below. >>> >>> *select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as >>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select >>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t* >>> >>> [image: Immagine incorporata 1] >>> >>> The attribute "flat_intervals" is full of data that I would like to have >>> separate them in several attributes as it is showed in the link: >>> https://drill.apache.org/docs/flatten/ >>> I used a new subquery to have them: >>> >>> >>> *select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat from >>> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as >>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select >>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t) as tt* >>> >>> but I have this error: >>> >>> ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query: >>> select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat from >>> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as >>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select >>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t) as tt >>> [30027]Query execution error. Details:[ >>> SYSTEM ERROR: ClassCastException: Cannot cast >>> org.apache.drill.exec.vector.complex.MapVector to >>> org.apache.drill.exec.vector.complex.RepeatedValueVector >>> >>> Fragment 0:0 >>> >>> [Error Id: a22fe80f-43e4-43cb-bb98-5541ecb92d4c on 192.168.1.101:31010] >>> ] >>> >>> [image: Immagine incorporata 2] >>> >>> Thanks in advance! >>> >>> Paolo >>> >>> >>> 2016-01-11 17:39 GMT+01:00 Jason Altekruse <altekruseja...@gmail.com>: >>> >>>> Paolo, >>>> >>>> Drill currently reads single JSON objects as single records. If you >>>> look at >>>> the top of your file you can see that the root of your document is a >>>> single >>>> JSON object. >>>> >>>> Drill accepts two formats for individual records: >>>> >>>> The Mongo import format, a series of JSON object one after the other in >>>> a >>>> file, whitespace is irrelevant, each one need not be followed by a >>>> newline >>>> >>>> {"a" : 1, "b" : "hello" } >>>> {"a": 5 : "b" : "guten tag" } >>>> >>>> A JSON array of objects >>>> >>>> [ >>>> {"a" : 1, "b" : "hello" }, >>>> {"a" : 2, "b", "guten tag"} >>>> ] >>>> >>>> When you have a file like this you can read it using the FLATTEN >>>> functionality of Drill to turn an array into a series of records. >>>> https://drill.apache.org/docs/flatten/ >>>> >>>> select flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`; >>>> >>>> To work with the data further, you can put the flatten call in a >>>> subquery. >>>> Here is how you can select the first element from each records list of >>>> INTERVALS and select one of the nested fields inside of METRICS once the >>>> data has been flattened. >>>> To analyze the array, you could flatten again to get an exploded dataset >>>> with one record per interval across all records >>>> >>>> select t.flat_rides.INTERVALS[0], t.flat_rides.METRICS.skiba_wprime_low >>>> from (select flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`) >>>> as t; >>>> >>>> Here you can see that individual columns can be selected next to the >>>> flatten call, this will copy the data into each new record: >>>> >>>> select flatten(t.flat_rides.INTERVALS) as flat_intervals, >>>> t.flat_rides.METRICS.skiba_wprime_low from (select flatten(rides) as >>>> flat_rides from dfs.tmp.`rideDB.json`) as t; >>>> >>>> Happy Drilling! >>>> >>>> On Sun, Jan 10, 2016 at 4:23 AM, Paolo Spanevello <paolosp...@gmail.com >>>> > >>>> wrote: >>>> >>>> > Hi all, >>>> > >>>> > i'm trying to query the file that you can find in attach with drill >>>> apache >>>> > 1.4 . The result of this qurey is always 1 record. >>>> > >>>> > The query that i'm running is : >>>> > >>>> > SELECT t.rides.INTERVALS.METRICS FROM rideDB.json t >>>> > >>>> > If i run the similar query with the file donuts.json found on >>>> > https://drill.apache.org/docs/sample-data-donuts/ the query runs >>>> properly. >>>> > >>>> > SELECT t.topping FROM donuts.json t >>>> > >>>> > Thanks in advance. >>>> > >>>> > Paolo >>>> > >>>> >>> >>> >> >