Re: JSON File, Total numbers Record: 1

Jason Altekruse Tue, 12 Jan 2016 07:40:15 -0800

This is a poor error messages that is produced when you try to flatten a
field that is not an array, for these fields you can just use the dot
notation to access their inner members (i.e.
flattened_array_of_maps.member_field_in_map). If you have a field where the
keys in a map are "unknown" or you want to do analysis on the keys, please
refer to the KVGEN docs [1]. I have assigned the JIRA that reported this
issue a while ago to myself and will work to improve the message [2].



[1] - https://drill.apache.org/docs/kvgen/
[2] - https://issues.apache.org/jira/browse/DRILL-2182

On Tue, Jan 12, 2016 at 1:20 AM, Paolo Spanevello <[email protected]>
wrote:

> Hi  All,
>
> Jason,I used your suggests and it works, thanks a lot!
>
> As u wrote i used a subquery to have the all list of INTERVALS as I show
> below.
>
> *select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t*
>
> [image: Immagine incorporata 1]
>
> The attribute "flat_intervals" is full of data that I would like to have
> separate them in several attributes as it is showed in the link:
> https://drill.apache.org/docs/flatten/
> I used a new subquery to have them:
>
>
> *select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat from
> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t) as tt*
>
> but I have this error:
>
> ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query:
> select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat from
> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t) as tt
> [30027]Query execution error. Details:[
> SYSTEM ERROR: ClassCastException: Cannot cast
> org.apache.drill.exec.vector.complex.MapVector to
> org.apache.drill.exec.vector.complex.RepeatedValueVector
>
> Fragment 0:0
>
> [Error Id: a22fe80f-43e4-43cb-bb98-5541ecb92d4c on 192.168.1.101:31010]
> ]
>
> [image: Immagine incorporata 2]
>
> Thanks in advance!
>
> Paolo
>
>
> 2016-01-11 17:39 GMT+01:00 Jason Altekruse <[email protected]>:
>
>> Paolo,
>>
>> Drill currently reads single JSON objects as single records. If you look
>> at
>> the top of your file you can see that the root of your document is a
>> single
>> JSON object.
>>
>> Drill accepts two formats for individual records:
>>
>> The Mongo import format, a series of JSON object one after the other in a
>> file, whitespace is irrelevant, each one need not be followed by a newline
>>
>> {"a" : 1, "b" : "hello" }
>> {"a": 5 : "b" : "guten tag" }
>>
>> A JSON array of objects
>>
>> [
>>     {"a" : 1, "b" : "hello" },
>>     {"a" : 2, "b", "guten tag"}
>> ]
>>
>> When you have a file like this you can read it using the FLATTEN
>> functionality of Drill to turn an array into a series of records.
>> https://drill.apache.org/docs/flatten/
>>
>> select flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`;
>>
>> To work with the data further, you can put the flatten call in a subquery.
>> Here is how you can select the first element from each records list of
>> INTERVALS and select one of the nested fields inside of METRICS once the
>> data has been flattened.
>> To analyze the array, you could flatten again to get an exploded dataset
>> with one record per interval across all records
>>
>> select t.flat_rides.INTERVALS[0], t.flat_rides.METRICS.skiba_wprime_low
>> from (select flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`) as
>> t;
>>
>> Here you can see that individual columns can be selected next to the
>> flatten call, this will copy the data into each new record:
>>
>> select flatten(t.flat_rides.INTERVALS) as flat_intervals,
>> t.flat_rides.METRICS.skiba_wprime_low from (select flatten(rides) as
>> flat_rides from dfs.tmp.`rideDB.json`) as t;
>>
>> Happy Drilling!
>>
>> On Sun, Jan 10, 2016 at 4:23 AM, Paolo Spanevello <[email protected]>
>> wrote:
>>
>> > Hi all,
>> >
>> > i'm trying to query the file that you can find in attach with drill
>> apache
>> > 1.4 . The result of this qurey is always 1 record.
>> >
>> > The query that i'm running is :
>> >
>> > SELECT t.rides.INTERVALS.METRICS FROM rideDB.json t
>> >
>> > If i run the similar query with the file donuts.json found on
>> > https://drill.apache.org/docs/sample-data-donuts/ the query runs
>> properly.
>> >
>> > SELECT t.topping FROM donuts.json t
>> >
>> > Thanks in advance.
>> >
>> > Paolo
>> >
>>
>
>

Re: JSON File, Total numbers Record: 1

Reply via email to