Re: JSON File, Total numbers Record: 1

Christopher Matta Wed, 13 Jan 2016 11:21:44 -0800

The apache mailing lists don't support image attachments, it would be
better to attach text anyway since it's searchable.


Chris Matta
[email protected]
215-701-3146

On Wed, Jan 13, 2016 at 9:55 AM, Paolo Spanevello <[email protected]>
wrote:

> Hi,
>
> thanks again Jason for your support.
>
> I'm trying as u suggest, to apply the kvgen function. Below you can find
> the query and the error.
>
> *select tt.flat_intervals from (select
> flatten(kvgen(t.flat_rides.INTERVALS)) as flat_intervals from (select
> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t ) as tt*
>
> [image: Immagine incorporata 1]
>
>
> Could u pls suggest how to fix this drill?
>
> Best regards,
> Paolo
>
>
>
> 2016-01-12 16:31 GMT+01:00 Jason Altekruse <[email protected]>:
>
>> This is a poor error messages that is produced when you try to flatten a
>> field that is not an array, for these fields you can just use the dot
>> notation to access their inner members (i.e.
>> flattened_array_of_maps.member_field_in_map). If you have a field where the
>> keys in a map are "unknown" or you want to do analysis on the keys, please
>> refer to the KVGEN docs [1]. I have assigned the JIRA that reported this
>> issue a while ago to myself and will work to improve the message [2].
>>
>>
>> [1] - https://drill.apache.org/docs/kvgen/
>> [2] - https://issues.apache.org/jira/browse/DRILL-2182
>>
>> On Tue, Jan 12, 2016 at 1:20 AM, Paolo Spanevello <[email protected]>
>> wrote:
>>
>>> Hi  All,
>>>
>>> Jason,I used your suggests and it works, thanks a lot!
>>>
>>> As u wrote i used a subquery to have the all list of INTERVALS as I show
>>> below.
>>>
>>> *select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
>>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
>>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t*
>>>
>>> [image: Immagine incorporata 1]
>>>
>>> The attribute "flat_intervals" is full of data that I would like to have
>>> separate them in several attributes as it is showed in the link:
>>> https://drill.apache.org/docs/flatten/
>>> I used a new subquery to have them:
>>>
>>>
>>> *select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat from
>>> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
>>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
>>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t) as tt*
>>>
>>> but I have this error:
>>>
>>> ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query:
>>> select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat from
>>> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
>>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
>>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t) as tt
>>> [30027]Query execution error. Details:[
>>> SYSTEM ERROR: ClassCastException: Cannot cast
>>> org.apache.drill.exec.vector.complex.MapVector to
>>> org.apache.drill.exec.vector.complex.RepeatedValueVector
>>>
>>> Fragment 0:0
>>>
>>> [Error Id: a22fe80f-43e4-43cb-bb98-5541ecb92d4c on 192.168.1.101:31010]
>>> ]
>>>
>>> [image: Immagine incorporata 2]
>>>
>>> Thanks in advance!
>>>
>>> Paolo
>>>
>>>
>>> 2016-01-11 17:39 GMT+01:00 Jason Altekruse <[email protected]>:
>>>
>>>> Paolo,
>>>>
>>>> Drill currently reads single JSON objects as single records. If you
>>>> look at
>>>> the top of your file you can see that the root of your document is a
>>>> single
>>>> JSON object.
>>>>
>>>> Drill accepts two formats for individual records:
>>>>
>>>> The Mongo import format, a series of JSON object one after the other in
>>>> a
>>>> file, whitespace is irrelevant, each one need not be followed by a
>>>> newline
>>>>
>>>> {"a" : 1, "b" : "hello" }
>>>> {"a": 5 : "b" : "guten tag" }
>>>>
>>>> A JSON array of objects
>>>>
>>>> [
>>>>     {"a" : 1, "b" : "hello" },
>>>>     {"a" : 2, "b", "guten tag"}
>>>> ]
>>>>
>>>> When you have a file like this you can read it using the FLATTEN
>>>> functionality of Drill to turn an array into a series of records.
>>>> https://drill.apache.org/docs/flatten/
>>>>
>>>> select flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`;
>>>>
>>>> To work with the data further, you can put the flatten call in a
>>>> subquery.
>>>> Here is how you can select the first element from each records list of
>>>> INTERVALS and select one of the nested fields inside of METRICS once the
>>>> data has been flattened.
>>>> To analyze the array, you could flatten again to get an exploded dataset
>>>> with one record per interval across all records
>>>>
>>>> select t.flat_rides.INTERVALS[0], t.flat_rides.METRICS.skiba_wprime_low
>>>> from (select flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`)
>>>> as t;
>>>>
>>>> Here you can see that individual columns can be selected next to the
>>>> flatten call, this will copy the data into each new record:
>>>>
>>>> select flatten(t.flat_rides.INTERVALS) as flat_intervals,
>>>> t.flat_rides.METRICS.skiba_wprime_low from (select flatten(rides) as
>>>> flat_rides from dfs.tmp.`rideDB.json`) as t;
>>>>
>>>> Happy Drilling!
>>>>
>>>> On Sun, Jan 10, 2016 at 4:23 AM, Paolo Spanevello <[email protected]
>>>> >
>>>> wrote:
>>>>
>>>> > Hi all,
>>>> >
>>>> > i'm trying to query the file that you can find in attach with drill
>>>> apache
>>>> > 1.4 . The result of this qurey is always 1 record.
>>>> >
>>>> > The query that i'm running is :
>>>> >
>>>> > SELECT t.rides.INTERVALS.METRICS FROM rideDB.json t
>>>> >
>>>> > If i run the similar query with the file donuts.json found on
>>>> > https://drill.apache.org/docs/sample-data-donuts/ the query runs
>>>> properly.
>>>> >
>>>> > SELECT t.topping FROM donuts.json t
>>>> >
>>>> > Thanks in advance.
>>>> >
>>>> > Paolo
>>>> >
>>>>
>>>
>>>
>>
>

Re: JSON File, Total numbers Record: 1

Reply via email to