in addition to this.

selecting: select some, t.others, t.others.additional from dfs.tmp.`/test.json` as t;
- returns this: "yes", {"additional":"last entries only"}, "last entries only"

This finds the previously missing value but then ignores all the other values
of the substructure.
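
A file with the shape described in the quoted message below can be generated with a short script. This is only a minimal sketch: the names `make_record` and `write_test_file`, the output path, and the idea that the synthetic file triggers the same behaviour as the real dataset are all assumptions, not something confirmed in this thread.

```python
import json


def make_record(with_additional):
    """Build one record; the extra "additional" key is optional."""
    others = {"other": "true", "all": "false", "sometimes": "yes"}
    if with_additional:
        # Only the trailing records in the report carry this key.
        others["additional"] = "last entries only"
    return {"some": "yes", "others": others}


def write_test_file(path, n_without=800_000, n_with=100_000):
    """Write newline-delimited JSON: first the records without the
    "additional" key, then the records with it. Returns the line count."""
    with open(path, "w", encoding="utf-8") as f:
        for _ in range(n_without):
            f.write(json.dumps(make_record(False), separators=(",", ":")) + "\n")
        for _ in range(n_with):
            f.write(json.dumps(make_record(True), separators=(",", ":")) + "\n")
    return n_without + n_with
```

Calling `write_test_file("/tmp/test.json")` would produce the 800k + 100k layout from the report, assuming the `dfs.tmp` workspace points at `/tmp`; the queries quoted in this thread can then be run against it.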

- Stefan

On Wed, Jul 22, 2015 at 10:53 PM, Stefán Baxter <ste...@activitystream.com>
wrote:

> - never returns this: "yes", {"other":"true","all":"false","sometimes":"yes"}
>
> should have been:
>
> - never returns this: "yes", {"other":"true","all":"false","sometimes":"yes","additional":"last entries only"}
>
> Regards,
>  -Stefan
>
> On Wed, Jul 22, 2015 at 10:52 PM, Stefán Baxter <ste...@activitystream.com> wrote:
>
>> Hi,
>>
>> I keep coming across *quirks* in Drill that are quite time consuming to
>> deal with and are now causing mounting concerns.
>>
>> This last one, though, is far more serious than the previous ones because
>> it deals with loss of data.
>>
>> I'm working with a small(ish) dataset of around 1m records (which I'm
>> more than happy to hand over to replicate this)
>>
>> The problem goes like this:
>>
>>    1. with dfs.tmp.`/test.json`
>>    - containing a structure like this (simplified);
>>    - 800k x
>>    {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}
>>    - 100k x
>>    {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last entries only"}}
>>
>>    2. selecting: select some, t.others from dfs.tmp.`/test.json` as t;
>>    - returns only this for all the records: "yes",
>>    {"other":"true","all":"false","sometimes":"yes"}
>>    - never returns this:
>>    "yes", {"other":"true","all":"false","sometimes":"yes"}
>>
>> The query never returns this:
>> "yes", {"other":"true","all":"false","sometimes":"yes","additional":"last entries only"}
>> so the last entries in the file are incorrectly represented.
>>
>> To make matters a lot worse, the property is completely ignored in:
>> create table X as select * from dfs.tmp.`/test.json`
>> and the resulting Parquet file does not include it at all.
>>
>> It looks, to me, as if the dynamic schema discovery has stopped looking
>> for schema changes and is quite set in its ways; so set, in fact, that it's
>> ignoring data.
>>
>> I'm guessing that this is potentially affecting more people than me.
>>
>> I believe I have produced this under 1.1 and 1.2-SNAPSHOT.
>>
>> Regards,
>>  -Stefan
>>
>
>
