In addition to this, selecting:

select some, t.others, t.others.additional from dfs.tmp.`/test.json` as t;

- returns this: "yes", {"additional":"last entries only"}, "last entries only"

finding the previously missing value but then ignoring all the other values of the sub structure.

- Stefan

On Wed, Jul 22, 2015 at 10:53 PM, Stefán Baxter <ste...@activitystream.com> wrote:

> - never returns this: "yes", {"other":"true","all":"false","sometimes":"yes"}
>
> should have been:
>
> - never returns this: "yes", {"other":"true","all":"false","sometimes":"yes", "additional":"last entries only"}
>
> Regards,
> -Stefan
>
> On Wed, Jul 22, 2015 at 10:52 PM, Stefán Baxter <ste...@activitystream.com> wrote:
>
>> Hi,
>>
>> I keep coming across *quirks* in Drill that are quite time consuming to
>> deal with and are now causing mounting concerns.
>>
>> This last one though is far more serious than the previous ones because
>> it deals with loss of data.
>>
>> I'm working with a small(ish) dataset of around 1m records (which I'm
>> more than happy to hand over to replicate this)
>>
>> The problem goes like this:
>>
>> 1. with dfs.tmp.`/test.json`
>>    - containing a structure like this (simplified);
>>    - 800k x
>>      {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}
>>    - 100k x
>>      {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last entries only"}}
>>
>> 2. selecting: select some, t.others from dfs.tmp.`/test.json` as t;
>>    - returns only this for all the records:
>>      "yes", {"other":"true","all":"false","sometimes":"yes"}
>>    - never returns this:
>>      "yes", {"other":"true","all":"false","sometimes":"yes"}
>>
>> The query never returns this:
>> "yes", {"other":"true","all":"false","sometimes":"yes","additional":"last entries only"}
>> so the last entries in the file are incorrectly represented.
>>
>> To make matters a lot worse the property is completely ignored in:
>> create X as * from dfs.tmp.`/test.json` and the now parquet file does not
>> include it at all.
>>
>> It looks, to me, that the dynamic schema discovery has stopped looking
>> for schema changes and is quite set in its ways, so set in fact, that it's
>> ignoring data.
>>
>> I'm guessing that this is potentially affecting more people than me.
>>
>> I believe I have produced this under 1.1 and 1.2-SNAPSHOT.
>>
>> Regards,
>> -Stefan
>>
>
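
For anyone who wants to replicate this without the real dataset, here is a rough sketch of a generator for a file shaped like the one described above. It is only an approximation of the original data: the function name `write_test_file` and the output path are made up for illustration, and it assumes one JSON object per line with the two record shapes exactly as quoted in the report.

```python
import json

# Record shapes copied from the report; everything else here is illustrative.
BASE = {
    "some": "yes",
    "others": {"other": "true", "all": "false", "sometimes": "yes"},
}
EXTENDED = {
    "some": "yes",
    "others": {
        "other": "true",
        "all": "false",
        "sometimes": "yes",
        "additional": "last entries only",
    },
}


def write_test_file(path, base_count=800_000, extended_count=100_000):
    """Write base_count records without "additional", then extended_count with it.

    Returns the total number of records written.
    """
    with open(path, "w", encoding="utf-8") as f:
        # Compact separators keep each line identical to the samples in the report.
        base_line = json.dumps(BASE, separators=(",", ":")) + "\n"
        extended_line = json.dumps(EXTENDED, separators=(",", ":")) + "\n"
        for _ in range(base_count):
            f.write(base_line)
        for _ in range(extended_count):
            f.write(extended_line)
    return base_count + extended_count
```

Calling `write_test_file("/tmp/test.json")` should produce a file that the queries above can be run against, assuming the `dfs.tmp` workspace points at `/tmp` on the machine running Drill (adjust the path if it is configured differently). Smaller counts can be passed to check whether the behaviour depends on how far into the file the extra property first appears.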