/me sighs of relief

On Mon, Dec 14, 2015 at 7:28 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> Actually, even without multiple storage types, this could be radically
> confusing.
>
> If I have many Avro files that are partitioned into directories, then
> queries that use the partitioning to limit the files that I see could
> include or exclude more recent files that have added a new field.
>
> That means that a query would succeed or fail according to which date
> range I use for the query.
>
> That seems pretty radically bad.
>
> On Mon, Dec 14, 2015 at 9:33 AM, Stefán Baxter <ste...@activitystream.com>
> wrote:
>
> > Hi,
> >
> > This simply cannot be the desired behavior!
> >
> > This prevents using a field from a changing schema with dir0
> > sub-selection (directory pruning), as the altered/full schema is never
> > part of the query and it subsequently fails.
> >
> > Drill should, IMO, never have rules that are dependent on the underlying
> > storage type. If the query runs with JSON and Parquet then it should
> > work for Avro as well.
> >
> > I'm hoping this strict schema validation is all just a misunderstanding.
> >
> > Regards,
> > -Stefán
> >
> > On Mon, Dec 14, 2015 at 3:28 PM, Kamesh <kamesh.had...@gmail.com> wrote:
> >
> > > For Avro files, we first construct the schema, and this schema is used
> > > for validating queries. So, if there are any errors in the query (such
> > > as invalid field references), it will fail fast. As of now, for other
> > > file formats, query validation (checking for invalid field references)
> > > does not happen; the schema is constructed for them at run time, hence
> > > the nulls for invalid fields.
> > >
> > > On Mon, Dec 14, 2015 at 2:36 PM, Stefán Baxter
> > > <ste...@activitystream.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm getting the following error when querying Avro files:
> > > >
> > > > Error: VALIDATION ERROR: From line 1, column 48 to line 1, column 57:
> > > > Column 'some_col' not found in any table
> > > >
> > > > It's true that the field is in none of the tables I'm targeting in
> > > > that particular query, but that does not mean that it is in none of
> > > > the possible files I could be querying.
> > > >
> > > > We use Avro to get the benefits of the schema, but I never expected
> > > > Drill to enforce it this way.
> > > >
> > > > Why do unresolved columns not return null?
> > > >
> > > > This makes no sense to me, as I think a fundamental trait of Drill,
> > > > when trying to eliminate ETL, is to return null for any missing
> > > > fields.
> > > >
> > > > Please advise.
> > > >
> > > > Regards,
> > > > -Stefán
> > >
> > > --
> > > Kamesh.
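
For anyone reading this thread later, the two behaviours described above can be sketched in a few lines of Python. This is only an illustration and not Drill code: the directory names, the `id` and `ts` fields, and the `select_strict` / `select_lenient` helpers are all made up for the example. Only the column name `some_col` and the wording of the error come from the thread.

```python
# Hypothetical illustration only: this is not Drill code.
# Each directory (a dir0 value) holds records written with one schema version.
PARTITIONS = {
    "2015-11": {"fields": ["id", "ts"],
                "rows": [{"id": 1, "ts": 100}]},
    "2015-12": {"fields": ["id", "ts", "some_col"],
                "rows": [{"id": 2, "ts": 200, "some_col": "x"}]},
}


class ValidationError(Exception):
    """Raised when a referenced column is absent from the planning schema."""


def select_strict(columns, dirs):
    """Fail fast: validate columns against the schemas of the selected dirs only."""
    known = set()
    for d in dirs:
        known.update(PARTITIONS[d]["fields"])
    for c in columns:
        if c not in known:
            raise ValidationError(f"Column '{c}' not found in any table")
    return [{c: row.get(c) for c in columns}
            for d in dirs for row in PARTITIONS[d]["rows"]]


def select_lenient(columns, dirs):
    """No up-front validation: unresolved columns come back as None (null)."""
    return [{c: row.get(c) for c in columns}
            for d in dirs for row in PARTITIONS[d]["rows"]]
```

With these assumptions, `select_strict(["id", "some_col"], ["2015-11"])` raises, while the same call over both directories succeeds, which is the "succeeds or fails according to the date range" effect Ted describes. `select_lenient` returns a null for `some_col` in either case.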