Actually, even without multiple storage types, this could be radically
confusing.

If I have many Avro files that are partitioned into directories, then
queries that use the partitioning to limit the files that I see could
include or exclude more recent files that have added a new field.

That means that a query would succeed or fail according to which date range
I use for the query.

That seems pretty radically bad.
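
To make that concrete, here is a rough sketch in Python. It is only a toy model of the behavior described in this thread, not Drill's actual code: the partition names, the field sets, and the helper functions are all hypothetical, and it assumes the strict check only sees the schemas of the directories that survive pruning.

```python
# Toy model (hypothetical): each partition directory maps to the set of
# fields present in the Avro schema of the files it contains.
PARTITIONS = {
    "2015-11": {"id", "ts"},
    "2015-12": {"id", "ts", "some_col"},  # newer files added some_col
}


class ValidationError(Exception):
    """Raised when a column is unknown to every selected schema."""


def validate_strict(column, selected_dirs):
    # Fail fast: the column must appear in at least one selected schema.
    visible = set().union(*(PARTITIONS[d] for d in selected_dirs))
    if column not in visible:
        raise ValidationError(f"Column '{column}' not found in any table")


def resolve_lenient(column, record):
    # The behavior asked for in the thread: a missing field is just null.
    return record.get(column)


def query_outcome(column, selected_dirs):
    # Report whether the same query passes validation for a given
    # directory selection.
    try:
        validate_strict(column, selected_dirs)
        return "ok"
    except ValidationError:
        return "validation error"
```

With this model, `query_outcome("some_col", ["2015-11"])` fails validation while `query_outcome("some_col", ["2015-11", "2015-12"])` passes, even though the query text is identical. The lenient resolver never fails; it returns `None` for records that lack the field.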




On Mon, Dec 14, 2015 at 9:33 AM, Stefán Baxter <ste...@activitystream.com>
wrote:

> Hi,
>
> This simply cannot be the desired behavior!
>
> This prevents using a field from a changing schema with dir0
> sub-selection (directory pruning), as the altered/full schema is never part
> of the query and it subsequently fails.
>
> Drill should, IMO, never have rules that are dependent on the underlying
> storage type. If the query runs with JSON and Parquet then it should work
> for Avro as well.
>
> I'm hoping this strict schema validation is all just a misunderstanding.
>
> Regards,
>  -Stefán
>
> On Mon, Dec 14, 2015 at 3:28 PM, Kamesh <kamesh.had...@gmail.com> wrote:
>
> > For Avro files, we first construct the schema, and this schema is used for
> > validating queries. So, if there are any errors in the query (like
> > invalid field references), it will fail fast. As of now, for other file
> > formats, query validation (checking for invalid field references) does not
> > happen; the schema is constructed at run time, and hence invalid fields
> > return nulls.
> >
> >
> > On Mon, Dec 14, 2015 at 2:36 PM, Stefán Baxter <ste...@activitystream.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I'm getting the following error when querying Avro files:
> > >
> > > Error: VALIDATION ERROR: From line 1, column 48 to line 1, column 57:
> > > Column 'some_col' not found in any table
> > >
> > > It's true that the field is in none of the tables I'm targeting in that
> > > particular query, but that does not mean that it is in none of the
> > > possible files I could be querying.
> > >
> > > We use Avro to get the benefits of the schema but I never expected Drill
> > > to enforce it this way.
> > >
> > > Why do unresolved columns not return null?
> > >
> > > This makes no sense to me as I think a fundamental trait of Drill, when
> > > trying to eliminate ETL, is to return null for any missing fields.
> > >
> > > Please advise.
> > >
> > > Regards,
> > >  -Stefán
> > >
> >
> >
> >
> > --
> > Kamesh.
> >
>
