Re: Question about Drill aggregate queries and schema change

Cliff Resnick Mon, 24 Jul 2017 11:46:53 -0700

Jinfeng,

Thanks, that confirms my thoughts as well. If I query using full range
bounds and all hash keys, then Kudu prunes to the exact tablets and there
is no error. I'll watch that JIRA expectantly because Kudu + Drill would be
an awesome combo. But without the pruning it's useless to us.
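
For anyone following along, here is a minimal sketch of the ordering problem
described in the quoted reply below. It is a toy Python model, not Drill code:
merge_batches, SchemaChangeError and the tuple-based schemas are hypothetical
names made up for illustration. The only things taken from the thread are the
two types involved (nullable INT guessed for an empty batch, BIGINT:REQUIRED
for the real column) and the fact that arrival order decides the outcome.

```python
# Toy model of the empty-batch ordering issue; NOT actual Drill code.
# All names here are hypothetical and exist only for illustration.

class SchemaChangeError(Exception):
    """Stands in for the 'changing schemas' failure seen by the query."""


# Schema guessed for a column when a reader returns no rows and no schema.
EMPTY_DEFAULT = ("INT", "OPTIONAL")
# Schema reported by a fragment that actually read data.
REAL_SCHEMA = ("BIGINT", "REQUIRED")


def merge_batches(batches):
    """Consume (schema, row_count) batches in arrival order.

    The first batch fixes the schema. A later empty batch is ignored,
    but a later non-empty batch with a different schema is a conflict.
    """
    current = None
    for schema, rows in batches:
        if current is None:
            current = schema
        elif rows == 0:
            continue
        elif schema != current:
            raise SchemaChangeError(f"expected {current}, got {schema}")
    return current


data_batch = (REAL_SCHEMA, 100)
empty_batch = (EMPTY_DEFAULT, 0)

# Data arrives first: the empty batch is ignored and the query succeeds.
print(merge_batches([data_batch, empty_batch]))

# Empty batch arrives first: the guessed schema wins and the data conflicts.
try:
    merge_batches([empty_batch, data_batch])
except SchemaChangeError as err:
    print("query fails:", err)
```

The same two batches give different results depending only on order, which
would explain why identical queries pass on one run and fail on the next.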


-Cliff

On Mon, Jul 24, 2017 at 2:17 PM, Jinfeng Ni <j...@apache.org> wrote:

> If you see such errors only when you enable predicate pushdown, it might be
> related to a known issue: schema change failure caused by an empty batch [1].
> This happens when the predicate prunes everything and the Kudu reader does
> not return a RowResult with a schema. In that case, Drill interprets the
> requested column (such as a) as nullable int, which conflicts with other
> minor fragments that may have the data/schema.
>
> The reason you hit such failures randomly: there is a race condition
> for such a conflict to happen. If the minor fragment with the empty batch is
> executed after the one with data, the empty batch is ignored. In the
> reverse order, it causes a conflict, hence the query failure.
>
> 1. https://issues.apache.org/jira/browse/DRILL-5546
>
>
>
> On Mon, Jul 24, 2017 at 10:56 AM, Cliff Resnick <cre...@gmail.com> wrote:
>
> > I spent some time over the weekend altering Drill's storage-kudu to use
> > Kudu's predicate pushdown API. Everything worked great as long as I
> > performed flat filtered selects (e.g. "SELECT .. FROM .. WHERE ..") but
> > whenever I tested aggregate queries, they would succeed sometimes, then
> > fail other times -- using the exact same queries.
> >
> > The failures were always like below. After searching around, I came across
> > a number of JIRAs, like https://issues.apache.org/jira/browse/DRILL-2602
> > that imply Drill can't handle sort/aggregate queries on "changing
> > schemas". This was confusing to me because I was testing with a single
> > table/single schema, which leaves me wondering if "changing schema" means
> > the unknown type of the aggregate itself? Meaning, for
> > SELECT SUM(a), b FROM t GROUP BY b;
> > where field a is an INT64, Drill can't figure out how to deal
> > with SUM(a) because it may exceed the range of INT64?
> >
> > If someone could clarify this for me I'd really appreciate it. I'm really
> > hoping my above understanding is not correct and it's just a problem with
> > the Vector handling in storage-kudu, because otherwise it seems that
> > Drill's aggregation capabilities are rather limited.
> >
> > Errors:
> >
> > java.lang.IllegalStateException: Failure while reading vector.  Expected vector class of org.apache.drill.exec.vector.NullableIntVector but was holding vector class org.apache.drill.exec.vector.BigIntVector, field= campaign_id(BIGINT:REQUIRED)
> >     at org.apache.drill.exec.record.VectorContainer.getValueAccessorById(VectorContainer.java:321)
> >     at org.apache.drill.exec.record.RecordBatchLoader.getValueAccessorById(RecordBatchLoader.java:179)
> >
> > OR
> >
> > Error: UNSUPPORTED_OPERATION ERROR: Sort doesn't currently support sorts with changing schemas.
> >
>
