Jinfeng,

Thanks, that confirms my thoughts as well. If I query using full range bounds and all hash keys, then Kudu prunes to the exact tablets and there is no error. I'll watch that JIRA expectantly because Kudu + Drill would be an awesome combo. But without the pruning it's useless to us.
-Cliff

On Mon, Jul 24, 2017 at 2:17 PM, Jinfeng Ni <j...@apache.org> wrote:

> If you see such errors only when you enable predicate pushdown, it might be
> related to a known issue: schema change failure caused by empty batch [1].
> This happens when the predicate prunes everything and the Kudu reader does
> not return a RowResult with a schema. In such a case, Drill would interpret
> the requested column (such as a) as nullable int, which would conflict with
> other minor-fragments that may have the data/schema.
>
> The reason you hit such failures randomly: there is a race condition
> for such a conflict to happen. If the minor-fragment with the empty batch is
> executed after the one with data, the empty batch would be ignored. In the
> reverse order, it would cause a conflict, hence the query failure.
>
> 1. https://issues.apache.org/jira/browse/DRILL-5546
>
> On Mon, Jul 24, 2017 at 10:56 AM, Cliff Resnick <cre...@gmail.com> wrote:
>
> > I spent some time over the weekend altering Drill's storage-kudu to use
> > Kudu's predicate pushdown API. Everything worked great as long as I
> > performed flat filtered selects (e.g. "SELECT .. FROM .. WHERE ..") but
> > whenever I tested aggregate queries, they would succeed sometimes, then
> > fail other times, using the exact same queries.
> >
> > The failures were always like below. After searching around, I came
> > across a number of JIRAs, like
> > https://issues.apache.org/jira/browse/DRILL-2602, that imply Drill can't
> > handle sort/aggregate queries on "changing schemas". This was confusing
> > to me because I was testing with a single table/single schema, which
> > leaves me wondering if "changing schema" means the unknown type of the
> > aggregate itself? Meaning, SELECT SUM(a),b FROM t GROUP BY a; where field
> > a is an INT64, Drill can't figure out how to deal with SUM(a) because it
> > may exceed the scale of INT64?
> >
> > If someone could clarify this for me I'd really appreciate it. I'm really
> > hoping my above understanding is not correct and it's just a problem with
> > the Vector handling in storage-kudu, because otherwise it seems that
> > Drill's aggregation capabilities are rather limited.
> >
> > Errors:
> >
> > java.lang.IllegalStateException: Failure while reading vector. Expected
> > vector class of org.apache.drill.exec.vector.NullableIntVector but was
> > holding vector class org.apache.drill.exec.vector.BigIntVector, field=
> > campaign_id(BIGINT:REQUIRED)
> > at org.apache.drill.exec.record.VectorContainer.getValueAccessorById(VectorContainer.java:321)
> > at org.apache.drill.exec.record.RecordBatchLoader.getValueAccessorById(RecordBatchLoader.java:179)
> >
> > OR
> >
> > Error: UNSUPPORTED_OPERATION ERROR: Sort doesn't currently support sorts
> > with changing schemas.
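
For anyone following the thread, the order dependence described above can be illustrated with a toy model. This is only a sketch and not Drill's actual code: `Batch`, `empty_batch`, `consume`, and `SchemaChangeError` are hypothetical names, and the only assumptions carried over from the thread are that an empty batch without a schema is treated as nullable int, and that such a batch is ignored only when it arrives after a batch with data.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class SchemaChangeError(Exception):
    """Raised when a batch with rows disagrees with the established schema."""


@dataclass
class Batch:
    schema: Dict[str, str]                      # column name -> type label
    rows: List[dict] = field(default_factory=list)


def empty_batch(columns: List[str]) -> Batch:
    # Hypothetical stand-in for a reader that returned no rows and no schema:
    # each requested column falls back to a nullable INT guess.
    return Batch(schema={c: "NULLABLE_INT" for c in columns})


def consume(batches: List[Batch]) -> int:
    """Toy receiver: adopts the first schema it sees, returns the row count."""
    established = None
    total = 0
    for batch in batches:
        if established is None:
            established = batch.schema          # first arrival wins
        elif batch.schema != established:
            if not batch.rows:
                continue                        # late empty batch is ignored
            raise SchemaChangeError(
                f"expected {established}, got {batch.schema}"
            )
        total += len(batch.rows)
    return total


data = Batch({"campaign_id": "BIGINT"}, [{"campaign_id": 1}, {"campaign_id": 2}])
empty = empty_batch(["campaign_id"])

consume([data, empty])        # data first: the empty batch is skipped
try:
    consume([empty, data])    # empty first: the guessed schema is adopted
except SchemaChangeError as err:
    print("query fails:", err)
```

In this model the same two batches succeed or fail depending only on arrival order, which matches the intermittent failures on identical queries.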