Based on my limited understanding of Drill's KuduRecordReader, the problem seems to be in the next() method [1]. When the RowResult iterator returns false for hasNext(), which happens when the filter prunes everything, the code skips the call to addRowResult(). That means no columns or data are added to the scan's batch, so a nullable INT is injected for the requested column in the downstream operator.
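A simplified model of the pattern described above, in plain Java. It deliberately does not use Drill's or Kudu's real classes. `EmptyBatchSketch`, `buggyNext`, `fixedNext`, `resolve` and the type strings are hypothetical stand-ins. It contrasts registering columns only while copying rows (so a fully pruned scan produces a batch with no columns and the requested column falls back to nullable INT) with registering every projected column up front from the scanner's projection, before any row is read.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of the empty-batch problem; not Drill's or Kudu's real API.
 * A "batch schema" is an ordered map from column name to a Drill-style type string.
 */
public class EmptyBatchSketch {

    // Type a downstream operator falls back to for a requested column it never saw
    // (the nullable INT described in the thread).
    static final String MISSING_COLUMN_TYPE = "INT:OPTIONAL";

    record Column(String name, String type) {}

    /** Problematic pattern: columns are only registered while copying rows. */
    static Map<String, String> buggyNext(List<Column> projection, List<Object[]> rows) {
        Map<String, String> batchSchema = new LinkedHashMap<>();
        for (Object[] row : rows) {               // fully pruned scan: body never runs
            for (Column c : projection) {
                batchSchema.putIfAbsent(c.name(), c.type());
            }
            // ... copy row values into value vectors here ...
        }
        return batchSchema;
    }

    /** Alternative pattern: register every projected column first, even with zero rows. */
    static Map<String, String> fixedNext(List<Column> projection, List<Object[]> rows) {
        Map<String, String> batchSchema = new LinkedHashMap<>();
        for (Column c : projection) {
            batchSchema.put(c.name(), c.type());
        }
        for (Object[] row : rows) {
            // ... copy row values into the already-created vectors here ...
        }
        return batchSchema;
    }

    /** Type a downstream operator would see for the requested column. */
    static String resolve(Map<String, String> batchSchema, String column) {
        return batchSchema.getOrDefault(column, MISSING_COLUMN_TYPE);
    }

    public static void main(String[] args) {
        List<Column> projection = List.of(new Column("campaign_id", "BIGINT:REQUIRED"));
        List<Object[]> pruned = List.of();        // predicate pushdown pruned every row

        System.out.println(resolve(buggyNext(projection, pruned), "campaign_id")); // INT:OPTIONAL
        System.out.println(resolve(fixedNext(projection, pruned), "campaign_id")); // BIGINT:REQUIRED
    }
}
```

Registering columns up front is the direction Cliff's question in the thread points at: since the scanner has a schema even when no rows come back, the reader could create its vectors from that schema before reading any RowResult. How cleanly this maps onto KuduRecordReader's actual setup path is an assumption, not something verified here.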
1. https://github.com/apache/drill/blob/master/contrib/storage-kudu/src/main/java/org/apache/drill/exec/store/kudu/KuduRecordReader.java#L149-L163

On Mon, Jul 24, 2017 at 1:35 PM, Cliff Resnick <cre...@gmail.com> wrote:

> Jinfeng,
>
> I'm wondering if there's a way to push schema info to Drill even if there
> is no result. KuduScanner always has a schema, and RecordReader always has
> a scanner, but I can't seem to find the disconnect. Any idea if this is
> possible, even if it's a Kudu-specific hack?
>
> -Cliff
>
> On Mon, Jul 24, 2017 at 2:46 PM, Cliff Resnick <cre...@gmail.com> wrote:
>
>> Jinfeng,
>>
>> Thanks, that confirms my thoughts as well. If I query using full range
>> bounds and all hash keys, then Kudu prunes to the exact tablets and there
>> is no error. I'll watch that JIRA expectantly, because Kudu + Drill would
>> be an awesome combo. But without the pruning it's useless to us.
>>
>> -Cliff
>>
>> On Mon, Jul 24, 2017 at 2:17 PM, Jinfeng Ni <j...@apache.org> wrote:
>>
>>> If you see such errors only when you enable predicate pushdown, it might
>>> be related to a known issue: schema change failure caused by an empty
>>> batch [1]. This happens when the predicate prunes everything and the Kudu
>>> reader does not return a RowResult with a schema. In such a case, Drill
>>> interprets the requested column (such as a) as a nullable int, which leads
>>> to a conflict with other minor fragments that may have the data/schema.
>>>
>>> The reason you hit such failures randomly is a race condition. If the
>>> minor fragment with the empty batch is executed after the one with data,
>>> the empty batch is ignored. In the reverse order, it causes a conflict,
>>> hence the query failure.
>>>
>>> 1. https://issues.apache.org/jira/browse/DRILL-5546
>>>
>>> On Mon, Jul 24, 2017 at 10:56 AM, Cliff Resnick <cre...@gmail.com>
>>> wrote:
>>>
>>> > I spent some time over the weekend altering Drill's storage-kudu to use
>>> > Kudu's predicate pushdown API. Everything worked great as long as I
>>> > performed flat filtered selects (e.g. SELECT .. FROM .. WHERE ..), but
>>> > whenever I tested aggregate queries, they would succeed sometimes, then
>>> > fail other times, using the exact same queries.
>>> >
>>> > The failures were always like the ones below. After searching around, I
>>> > came across a number of JIRAs, like
>>> > https://issues.apache.org/jira/browse/DRILL-2602, that imply Drill can't
>>> > handle sorts/aggregate queries on "changing schemas". This was confusing
>>> > to me because I was testing with a single table/single schema, which
>>> > leaves me wondering whether "changing schema" means the unknown type of
>>> > the aggregate itself. Meaning, in SELECT SUM(a),b FROM t GROUP BY a;
>>> > where field a is an INT64, can Drill not figure out how to deal with
>>> > SUM(a) because it may exceed the scale of INT64?
>>> >
>>> > If someone could clarify this for me I'd really appreciate it. I'm really
>>> > hoping my above understanding is not correct and it's just a problem with
>>> > the Vector handling in storage-kudu, because otherwise it seems that
>>> > Drill's aggregation capabilities are rather limited.
>>> >
>>> > Errors:
>>> >
>>> > java.lang.IllegalStateException: Failure while reading vector. Expected
>>> > vector class of org.apache.drill.exec.vector.NullableIntVector but was
>>> > holding vector class org.apache.drill.exec.vector.BigIntVector, field=
>>> > campaign_id(BIGINT:REQUIRED)
>>> > at org.apache.drill.exec.record.VectorContainer.getValueAccessorById(VectorContainer.java:321)
>>> > at org.apache.drill.exec.record.RecordBatchLoader.getValueAccessorById(RecordBatchLoader.java:179)
>>> >
>>> > OR
>>> >
>>> > Error: UNSUPPORTED_OPERATION ERROR: Sort doesn't currently support sorts
>>> > with changing schemas.
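The ordering race described in the quoted reply can be illustrated with a toy model. Everything in it is hypothetical: the `FragmentOrderSketch` class, the `Batch` record, and the rule that the first batch to arrive fixes the column type are simplifying assumptions, not Drill's actual receiver logic. With the data batch first, the late empty batch is ignored; with the empty batch first, its inferred nullable INT type is locked in and the BIGINT batch conflicts, mirroring the `NullableIntVector` vs `BigIntVector` error quoted in the thread.

```java
import java.util.List;

/** Hypothetical, simplified model of the fragment-ordering race; not Drill code. */
public class FragmentOrderSketch {

    record Batch(String fragment, String columnType, int rowCount) {}

    /** Returns null if all batches are accepted, or an error message on a schema conflict. */
    static String receive(List<Batch> arrivalOrder) {
        String lockedType = null;
        for (Batch b : arrivalOrder) {
            if (lockedType == null) {
                lockedType = b.columnType();      // assumption: first batch fixes the schema
                continue;
            }
            if (b.rowCount() == 0) {
                continue;                         // a late empty batch is ignored
            }
            if (!lockedType.equals(b.columnType())) {
                return b.fragment() + ": expected " + lockedType
                        + " but was holding " + b.columnType();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Batch withData = new Batch("fragment-0", "BIGINT:REQUIRED", 1024);
        Batch pruned = new Batch("fragment-1", "INT:OPTIONAL", 0);

        System.out.println(receive(List.of(withData, pruned))); // null (query succeeds)
        System.out.println(receive(List.of(pruned, withData))); // fragment-0: expected INT:OPTIONAL but was holding BIGINT:REQUIRED
    }
}
```

Because both orders are possible at runtime, the same query can pass or fail from one run to the next, which is the intermittent behavior reported at the start of the thread.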