I'll start the hangout tomorrow at the usual time. I don't have a set
agenda yet, but if there are any topics folks wish to discuss, please
respond on this thread so that others who might be interested can also
join.
Thanks.
Adding to what Jinfeng said, LIMIT handling relies on a 'kill incoming
input stream' API, which the downstream (parent) operator, here Limit,
calls on its child once it has received the required number of rows.
Since the unit of processing in Drill is record
batches
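To make the kill mechanism concrete, here is a toy model in Python. It is not Drill's Java operator API: `ScanOperator`, `LimitOperator`, `next_batch()` and `kill()` are hypothetical names, and the batch size of 32,768 is an assumption chosen only for illustration. It shows that when the child produces whole record batches, the child may already have read a full batch by the time the parent has enough rows and kills it.

```python
from typing import List, Optional


class ScanOperator:
    """Hypothetical child operator that emits fixed-size record batches."""

    def __init__(self, total_rows: int, batch_size: int) -> None:
        self.total_rows = total_rows
        self.batch_size = batch_size
        self.rows_read = 0      # rows the scan has actually produced
        self.killed = False     # set when the parent kills the input stream

    def next_batch(self) -> Optional[List[int]]:
        # A killed or exhausted scan produces no more batches.
        if self.killed or self.rows_read >= self.total_rows:
            return None
        n = min(self.batch_size, self.total_rows - self.rows_read)
        batch = list(range(self.rows_read, self.rows_read + n))
        self.rows_read += n
        return batch

    def kill(self) -> None:
        # Hypothetical stand-in for the 'kill incoming input stream' call.
        self.killed = True


class LimitOperator:
    """Hypothetical parent that stops its child once `limit` rows are seen."""

    def __init__(self, child: ScanOperator, limit: int) -> None:
        self.child = child
        self.limit = limit
        self.emitted = 0

    def next_batch(self) -> Optional[List[int]]:
        if self.emitted >= self.limit:
            return None
        batch = self.child.next_batch()
        if batch is None:
            return None
        # Trim the batch to the rows still needed.
        batch = batch[: self.limit - self.emitted]
        self.emitted += len(batch)
        if self.emitted >= self.limit:
            # Enough rows received: tell the child to stop producing.
            self.child.kill()
        return batch


scan = ScanOperator(total_rows=53_000, batch_size=32_768)
limit = LimitOperator(scan, limit=1)
out: List[int] = []
while (b := limit.next_batch()) is not None:
    out.extend(b)
print(len(out), scan.rows_read, scan.killed)  # 1 32768 True
```

In this toy model, LIMIT 1 returns one row while the scan has read one whole batch; whether Drill's Parquet reader behaves the same way depends on its actual batch sizing.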
Drill applies LIMIT filtering at the row group level. For LIMIT n, it
scans the first m row groups that together contain at least n rows and
discards the remaining row groups. In your case, since you have only 1
row group, no row group filtering happens for LIMIT 1.
I'm not sure where 32767 comes from.
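The row group selection described above can be sketched as follows. This is a minimal illustration of the rule "keep the first m row groups that together hold at least n rows", not Drill's planner code; the function name `row_groups_for_limit` and the list-of-row-counts input are hypothetical.

```python
from typing import List


def row_groups_for_limit(row_group_counts: List[int], n: int) -> List[int]:
    """Return indices of the first row groups whose combined row count reaches n.

    row_group_counts: number of rows in each row group, in file order
    (a hypothetical input shape chosen for illustration).
    """
    selected: List[int] = []
    total = 0
    for i, count in enumerate(row_group_counts):
        if total >= n:
            break  # already have enough rows; discard the rest
        selected.append(i)
        total += count
    return selected


print(row_groups_for_limit([53_000], 1))          # [0]
print(row_groups_for_limit([100, 100, 100], 150)) # [0, 1]
```

With a single row group, as in the question, the only possible selection is that one row group, so this level of pruning cannot reduce the work for LIMIT 1.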
Does anyone know whether and how LIMIT pushdown to Parquet files works?
I have a Parquet file with 53K records in 1 row group. When I run a SELECT
* from LIMIT 1, I see the Parquet reader operator process 32768
records. I would have expected either 1 or 53K. So, my questions:
1) Does the Parquet MR l