You are right that Q2 has to first apply flatten the field "details" before applying filter on top of the flattened rows. I do not think Drill currently supports to push filter down.
The difference between Q1 and Q2: Q1 is querying a 4-row table, while Q2 is querying a 1-row table; the flatten operator produces 4 rows on top of which the filter is applied. On Wed, Jun 28, 2017 at 9:51 AM, Lee, David <david....@blackrock.com> wrote: > > Is there a fundamental difference between the following queries? I can't > get the second example working with parquet files which contain 400,000+ > nested records.. > > It seems like the system wants to flatten every possible record before > applying the SQL Where clause to the flattened data structure.. > > Example 1: > > select b.* from dfs.`test1.json` b where b.item = 3 > > [ > { > "item": 1, > "item_name": "name_for_1" > }, > { > "item": 2, > "item_name": "name_for_2" > }, > { > "item": 3, > "item_name": "name_for_3" > }, > { > "item": 4, > "item_name": "name_for_4" > } > ] > > Example 2: > > select b.* from > (select flatten(a.details) as details > from dfs.`test2.json` a) b > where b.details.item = 3 > > { > "header": "my_header_info", > "details": [ > { > "item": 1, > "item_name": "name_for_1" > }, > { > "item": 2, > "item_name": "name_for_2" > }, > { > "item": 3, > "item_name": "name_for_3" > }, > { > "item": 4, > "item_name": "name_for_4" > } > ] > } > > > > > This message may contain information that is confidential or privileged. > If you are not the intended recipient, please advise the sender immediately > and delete this message. See http://www.blackrock.com/ > corporate/en-us/compliance/email-disclaimers for further information. > Please refer to http://www.blackrock.com/corporate/en-us/compliance/ > privacy-policy for more information about BlackRock’s Privacy Policy. > > For a list of BlackRock's office addresses worldwide, see > http://www.blackrock.com/corporate/en-us/about-us/contacts-locations. > > © 2017 BlackRock, Inc. All rights reserved. >