I have filed a JIRA on this issue. I do see this as important, as users shouldn't expect different behavior in this case.
Thanks!

https://issues.apache.org/jira/browse/DRILL-4379

On Thu, Feb 4, 2016 at 11:53 AM, Jacques Nadeau <jacq...@dremio.com> wrote:

> Yeah, not ideal. We should get a JIRA up and fix this.
>
> Since I've seen the code, it isn't surprising either. An easier way to
> understand this behavior is to run the query select dir0 from t limit 1
> (where t is one directory versus two). In the single case, you'll see that
> dir0 is null. (This is why the count returns zero records.)
>
> I believe that the dirX code currently relies on the shared base. This
> means that it will work even in the case of using globbing (a fairly
> complicated case in how it interacts with dirX). However, it also means
> that in this situation it won't behave the way you would expect. You could
> see similarly unexpected behavior if you had one first-level directory and
> two subdirectories within that first level. I agree that it is an issue and
> we should probably handle this as a special case.
>
> Can you file a JIRA with a couple of examples that behave differently than
> you expected?
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Thu, Feb 4, 2016 at 8:21 AM, John Omernik <j...@omernik.com> wrote:
>
> > Prior to posting a JIRA, I thought I'd toss this here:
> >
> > If I have a directory, data, with subdirectories containing Parquet files:
> >
> > data/2016-01-01
> > data/2016-01-02
> >
> > (Seem familiar? This came up in my other testing.)
> >
> > If I have MORE than one subdirectory, then
> >
> > select count(1) from `data/` where dir0='2016-01-01'
> >
> > works fine.
> >
> > However, if I have EXACTLY one subdirectory, then
> >
> > select count(1) from `data/` where dir0 = '2016-01-01'
> >
> > takes 15 seconds (instead of returning almost instantly) and reports 0
> > records for count. Note, this directory DOES exist, so that is not the
> > issue.
> >
> > If I add a second directory, then the exact same query returns almost
> > instantly and reports the correct number of records.
> >
> > In addition, when there is only one directory, select count(1) from
> > `data/` returns instantly with the correct count.
> >
> > To me, it appears that if there is ONE and only ONE subdirectory, then
> > dir0= doesn't work as I think people would expect it to. I can't think of
> > a real reason for it to behave this way, and to me it violates the
> > principle of "least surprise", but I am not up on the internals of Drill,
> > so I thought I'd post here first.
> >
> > John
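Jacques's explanation (dirX values are derived relative to the shared base of the selected files) can be made concrete with a toy model. The Python sketch below is an assumption about the behavior he describes, not Drill's actual implementation; `shared_base`, `dir_columns`, and `count_where_dir0` are hypothetical names, and the row counts are made up. With two subdirectories the shared base is `data`, so dir0 is the date. With exactly one, the shared base collapses to `data/2016-01-01`, dir0 comes out null, and the filtered count is 0, matching what John reported.

```python
import posixpath
from typing import Dict, List, Optional


def shared_base(file_paths: List[str]) -> str:
    """Hypothetical: the deepest directory that contains every selected file."""
    parents = [posixpath.dirname(p) for p in file_paths]
    return posixpath.commonpath(parents)


def dir_columns(file_path: str, base: str, depth: int = 2) -> List[Optional[str]]:
    """Hypothetical dir0..dir{depth-1} values for one file, relative to `base`.

    Missing levels are None, mirroring the null dir0 described in the thread.
    """
    rel_dir = posixpath.relpath(posixpath.dirname(file_path), base)
    parts = [] if rel_dir == "." else rel_dir.split("/")
    return [parts[i] if i < len(parts) else None for i in range(depth)]


def count_where_dir0(file_paths: List[str], rows_per_file: Dict[str, int], value: str) -> int:
    """Toy version of: select count(1) from `data/` where dir0 = value."""
    base = shared_base(file_paths)
    return sum(
        rows_per_file[p]
        for p in file_paths
        if dir_columns(p, base)[0] == value
    )


if __name__ == "__main__":
    one_subdir = ["data/2016-01-01/a.parquet", "data/2016-01-01/b.parquet"]
    two_subdirs = one_subdir + ["data/2016-01-02/c.parquet"]
    rows = {p: 10 for p in two_subdirs}  # made-up row counts, 10 per file

    print(shared_base(one_subdir))  # data/2016-01-01
    print(count_where_dir0(one_subdir, rows, "2016-01-01"))  # 0
    print(shared_base(two_subdirs))  # data
    print(count_where_dir0(two_subdirs, rows, "2016-01-01"))  # 20
```

Jacques's second example falls out of the same model: files under data/2016/01 and data/2016/02 share the base data/2016, so dir0 would be 01 rather than 2016.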