Hi, event_count is a column alias for count(*) (without the as).
Regards, -Stefan On Thu, Jul 30, 2015 at 3:32 PM, Jacques Nadeau <jacq...@dremio.com> wrote: > select p.client_ip.country_code country_code, count(*) event_count from > dfs.tmp.`/events` as p where event_count > 30000 group by > p.client_ip.country_code; > +---------------+--------------+ > | country_code | event_count | > +---------------+--------------+ > +---------------+--------------+ > > In this situation, a fully-schemaed system would have stated that the > event_count column does not exist. In the case of json, we don't know if > you're unlying data has an event_count column. When we actually try to get > it, we find it doesn't exist and is therefore null. Null is never greater > than 30000 so we exclude all records. Modify your data to look like this: > > {event_count: 30001, "client_ip":{"country_code":"US"}} > {event_count: 30001, "client_ip":{"country_code":"US"}} > {event_count: 30001, "client_ip":{"country_code":"US"}} > {event_count: 30001, "client_ip":{"country_code":"GB"}} > {event_count: 30001, "client_ip":{"country_code":"US"}} > > And you'll see what your query is actually doing. In SQL, an aggregate > function is applied after the WHERE clause. The only way to apply a filter > after an aggregate is with the HAVING clause. > > > > -- > Jacques Nadeau > CTO and Co-Founder, Dremio > > On Thu, Jul 30, 2015 at 8:20 AM, Andries Engelbrecht < > aengelbre...@maprtech.com> wrote: > > > Last I checked group by and having clauses do not support column aliases. > > > > —Andries > > > > > > > On Jul 30, 2015, at 7:19 AM, Stefán Baxter <ste...@activitystream.com> > > wrote: > > > > > > Hi, > > > > > > That last case works as expected, sorry, this test data does have null > > > values for country_code. > > > > > > That means that I have a working solution but that it would be nice if > v1 > > > (above) would work. > > > > > > Thank you, > > > -Stefán > > > > > > On Thu, Jul 30, 2015 at 2:15 PM, Stefán Baxter < > > ste...@activitystream.com> > > > wrote: > > > > > >> make that "Count in where or having clause does not work as expected" > > >> > > >> On Thu, Jul 30, 2015 at 2:14 PM, Stefán Baxter < > > ste...@activitystream.com> > > >> wrote: > > >> > > >>> Hi, > > >>> > > >>> I have data that can be reduced to this: > > >>> > > >>> - {"client_ip":{"country_code":"US"}} > > >>> - {"client_ip":{"country_code":"US"}} > > >>> - {"client_ip":{"country_code":"US"}} > > >>> - {"client_ip":{"country_code":"GB"}} > > >>> - {"client_ip":{"country_code":"US"}} > > >>> > > >>> This works fine: > > >>> > > >>> select p.client_ip.country_code country_code, count(*) event_count > from > > >>> dfs.tmp.`/events` as p group by p.client_ip.country_code; > > >>> > > >>> > > >>> +---------+--------------+ > > >>> | country_code | event_count | > > >>> +---------+--------------+ > > >>> | US | 21516 | > > >>> | GB | 323594 | > > >>> +---------+--------------+ > > >>> > > >>> > > >>> This simple where clause returns nothing: > > >>> > > >>> select p.client_ip.country_code country_code, count(*) event_count > from > > >>> dfs.tmp.`/events` as p where event_count > 30000 group by > > >>> p.client_ip.country_code; > > >>> +---------------+--------------+ > > >>> | country_code | event_count | > > >>> +---------------+--------------+ > > >>> +---------------+--------------+ > > >>> > > >>> > > >>> Using a having clause (v1): > > >>> > > >>> select p.client_ip.country_code country_code, count(*) event_count > from > > >>> dfs.tmp.`/events` as p group by p.client_ip.country_code having > > event_count > > >>>> 30000; > > >>> > > >>> Error: PARSE ERROR: From line 1, column 169 to line 1, column 179: > > >>> Expression 'event_count' is not being grouped > > >>> > > >>> > > >>> Using a having clause (v2): > > >>> > > >>> select p.client_ip.country_code country_code, count(*) event_count > from > > >>> dfs.tmp.`/events` as p group by p.client_ip.country_code having > > count(*) > > > >>> 30000; > > >>> +---------------+--------------+ > > >>> | country_code | event_count | > > >>> +---------------+--------------+ > > >>> | GB | 323594 | > > >>> | null* | 566667 | > > >>> +---------------+--------------+ > > >>> > > >>> > > >>> * there are no null values in the database and this looks to be the > > >>> "total for the rest" > > >>> > > >>> > > >>> Does anyone know how this can be made to work? > > >>> > > >>> Regards, > > >>> -Stefan > > >>> > > >> > > >> > > > > >