select p.client_ip.country_code country_code, count(*) event_count
from dfs.tmp.`/events` as p
where event_count > 30000
group by p.client_ip.country_code;
+---------------+--------------+
| country_code  | event_count  |
+---------------+--------------+
+---------------+--------------+

In this situation, a fully-schemaed system would have stated that the
event_count column does not exist.  In the case of JSON, we don't know
whether your underlying data has an event_count column.  When we actually
try to read it, we find it doesn't exist, so its value is null.  Null is
never greater than 30000, so we exclude all records.  Modify your data to
look like this:

{"event_count": 30001, "client_ip":{"country_code":"US"}}
{"event_count": 30001, "client_ip":{"country_code":"US"}}
{"event_count": 30001, "client_ip":{"country_code":"US"}}
{"event_count": 30001, "client_ip":{"country_code":"GB"}}
{"event_count": 30001, "client_ip":{"country_code":"US"}}
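
With every record now carrying an event_count above the threshold, the
WHERE filter passes all rows, so for these five records you would expect
something like:

+---------------+--------------+
| country_code  | event_count  |
+---------------+--------------+
| US            | 4            |
| GB            | 1            |
+---------------+--------------+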

That result shows what your query is actually doing: the WHERE clause
filters each raw record on its event_count column before any grouping
happens.  In SQL, an aggregate function is applied after the WHERE clause.
The only way to apply a filter after an aggregate is with the HAVING
clause.
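
To filter on the aggregate itself, repeat the aggregate expression in the
HAVING clause (as Andries notes below, aliases aren't accepted there).
This is a sketch against the same table path used in this thread, and is
essentially the v2 query below that already returns the expected rows:

select p.client_ip.country_code country_code, count(*) event_count
from dfs.tmp.`/events` as p
group by p.client_ip.country_code
having count(*) > 30000;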



--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Thu, Jul 30, 2015 at 8:20 AM, Andries Engelbrecht <
aengelbre...@maprtech.com> wrote:

> Last I checked, GROUP BY and HAVING clauses do not support column aliases.
>
> —Andries
>
>
> > On Jul 30, 2015, at 7:19 AM, Stefán Baxter <ste...@activitystream.com>
> wrote:
> >
> > Hi,
> >
> > That last case works as expected; sorry, this test data does have null
> > values for country_code.
> >
> > That means I have a working solution, but it would be nice if v1
> > (above) worked.
> >
> > Thank you,
> > -Stefán
> >
> > On Thu, Jul 30, 2015 at 2:15 PM, Stefán Baxter <
> ste...@activitystream.com>
> > wrote:
> >
> >> make that "Count in where or having clause does not work as expected"
> >>
> >> On Thu, Jul 30, 2015 at 2:14 PM, Stefán Baxter <
> ste...@activitystream.com>
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> I have data that can be reduced to this:
> >>>
> >>>   - {"client_ip":{"country_code":"US"}}
> >>>   - {"client_ip":{"country_code":"US"}}
> >>>   - {"client_ip":{"country_code":"US"}}
> >>>   - {"client_ip":{"country_code":"GB"}}
> >>>   - {"client_ip":{"country_code":"US"}}
> >>>
> >>> This works fine:
> >>>
> >>> select p.client_ip.country_code country_code, count(*) event_count from
> >>> dfs.tmp.`/events` as p group by p.client_ip.country_code;
> >>>
> >>>
> >>> +---------------+--------------+
> >>> | country_code  | event_count  |
> >>> +---------------+--------------+
> >>> | US            | 21516        |
> >>> | GB            | 323594       |
> >>> +---------------+--------------+
> >>>
> >>>
> >>> This simple where clause returns nothing:
> >>>
> >>> select p.client_ip.country_code country_code, count(*) event_count from
> >>> dfs.tmp.`/events` as p where event_count > 30000 group by
> >>> p.client_ip.country_code;
> >>> +---------------+--------------+
> >>> | country_code  | event_count  |
> >>> +---------------+--------------+
> >>> +---------------+--------------+
> >>>
> >>>
> >>> Using a having clause (v1):
> >>>
> >>> select p.client_ip.country_code country_code, count(*) event_count from
> >>> dfs.tmp.`/events` as p group by p.client_ip.country_code having
> >>> event_count > 30000;
> >>>
> >>> Error: PARSE ERROR: From line 1, column 169 to line 1, column 179:
> >>> Expression 'event_count' is not being grouped
> >>>
> >>>
> >>> Using a having clause (v2):
> >>>
> >>> select p.client_ip.country_code country_code, count(*) event_count from
> >>> dfs.tmp.`/events` as p group by p.client_ip.country_code having
> >>> count(*) > 30000;
> >>> +---------------+--------------+
> >>> | country_code  | event_count  |
> >>> +---------------+--------------+
> >>> | GB            | 323594       |
> >>> | null*         | 566667       |
> >>> +---------------+--------------+
> >>>
> >>>
> >>> * there are no null values in the database and this looks to be the
> >>> "total for the rest"
> >>>
> >>>
> >>> Does anyone know how this can be made to work?
> >>>
> >>> Regards,
> >>> -Stefan
> >>>
> >>
> >>
>
>
