Re: Aggregate over a column: the proper way to do

Sean Owen Fri, 08 Apr 2022 05:12:44 -0700

Dataset.count() returns one value directly?
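
A minimal sketch of what that could look like, assuming a Dataset<Row> with a string column as in the original question. The class and method names (CountMatching, countMatching) are hypothetical and only for illustration. Since the filter already leaves only rows where the column equals the target, count() on the filtered Dataset should give the same number as count(myCol) did, without the groupBy/agg/select chain:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Hypothetical helper class, not part of Spark.
final class CountMatching {
    private CountMatching() {}

    // Counts the rows whose `column` equals `target`.
    // Dataset.count() returns a long directly, so no groupBy, agg,
    // select or first() is needed.
    static long countMatching(Dataset<Row> dataset, String column, Object target) {
        return dataset.filter(dataset.col(column).equalTo(target)).count();
    }
}
```

Usage would be something like long result = CountMatching.countMatching(dataset, "myCol", "myTargetVal"). A side effect worth noting: when no row matches, this returns 0, whereas calling first() on an empty grouped result would throw.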

On Thu, Apr 7, 2022 at 11:25 PM sam smith <qustacksm2123...@gmail.com>
wrote:


> My bad, yes, of course! Still, I don't like the ..
> select("count(myCol)") .. part of my line. Is there any replacement for it?
>
> On Fri, Apr 8, 2022 at 06:13, Sean Owen <sro...@gmail.com> wrote:
>
>> Just do an average then? Most of my point is that filtering to one group
>> and then grouping is pointless.
>>
>> On Thu, Apr 7, 2022, 11:10 PM sam smith <qustacksm2123...@gmail.com>
>> wrote:
>>
>>> What if I do avg instead of count?
>>>
>>> On Fri, Apr 8, 2022 at 05:32, Sean Owen <sro...@gmail.com> wrote:
>>>
>>>> Wait, why groupBy at all? After the filter, only rows with myCol equal
>>>> to your target are left. There is only one group. Don't group; just count
>>>> after the filter.
>>>>
>>>> On Thu, Apr 7, 2022, 10:27 PM sam smith <qustacksm2123...@gmail.com>
>>>> wrote:
>>>>
>>>>> I want to aggregate a column by counting the number of rows that have
>>>>> the value "myTargetValue" and return the result.
>>>>> I am doing it as follows, in Java:
>>>>>
>>>>>> long result = dataset
>>>>>>     .filter(dataset.col("myCol").equalTo("myTargetVal"))
>>>>>>     .groupBy(col("myCol"))
>>>>>>     .agg(count(dataset.col("myCol")))
>>>>>>     .select("count(myCol)")
>>>>>>     .first()
>>>>>>     .getLong(0);
>>>>>
>>>>>
>>>>> Is that the right way? If not, what is a more optimized way to do that
>>>>> (still in Java)?
>>>>> Thanks for the help.
>>>>>
>>>>
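
For the avg variant raised in the thread, the same idea applies: aggregate over the filtered Dataset without grouping. The following is only a sketch under assumptions. The value column being averaged (called valueColumn here) and the names AverageMatching and averageWhere are hypothetical, since the thread does not say which column would be averaged. An aggregate without groupBy yields a single row, so the result is read by position and no select("avg(...)") by generated column name is needed:

```java
import static org.apache.spark.sql.functions.avg;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Hypothetical helper class, not part of Spark.
final class AverageMatching {
    private AverageMatching() {}

    // Averages `valueColumn` over the rows whose `filterColumn` equals `target`.
    // Returns null when no row matches, because avg over zero rows is NULL.
    static Double averageWhere(Dataset<Row> dataset, String filterColumn,
                               Object target, String valueColumn) {
        Row row = dataset
            .filter(dataset.col(filterColumn).equalTo(target))
            .agg(avg(dataset.col(valueColumn))) // global aggregate: exactly one row
            .first();
        return row.isNullAt(0) ? null : row.getDouble(0);
    }
}
```

Returning a boxed Double is a design choice for this sketch, so that the "no matching rows" case can be told apart from an average of 0.0.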
