Despite the odd usage, it does the trick. Thanks, Reynold! On Fri, May 22, 2015 at 2:47 PM Reynold Xin <r...@databricks.com> wrote:
> In 1.4 it actually shows col1 by default.
>
> In 1.3, you can add "col1" to the output, i.e.
>
> df.groupBy($"col1").agg($"col1", count($"col1").as("c")).show()
>
> On Thu, May 21, 2015 at 11:22 PM, SLiZn Liu <sliznmail...@gmail.com> wrote:
>
>> However this returns a single column of c, without showing the original col1.
>>
>> On Thu, May 21, 2015 at 11:25 PM Ram Sriharsha <sriharsha....@gmail.com> wrote:
>>
>>> df.groupBy($"col1").agg(count($"col1").as("c")).show
>>>
>>> On Thu, May 21, 2015 at 3:09 AM, SLiZn Liu <sliznmail...@gmail.com> wrote:
>>>
>>>> Hi Spark Users Group,
>>>>
>>>> I'm doing groupBy operations on my DataFrame *df* as follows, to get the count for each value of col1:
>>>>
>>>> > df.groupBy("col1").agg("col1" -> "count").show // I don't know if I should write it like this.
>>>> col1 COUNT(col1#347)
>>>> aaa 2
>>>> bbb 4
>>>> ccc 4
>>>> ...
>>>> and more...
>>>>
>>>> I'd like to sort by the resulting count with .sort("COUNT(col1#347)"), but the column name of the count result obviously cannot be known in advance. Intuitively one might consider retrieving the column name by column index, as in R's data frames, but Spark doesn't support that. I have Googled *spark agg alias* and so forth, and checked DataFrame.as in the Spark API; neither helped with this. Am I the only one who has ever got stuck on this issue, or have I missed something?
>>>>
>>>> Regards,
>>>> Todd Leo
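Putting the thread's answer together, the trick is to alias the aggregate column with `.as(...)` so it gets a predictable name that a later sort can reference. A minimal sketch, assuming a Spark shell or session where `spark.implicits._` is in scope (the data and the `counts` value name are illustrative, not from the thread):

```scala
import org.apache.spark.sql.functions.count
// In a Spark shell, spark.implicits._ (or sqlContext.implicits._ on 1.x)
// is assumed to be imported, which enables .toDF and the $"..." syntax.

// Illustrative data; the column name "col1" follows the thread.
val df = Seq("aaa", "aaa", "bbb", "bbb", "bbb", "bbb", "ccc").toDF("col1")

// Include the grouping column and alias the count as "c",
// then sort on the aliased name instead of COUNT(col1#347).
val counts = df.groupBy($"col1")
  .agg($"col1", count($"col1").as("c"))
  .sort($"c".desc)

counts.show()
```

Without the `.as("c")` alias, the generated column name (e.g. `COUNT(col1#347)`) depends on an internal expression ID, which is exactly why it cannot be referenced in advance.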