Hi, why not group by first and then join? By the way, I don't think there is any difference between `distinct` and `group by` here.
Source code of Spark 2.1:

    def distinct(): Dataset[T] = dropDuplicates()
    ...
    def dropDuplicates(colNames: Seq[String]): Dataset[T] = withTypedPlan {
      ...
      Aggregate(groupCols, aggCols, logicalPlan)
    }

From: Chetan Khatri [mailto:chetan.opensou...@gmail.com]
Sent: 2018-05-30 2:52
To: Irving Duran <irving.du...@gmail.com>
Cc: Georg Heiler <georg.kf.hei...@gmail.com>; user <user@spark.apache.org>
Subject: Re: GroupBy in Spark / Scala without Agg functions

Georg, sorry for the dumb question. Help me understand: if I do DF.select(A, B, C, D).distinct(), would that be the same as the groupBy without agg in SQL above?

On Wed, May 30, 2018 at 12:17 AM, Chetan Khatri <chetan.opensou...@gmail.com> wrote:

I don't want any aggregation; I just want to know whether there is a better approach than applying distinct to all columns.

On Wed, May 30, 2018 at 12:16 AM, Irving Duran <irving.du...@gmail.com> wrote:

Unless you want to get a count, yes.

Thank you,
Irving Duran

On Tue, May 29, 2018 at 1:44 PM Chetan Khatri <chetan.opensou...@gmail.com> wrote:

Georg, I just want to double-check: someone wrote an MSSQL Server script where it groups by all columns. What is the best alternative way to do distinct on all columns?

On Wed, May 30, 2018 at 12:08 AM, Georg Heiler <georg.kf.hei...@gmail.com> wrote:

Why do you group if you do not want to aggregate? Isn't this the same as select distinct?

Chetan Khatri <chetan.opensou...@gmail.com> wrote on Tue., 29 May 2018 at 20:21:

All, I have a scenario like this in MSSQL Server where I need to do a groupBy without an agg function:

Pseudocode:

    select m.student_id, m.student_name, m.student_std, m.student_group, m.student_dob
    from student as m
    inner join general_register g on m.student_id = g.student_id
    group by m.student_id, m.student_name, m.student_std, m.student_group, m.student_dob

I tried doing this in Spark but I am not able to get a DataFrame as the return value. How can this kind of thing be done in Spark?

Thanks
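For reference, the SQL above can be sketched in Spark like this: join, project the columns you need, then de-duplicate with `distinct()` (which, per the source quoted earlier, compiles to the same `Aggregate` plan as a groupBy over all selected columns). This is a minimal sketch with hypothetical toy data standing in for the real `student` and `general_register` tables:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object DistinctDemo {
  // Spark equivalent of the MSSQL "group by all columns, no aggregates" query:
  // inner join, select the grouping columns, then distinct().
  def run(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Hypothetical stand-ins for the real tables.
    val student = Seq(
      (1, "Alice", "X", "A", "2000-01-01"),
      (2, "Bob", "X", "B", "2000-02-02")
    ).toDF("student_id", "student_name", "student_std", "student_group", "student_dob")

    // student_id 1 appears twice, so the join produces a duplicate row.
    val generalRegister = Seq(1, 1, 2).toDF("student_id")

    student
      .join(generalRegister, "student_id")
      .select("student_id", "student_name", "student_std", "student_group", "student_dob")
      .distinct() // same logical plan as groupBy over all selected columns
  }
}
```

The result is a plain DataFrame, so it can be returned from a method or chained further; `df.groupBy(df.columns.map(col): _*).agg(Map.empty[String, String])`-style workarounds are not needed.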