I see, Thank you for explanation LInyuxin
On Wed, May 30, 2018 at 6:21 AM, Linyuxin wrote:
> Hi,
>
> Why not group by first then join?
>
> BTW, I don’t think there any difference between ‘distinct’ and ‘group by’
>
>
>
> Source code of 2.1:
>
> *def *distinct(): Dataset[T] = dropDuplicates()
>
Hi,
Why not group by first then join?
BTW, I don’t think there any difference between ‘distinct’ and ‘group by’
Source code of 2.1:
def distinct(): Dataset[T] = dropDuplicates()
…
def dropDuplicates(colNames: Seq[String]): Dataset[T] = withTypedPlan {
…
Aggregate(groupCols, aggCols, logicalPlan)