From: Alex Nastetsky [mailto:alex.nastet...@vervemobile.com]
Sent: Tuesday, November 10, 2015 3:03 AM
To: Cheng, Hao
Cc: Reynold Xin; dev@spark.apache.org
Subject: Re: Sort Merge Join from the filesystem
Thanks for creating that ticket.
Another thing I was thinking of is doing this type of join between dataset A,
which is already partitioned/sorted on disk, and dataset B, which gets generated
during the run of
From: Reynold Xin [mailto:r...@databricks.com]
Sent: Thursday, November 5, 2015 1:36 AM
To: Alex Nastetsky
Cc: dev@spark.apache.org
Subject: Re: Sort Merge Join from the filesystem
It's not supported yet, and I'm not sure if there is a ticket for it. I don't think
there is anything fundamentally hard here either.
On Wed, Nov 4, 2015 at 6:37 AM, Alex Nastetsky <
alex.nastet...@vervemobile.com> wrote:
(this is kind of a cross-post from the user list)
Does Spark support doing a sort merge join on two datasets on the file
system that have already been partitioned the same way, with the same number
of partitions, and sorted within each partition, without needing to
repartition/sort them again?
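To make the question concrete, here is a minimal plain-Python sketch of the join being described. It is not Spark code and does not claim to show how Spark implements its sort merge join. The function names merge_join_partition and join_copartitioned are hypothetical, and the sketch assumes both sides were split by the same partitioning function into the same number of partitions, with each partition already sorted by key.

```python
from typing import Any, List, Tuple

Row = Tuple[Any, Any]  # (key, value)


def merge_join_partition(left: List[Row], right: List[Row]) -> List[Tuple[Any, Any, Any]]:
    """Inner-join two partitions that are ALREADY sorted by key.

    Only a single forward pass over each side is needed; nothing is re-sorted.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1  # left key has no partner yet, advance left
        elif lk > rk:
            j += 1  # right key has no partner yet, advance right
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Emit the cross product of the two runs.
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out


def join_copartitioned(left_parts: List[List[Row]],
                       right_parts: List[List[Row]]) -> List[Tuple[Any, Any, Any]]:
    """Join partition k of the left side with partition k of the right side.

    Assumes (hypothetically) that both sides used the same partitioner, so
    matching keys can only live in partitions with the same index.
    """
    if len(left_parts) != len(right_parts):
        raise ValueError("both sides must have the same number of partitions")
    result = []
    for lp, rp in zip(left_parts, right_parts):
        result.extend(merge_join_partition(lp, rp))
    return result
```

Because matching keys can only appear in partitions with the same index, each pair of partitions is joined independently, which is why no shuffle or additional sort is needed under these assumptions.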
This fun