Re: Sort Merge Join from the filesystem

2015-11-16 Thread Alex Nastetsky
From: Alex Nastetsky [mailto:alex.nastet...@vervemobile.com] Sent: Tuesday, November 10, 2015 3:03 AM To: Cheng, Hao Cc: Reynold Xin; dev@spark.apache.org Subject: Re: Sort Merge Join from the filesystem Thanks for creating that ticket. Another th

RE: Sort Merge Join from the filesystem

2015-11-09 Thread Cheng, Hao
: Reynold Xin; dev@spark.apache.org Subject: Re: Sort Merge Join from the filesystem Thanks for creating that ticket. Another thing I was thinking of is doing this type of join between dataset A, which is already partitioned/sorted on disk, and dataset B, which gets generated during the run of

Re: Sort Merge Join from the filesystem

2015-11-09 Thread Alex Nastetsky
From: Reynold Xin [mailto:r...@databricks.com] Sent: Thursday, November 5, 2015 1:36 AM To: Alex Nastetsky Cc: dev@spark.apache.org Subject: Re: Sort Merge Join from the filesystem It's not supported yet, and not sure if there is a ticket for it. I do

RE: Sort Merge Join from the filesystem

2015-11-04 Thread Cheng, Hao
:36 AM To: Alex Nastetsky Cc: dev@spark.apache.org Subject: Re: Sort Merge Join from the filesystem It's not supported yet, and not sure if there is a ticket for it. I don't think there is anything fundamentally hard here either. On Wed, Nov 4, 2015 at 6:37 AM, Alex Nastetsky mailto:a

Re: Sort Merge Join from the filesystem

2015-11-04 Thread Reynold Xin
It's not supported yet, and not sure if there is a ticket for it. I don't think there is anything fundamentally hard here either. On Wed, Nov 4, 2015 at 6:37 AM, Alex Nastetsky <alex.nastet...@vervemobile.com> wrote: (this is kind of a cross-post from the user list) Does Spark support doi

Sort Merge Join from the filesystem

2015-11-04 Thread Alex Nastetsky
(this is kind of a cross-post from the user list) Does Spark support doing a sort merge join on two datasets on the file system that have already been partitioned the same way, with the same number of partitions, and sorted within each partition, without needing to repartition/sort them again? This fun
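
For readers new to the idea, here is a minimal sketch in plain Python of what such a join does once both sides are already co-partitioned and sorted. It is not Spark code and does not use any Spark API; the function names merge_join and join_copartitioned, and the representation of a partition as a list of (key, value) pairs, are assumptions made only for illustration. Partition i of one dataset is joined with partition i of the other in a single forward pass, with no shuffle and no extra sort.

```python
from itertools import groupby
from operator import itemgetter


def merge_join(left, right):
    """Inner-join two iterables of (key, value) pairs, each already sorted by key."""
    # Group consecutive rows sharing a key; valid only because inputs are sorted.
    left_groups = groupby(left, key=itemgetter(0))
    right_groups = groupby(right, key=itemgetter(0))
    lk, lg = next(left_groups, (None, None))
    rk, rg = next(right_groups, (None, None))
    while lg is not None and rg is not None:
        if lk < rk:
            # Left key has no match yet; advance the left side.
            lk, lg = next(left_groups, (None, None))
        elif lk > rk:
            # Right key has no match yet; advance the right side.
            rk, rg = next(right_groups, (None, None))
        else:
            # Keys match: buffer the right-side values, then emit every pairing.
            right_vals = [v for _, v in rg]
            for _, lv in lg:
                for rv in right_vals:
                    yield lk, lv, rv
            lk, lg = next(left_groups, (None, None))
            rk, rg = next(right_groups, (None, None))


def join_copartitioned(left_parts, right_parts):
    """Join partition i of one dataset with partition i of the other (hypothetical helper)."""
    if len(left_parts) != len(right_parts):
        raise ValueError("both datasets must have the same number of partitions")
    return [list(merge_join(l, r)) for l, r in zip(left_parts, right_parts)]


if __name__ == "__main__":
    a = [[(1, "a"), (1, "b"), (3, "c")], [(2, "x")]]
    b = [[(1, "A"), (3, "C"), (3, "D")], [(4, "y")]]
    for i, rows in enumerate(join_copartitioned(a, b)):
        print(i, rows)
```

The sketch assumes both datasets used the same partitioning function, so that equal keys always land in the partition with the same index; if that assumption does not hold, matching rows are silently missed.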