[GitHub] spark pull request: SPARK-3604. Replace the map call in UnionRDD#g...

2014-09-24 Thread harishreedharan
Github user harishreedharan closed the pull request at: https://github.com/apache/spark/pull/2463 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the featur

[GitHub] spark pull request: SPARK-3604. Replace the map call in UnionRDD#g...

2014-09-24 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/2463#issuecomment-56776683 Done

[GitHub] spark pull request: SPARK-3604. Replace the map call in UnionRDD#g...

2014-09-24 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2463#issuecomment-56776640 Let's close this issue then

[GitHub] spark pull request: SPARK-3604. Replace the map call in UnionRDD#g...

2014-09-20 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2463#issuecomment-56287554 Gotcha - sounds good!

[GitHub] spark pull request: SPARK-3604. Replace the map call in UnionRDD#g...

2014-09-20 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/2463#issuecomment-56285227 Agreed. This patch simply makes it more difficult to overflow - so it is not really a fix. Will close this. Thanks, Hari

[GitHub] spark pull request: SPARK-3604. Replace the map call in UnionRDD#g...

2014-09-20 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2463#issuecomment-56284727 @harishreedharan I think the fix is that for people chaining many unions together they should use `SparkContext#union` - if that's the case we might want to just leave i
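
To illustrate the point about chaining, here is a minimal toy model in plain Python. It is not Spark code: `Leaf` and `Union` are hypothetical stand-ins for a source RDD and a union node, and `partitions` only mimics, under that assumption, how a recursive partition lookup would walk the structure. It compares the nesting depth reached when unions are chained pairwise with the depth reached when all inputs are handed to a single union, which is the shape `SparkContext#union` is meant to produce.

```python
from functools import reduce


class Leaf:
    """Hypothetical stand-in for a source dataset with a single partition."""

    def __init__(self, items):
        self.items = list(items)

    def partitions(self, depth=1):
        # A leaf contributes one partition and reports the depth it was reached at.
        return [self.items], depth


class Union:
    """Hypothetical stand-in for a union node over any number of children."""

    def __init__(self, children):
        self.children = list(children)

    def partitions(self, depth=1):
        # Collect child partitions recursively and track the deepest call level.
        parts, deepest = [], depth
        for child in self.children:
            child_parts, child_depth = child.partitions(depth + 1)
            parts.extend(child_parts)
            deepest = max(deepest, child_depth)
        return parts, deepest


leaves = [Leaf([i]) for i in range(50)]

# Chained: each step wraps the previous result, like repeated rdd.union(other).
chained = reduce(lambda acc, nxt: Union([acc, nxt]), leaves)

# Flat: one union over the whole list, like a single union over a Seq of inputs.
flat = Union(leaves)

chained_parts, chained_depth = chained.partitions()
flat_parts, flat_depth = flat.partitions()

print(chained_depth, flat_depth)
```

In this model both shapes yield the same partitions, but the chained form's recursion depth grows with the number of inputs while the flat form's stays constant. That is consistent with the observation in the thread that the patch only makes an overflow harder to hit, whereas building one flat union avoids the deep structure.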

[GitHub] spark pull request: SPARK-3604. Replace the map call in UnionRDD#g...

2014-09-20 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2463#issuecomment-56284671 @ericdf is your original issue fixed by using the union utility function? I misread it to be a bug report, but I think the issue is just that you were chaining together

[GitHub] spark pull request: SPARK-3604. Replace the map call in UnionRDD#g...

2014-09-19 Thread ericdf
Github user ericdf commented on the pull request: https://github.com/apache/spark/pull/2463#issuecomment-56248234 Ah! I was not aware that there was an API for getting a union for a list on SparkContext -- I had only seen the one on RDD itself, which only takes a single `other` RDD.

[GitHub] spark pull request: SPARK-3604. Replace the map call in UnionRDD#g...

2014-09-19 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/2463#issuecomment-56247572 @ericdf What is the type of rddgen in your pseudocode? I'm not understanding why the existing `SparkContext#union[T](Seq[RDD[T]])` doesn't already do what you want.

[GitHub] spark pull request: SPARK-3604. Replace the map call in UnionRDD#g...

2014-09-19 Thread ericdf
Github user ericdf commented on the pull request: https://github.com/apache/spark/pull/2463#issuecomment-56247070 Fundamentally the way union works is flawed because it forces a caller to create a recursive structure. In my case, I have files = [] # some list rdd

[GitHub] spark pull request: SPARK-3604. Replace the map call in UnionRDD#g...

2014-09-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2463#issuecomment-56243629 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20589/consoleFull) for PR 2463 at commit [`c3f476c`](https://github.com/a

[GitHub] spark pull request: SPARK-3604. Replace the map call in UnionRDD#g...

2014-09-19 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/2463#issuecomment-56242898 Yes. The issue is that there could be union RDDs inside the rdds array - so the recursion may be unavoidable, but we can make them take fewer frames. I can't thin

[GitHub] spark pull request: SPARK-3604. Replace the map call in UnionRDD#g...

2014-09-19 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2463#issuecomment-56239570 Is the goal here just to make the recursive calls take fewer stack frames and make it harder to overflow? I got the impression there was an infinite recursion lurking he

[GitHub] spark pull request: SPARK-3604. Replace the map call in UnionRDD#g...

2014-09-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2463#issuecomment-56236850 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20589/consoleFull) for PR 2463 at commit [`c3f476c`](https://github.com/ap

[GitHub] spark pull request: SPARK-3604. Replace the map call in UnionRDD#g...

2014-09-19 Thread harishreedharan
GitHub user harishreedharan opened a pull request: https://github.com/apache/spark/pull/2463 SPARK-3604. Replace the map call in UnionRDD#getPartitions method to avo... ...id creating an additional Seq. You can merge this pull request into a Git repository by running: $ git pul