[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

2014-03-15, pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/136#discussion_r10635444
--- Diff: core/src/main/scala/org/apache/spark/rdd/SlidedRDD.scala ---
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)

2014-03-15, pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/136#discussion_r10635447
--- Diff: core/src/main/scala/org/apache/spark/rdd/SlidedRDD.scala ---
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)

2014-03-15, mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/136#discussion_r10635557
--- Diff: core/src/main/scala/org/apache/spark/rdd/SlidedRDD.scala ---
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under

2014-03-15, pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37733845 Ah I see - so this isn't going to be externally a user-visible class (I didn't notice it was `private[spark]`)? Would it make sense to throw an assertion error if the

2014-03-15, pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37733908 Even if it's private we can end up with cases where users have a e.g. 10,000 partition RDD with only a few items in each partition. Do we know a priori when calling this
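One way to read the concern about a 10,000-partition RDD with only a few items per partition: a window that starts near the end of one partition has to borrow the first `windowSize - 1` elements of the data that follows, and when partitions are tiny those elements are spread over several later partitions. A minimal local sketch of that bookkeeping, assuming an RDD is modeled as a plain `Seq[Seq[T]]` of partitions; the object and method names are hypothetical, and this is not the `SlidedRDD` implementation from the PR:

```scala
// Hypothetical local model of a partitioned dataset: one inner Seq per partition.
// This only illustrates the cross-partition problem; it is not SlidedRDD.
object PartitionedSliding {
  // The (windowSize - 1) elements that follow partition i, read lazily from as
  // many later partitions as needed (fewer if the data runs out).
  def tailFromFollowing[T](parts: Seq[Seq[T]], i: Int, windowSize: Int): List[T] =
    parts.iterator.drop(i + 1).flatMap(_.iterator).take(windowSize - 1).toList

  // How many later partitions must be read to fill that tail.
  def partitionsTouched[T](parts: Seq[Seq[T]], i: Int, windowSize: Int): Int = {
    var needed = windowSize - 1
    var touched = 0
    val following = parts.iterator.drop(i + 1)
    while (needed > 0 && following.hasNext) {
      needed -= following.next().size
      touched += 1
    }
    touched
  }

  // Full windows whose first element lies in partition i.
  def windowsStartingIn[T](parts: Seq[Seq[T]], i: Int, windowSize: Int): List[Seq[T]] =
    (parts(i) ++ tailFromFollowing(parts, i, windowSize))
      .sliding(windowSize)
      .filter(_.size == windowSize)
      .toList

  // Concatenating the per-partition results gives the same full-size windows
  // as sliding over the flattened data.
  def allWindows[T](parts: Seq[Seq[T]], windowSize: Int): List[Seq[T]] = {
    require(windowSize > 0, "windowSize must be positive")
    parts.indices.toList.flatMap(i => windowsStartingIn(parts, i, windowSize))
  }

  def main(args: Array[String]): Unit = {
    // Four tiny partitions and a window of 4: partition 0 has to borrow three
    // elements, which are spread over partitions 1, 2 and 3.
    val parts = Seq(Seq(1, 2), Seq(3), Seq(4), Seq(5, 6))
    println(partitionsTouched(parts, 0, 4)) // 3
    println(allWindows(parts, 4) == parts.flatten.sliding(4).toList) // true
  }
}
```

In this model, the fewer elements each partition holds relative to `windowSize`, the more later partitions `partitionsTouched` reports for a single tail.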

2014-03-15, AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37734195 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13195/

2014-03-15, AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37734242 Merged build started.

2014-03-15, AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37734241 Merged build triggered.

2014-03-15, mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37734634 It is hard to say what threshold to use. I couldn't think of a use case that requires a large window size, but I cannot say there is none. Another possible

2014-03-15, AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37735835 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13197/

2014-03-15, AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37735834 Merged build finished.

2014-03-15, pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37742731 @mridulm I think the RDD definition is actually `private[spark]` and it's just intended to be used internally for higher level algorithms.

2014-03-15, mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37748284 @pwendell I was referring not to the actual implementation, but expectation when using the exposed API.

2014-03-13, mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/136 [SPARK-1241] Add sliding to RDD Sliding is useful for operations like creating n-grams, calculating total variation, numerical integration, etc. This is similar to
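The use cases named in the PR description can be illustrated on an ordinary local Scala collection with the standard `sliding` method. This is a sketch on `Seq`, not the RDD API added by the PR; the object and method names are hypothetical, and the trapezoidal rule is just one choice of numerical integration scheme:

```scala
// Local illustration on plain Scala collections; not the RDD API from this PR.
object SlidingExamples {
  // Word n-grams: each window of n consecutive tokens becomes one n-gram.
  // sliding(n) yields a single shorter window when the input has fewer than n
  // tokens, so windows that are not exactly n long are dropped.
  def ngrams(tokens: Seq[String], n: Int): List[String] =
    tokens.sliding(n).filter(_.size == n).map(_.mkString(" ")).toList

  // Total variation: sum of absolute differences between neighbouring values.
  def totalVariation(xs: Seq[Double]): Double =
    xs.sliding(2).collect { case Seq(a, b) => math.abs(b - a) }.sum

  // Numerical integration with the trapezoidal rule over (x, y) samples,
  // assumed to be sorted by x.
  def trapezoid(points: Seq[(Double, Double)]): Double =
    points.sliding(2).collect { case Seq((x0, y0), (x1, y1)) =>
      (x1 - x0) * (y0 + y1) / 2
    }.sum

  def main(args: Array[String]): Unit = {
    println(ngrams(Seq("to", "be", "or", "not"), 2)) // List(to be, be or, or not)
    println(totalVariation(Seq(1.0, 4.0, 2.0, 5.0))) // 8.0
    println(trapezoid(Seq((0.0, 0.0), (1.0, 1.0), (2.0, 4.0)))) // 3.0
  }
}
```

Each computation only ever looks at a window of adjacent elements, which is what makes a distributed `sliding` sufficient for them.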

2014-03-13, AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37577527 Merged build triggered.

2014-03-13, AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37583751 Merged build finished.