[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

2014-04-08 Thread mengxr
Github user mengxr closed the pull request at: https://github.com/apache/spark/pull/136

2014-04-08 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-39924028 I'm closing this PR since it is now part of the AreaUnderCurve PR. I moved sliding to mllib and marked it private. The only usage now is in AreaUnderCurve with window size 2.
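A window of size 2 yields each pair of adjacent points, which is what a trapezoidal area computation needs. The sketch below assumes AreaUnderCurve applies the trapezoidal rule to (x, y) points sorted by x, and uses a plain Python list instead of an RDD; `sliding` and `area_under_curve` are hypothetical helper names, not the MLlib API.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def sliding(xs: Sequence[Point], size: int) -> List[Tuple[Point, ...]]:
    """Return every full window of `size` consecutive elements (hypothetical helper)."""
    return [tuple(xs[i:i + size]) for i in range(len(xs) - size + 1)]


def area_under_curve(points: Sequence[Point]) -> float:
    """Trapezoidal area under (x, y) points, assumed sorted by x."""
    total = 0.0
    # Window size 2: each window is a pair of adjacent points.
    for (x1, y1), (x2, y2) in sliding(points, 2):
        total += (x2 - x1) * (y1 + y2) / 2.0
    return total


roc = [(0.0, 0.0), (0.25, 0.5), (0.5, 0.75), (1.0, 1.0)]
print(area_under_curve(roc))  # 0.65625
```

With a fixed window size of 2, each partition only needs one element from the next non-empty partition, which keeps the cost of the cross-partition step small.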

2014-03-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-38881892 Can one of the admins verify this patch?

2014-03-24 Thread nkronenfeld
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-38506453 I think my initial sliding method PR addresses @pwendell 's concerns here - but runs afoul of some other concerns raised by @rxin about code complexity. Whiche…

2014-03-17 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37900586 I see the quadratic storage and this is why I didn't use it in the PR. I will use the implementation in this PR, but move it to MLlib and mark it private for internal use.

2014-03-17 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37888440 Oh I see, I didn't realize that `partitions` only had the tail of the partitions for each Partition object. Note that one problem with this is quadratic memory size…

2014-03-17 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37887863 @mateiz I don't see the bugs you mentioned. compute() checks parent partitions to assemble the tail to append. I think the approach you suggested is the same as in this PR.
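The tail-assembly step described here can be sketched as follows. This is not the PR's Scala code: partitions are modelled as in-memory Python lists (an assumption for illustration), and `assemble_tail` is a hypothetical name. The point it shows is that the tail for partition `index` is the first `window_size - 1` elements after it, possibly drawn from several following partitions when some of them are small or empty.

```python
from itertools import islice
from typing import List, Sequence


def assemble_tail(partitions: Sequence[Sequence[int]], index: int, window_size: int) -> List[int]:
    """Collect the first window_size - 1 elements after partition `index`.

    Partitions are modelled as in-memory lists (an assumption; a real RDD
    would read the following parent partitions instead).
    """
    needed = window_size - 1
    tail: List[int] = []
    # A small or empty partition may supply fewer elements than needed,
    # so keep reading later partitions until the tail is full.
    for part in partitions[index + 1:]:
        if len(tail) >= needed:
            break
        tail.extend(islice(part, needed - len(tail)))
    return tail


parts = [[1, 2], [3], [], [4, 5, 6]]
print([assemble_tail(parts, i, 4) for i in range(len(parts))])
# [[3, 4, 5], [4, 5, 6], [4, 5, 6], []]
```

Appending each tail to its own partition's data is enough to form every window that starts in that partition; the last partition gets an empty tail.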

2014-03-17 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37879303 BTW the way I would write this is that each partition's compute() should return all the windows that start with elements in that partition. To do this it may have to read a…
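The "windows that start in this partition" rule can be sketched as below, again modelling partitions as plain Python lists rather than RDD partitions (an assumption for illustration). `windows_starting_in` stands in for one partition's compute() and `sliding_over_partitions` concatenates the per-partition results; both names are hypothetical. Concatenating the per-partition windows gives the same result as sliding over the flattened data.

```python
from itertools import chain, islice
from typing import List, Sequence, Tuple

Window = Tuple[int, ...]


def windows_starting_in(partitions: Sequence[Sequence[int]], index: int, size: int) -> List[Window]:
    """Return every full window (size >= 1) whose first element lies in partitions[index]."""
    # Read ahead lazily: take at most size - 1 elements from the later partitions,
    # crossing as many small or empty partitions as necessary.
    following = chain.from_iterable(partitions[index + 1:])
    data = list(partitions[index]) + list(islice(following, size - 1))
    # The read-ahead holds at most size - 1 elements, so no full window can start
    # inside it; every window below therefore starts in this partition.
    return [tuple(data[i:i + size]) for i in range(len(data) - size + 1)]


def sliding_over_partitions(partitions: Sequence[Sequence[int]], size: int) -> List[Window]:
    """Concatenate the per-partition windows in partition order."""
    return [w for i in range(len(partitions)) for w in windows_starting_in(partitions, i, size)]


parts = [[1, 2, 3], [4], [], [5, 6]]
print(sliding_over_partitions(parts, 3))
# [(1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6)]
```

Because each partition emits only the windows that begin in it, no window is produced twice, and an empty partition simply contributes nothing.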

2014-03-17 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37879126 That approach does look better, though there seem to be some bugs in the code (e.g. compute() always works on partitions(0), and that code doesn't handle the case if many p…

2014-03-16 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37776246 @pwendell @mridulm, RDD.sliding is a public method in this PR. If we don't want users to treat it as a cheap operation, how about moving it to a separate RDDFunctions class…