Github user mengxr closed the pull request at:
https://github.com/apache/spark/pull/136
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/136#issuecomment-39924028
I'm closing this PR since it is now part of the AreaUnderCurve PR. I moved
sliding to mllib and marked it private. The only usage now is in AreaUnderCurve,
with window size 2.
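A minimal sketch of why window size 2 is all AreaUnderCurve needs: the trapezoid rule pairs each (x, y) point with its successor. This is plain Scala over a Seq, not the MLlib implementation, and the names are illustrative:

```scala
object AucSketch {
  // Trapezoid rule over consecutive (x, y) points: a sliding window of
  // size 2 pairs each point with its successor, mirroring the
  // window-size-2 usage described above.
  def areaUnderCurve(points: Seq[(Double, Double)]): Double =
    points.sliding(2).collect {
      case Seq((x1, y1), (x2, y2)) => (y1 + y2) / 2.0 * (x2 - x1)
    }.sum

  def main(args: Array[String]): Unit = {
    val curve = Seq((0.0, 0.0), (1.0, 1.0), (2.0, 1.0))
    println(areaUnderCurve(curve)) // 0.5 + 1.0 = 1.5
  }
}
```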
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/136#issuecomment-38881892
Can one of the admins verify this patch?
Github user nkronenfeld commented on the pull request:
https://github.com/apache/spark/pull/136#issuecomment-38506453
I think my initial sliding method PR addresses @pwendell's concerns here,
but runs afoul of some other concerns raised by @rxin about code complexity.
Whiche…
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/136#issuecomment-37900586
I see the quadratic storage issue; that is why I didn't use it in the PR. I
will use the implementation in this PR, but move it to MLlib and mark it
private for internal use.
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/136#issuecomment-37888440
Oh I see, I didn't realize that `partitions` only had the tail of the
partitions for each Partition object. Note that one problem with this is
quadratic memory size…
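To see why that storage is quadratic: if each of p Partition objects holds references to all of the following partitions, the totals sum to p + (p-1) + … + 1 = p(p+1)/2, i.e. O(p²). A quick arithmetic check (illustrative only):

```scala
object QuadraticStorage {
  // Partition i holding references to partitions i..p-1 means the total
  // number of stored references is 1 + 2 + ... + p = p(p+1)/2.
  def totalRefs(p: Int): Int = (1 to p).sum

  def main(args: Array[String]): Unit = {
    println(totalRefs(100)) // 100 * 101 / 2 = 5050
  }
}
```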
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/136#issuecomment-37887863
@mateiz I don't see the bugs you mentioned. compute() checks parent
partitions to assemble the tail to append. I think the approach you suggested
is the same as in this PR.
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/136#issuecomment-37879303
BTW the way I would write this is that each partition's compute() should
return all the windows that start with elements in that partition. To do this
it may have to read a…
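The scheme described here can be sketched in plain Scala, modeling partitions as nested sequences: each partition emits every full window that starts inside it, pulling the first w − 1 elements from the partitions after it. This is an illustration of the idea, not the SlidingRDD code:

```scala
object SlidingSketch {
  // Windows of size `w` that start in partition `i`: take that partition's
  // elements plus the first w - 1 elements of the following partitions,
  // then keep only full windows, capped at the number of elements in
  // partition i so every emitted window starts inside it.
  def slidingPerPartition[T](parts: Seq[Seq[T]], w: Int): Seq[Seq[Seq[T]]] =
    parts.indices.map { i =>
      val tail = parts.drop(i + 1).flatten.take(w - 1)
      (parts(i) ++ tail).sliding(w)
        .filter(_.size == w)
        .take(parts(i).size)
        .toSeq
    }

  def main(args: Array[String]): Unit = {
    val parts = Seq(Seq(1, 2), Seq(3), Seq(4, 5))
    // Partition 0 emits windows (1,2) and (2,3); partition 1 emits (3,4);
    // partition 2 emits (4,5) -- no window is duplicated or dropped.
    println(slidingPerPartition(parts, 2))
  }
}
```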
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/136#issuecomment-37879126
That approach does look better, though there seem to be some bugs in the
code (e.g. compute() always works on partitions(0), and that code doesn't
handle the case if many p…)
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/136#issuecomment-37776246
@pwendell @mridulm, RDD.sliding is a public method in this PR. If we don't
want users to treat it as a cheap operation, how about moving it to a separate
RDDFunctions class…
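The separate-class idea follows Scala's implicit enrichment pattern: the extra operation lives in a wrapper class that users opt into via an import, signalling that it is not a cheap core method. A generic sketch of that pattern on Seq (standing in for RDD; names are illustrative, not MLlib's actual RDDFunctions):

```scala
object RddFunctionsSketch {
  // Enrichment pattern: `slidingWindows` is not defined on Seq itself but
  // on a wrapper class, so callers must bring the implicit into scope to
  // use it -- a deliberate speed bump for a non-trivial operation.
  implicit class SeqFunctions[T](self: Seq[T]) {
    def slidingWindows(w: Int): Seq[Seq[T]] =
      self.sliding(w).filter(_.size == w).toSeq
  }

  def main(args: Array[String]): Unit = {
    // Implicit conversion kicks in: Seq gains slidingWindows.
    println(Seq(1, 2, 3).slidingWindows(2)) // two windows: (1,2) and (2,3)
  }
}
```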