[GitHub] spark pull request: SPARK-1246, added min max API to Double RDDs i...

2014-03-15 Thread ScrapCodes
Github user ScrapCodes closed the pull request at: https://github.com/apache/spark/pull/140 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature

[GitHub] spark pull request: SPARK-1170-pyspark-histogram: added histogram ...

2014-03-14 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/122#issuecomment-37642205 Hi Daniel, thanks for the patch. It would be good to separate out the implementation of min and max into a different PR and provide RDD.min and RDD.max

[GitHub] spark pull request: Spark 1246 add min max to stat counter

2014-03-14 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/144#issuecomment-37712447 Hey Matei, for a large dataset someone might want to do it only once; as with StatCounter, all of the numbers are calculated in one go.
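The "calculated in one go" point can be illustrated with a minimal single-pass accumulator. This is only a sketch in plain Python; the class name `MinMaxStats` and its methods are hypothetical and are not Spark's StatCounter API. It tracks count, sum, min and max together so the data is traversed once, and `merge` shows how per-partition results could be combined.

```python
import math


class MinMaxStats:
    """Hypothetical one-pass accumulator (not Spark's StatCounter)."""

    def __init__(self):
        self.n = 0
        self.total = 0.0
        self.min = math.inf
        self.max = -math.inf

    def add(self, x):
        # Update every statistic from a single look at the value.
        self.n += 1
        self.total += x
        self.min = min(self.min, x)
        self.max = max(self.max, x)
        return self

    def merge(self, other):
        # Combine two partial results, e.g. from two partitions.
        self.n += other.n
        self.total += other.total
        self.min = min(self.min, other.min)
        self.max = max(self.max, other.max)
        return self

    def mean(self):
        return self.total / self.n if self.n else float("nan")
```

As the follow-up comment in the thread notes, folding min and max into such a counter only covers numeric data, whereas standalone min and max methods would work for any ordered element type.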

[GitHub] spark pull request: SPARK-1246, added min max API to Double RDDs i...

2014-03-14 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/140#discussion_r10632860 --- Diff: project/build.properties --- @@ -14,4 +14,4 @@ # See the License for the specific language governing permissions and # limitations under

[GitHub] spark pull request: Spark 1246 add min max to stat counter

2014-03-14 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/144#issuecomment-37712645 Ahh, I understand the downside: that would be just for numbers then. Makes sense. Maybe we can have both?

[GitHub] spark pull request: SPARK-1246, added min max API to Double RDDs i...

2014-03-14 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/140#discussion_r10632880 --- Diff: core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala --- @@ -86,14 +92,9 @@ class DoubleRDDFunctions(self: RDD[Double]) extends

[GitHub] spark pull request: Spark 1162 Implemented takeOrdered in pyspark.

2014-03-13 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/97#issuecomment-37505304 PriorityQueue is in a way just a wrapper over heapq that allows blocking on put and get (AFAIU). We would need a max-heap to retain the N smallest elements. One
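To make the max-heap remark concrete, here is a minimal sketch of keeping the N smallest elements with a bounded heap. It is not the code from the pull request: the names `_Reversed` and `n_smallest` are hypothetical, and instead of heapq's internal functions it wraps each value in a small class that reverses the comparison, so the standard min-heap behaves as a max-heap for any values that support `<`.

```python
import heapq


class _Reversed:
    """Hypothetical wrapper that inverts ordering for use with heapq."""

    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        # heapq only uses "<", so reversing it turns the min-heap
        # into a max-heap over the wrapped values.
        return other.value < self.value


def n_smallest(iterable, n):
    """Return the n smallest items in ascending order."""
    if n <= 0:
        return []
    heap = []  # heap[0] holds the largest of the items kept so far
    for item in iterable:
        if len(heap) < n:
            heapq.heappush(heap, _Reversed(item))
        elif item < heap[0].value:
            # The new item beats the current worst one, so swap it in.
            heapq.heapreplace(heap, _Reversed(item))
    return sorted(w.value for w in heap)
```

For a purely local collection the standard library's `heapq.nsmallest` already does this; the sketch only shows why a max-heap is the natural structure when the heap must stay bounded at N entries.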

[GitHub] spark pull request: Prevent ContextClassLoader of Actor from becom...

2014-03-13 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/15#issuecomment-37529958 Thanks for the fix. Just for the record, this happens only when MASTER=local or local[2]. Looks good. It might be good to add the above test case in ReplSuite

[GitHub] spark pull request: Prevent ContextClassLoader of Actor from becom...

2014-03-13 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/15#issuecomment-37530227 Mind changing the PR title to add the JIRA ID?

[GitHub] spark pull request: Spark 615 map partitions with index callable f...

2014-03-13 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/16#issuecomment-37531010 It might be good to add this test to the Java 8 API suite? Not sure if it's 100% necessary, but there exists one for all other APIs (I hope!!). Thoughts?

[GitHub] spark pull request: Spark 1162 Implemented takeOrdered in pyspark.

2014-03-13 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/97#issuecomment-37618967 Hey Matei, got rid of copying `heapq.py` and all the license stuff, but resorted to using an internal API of heapq. It should be simpler. I

[GitHub] spark pull request: SPARK-1144 Added license and RAT to check lice...

2014-03-12 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/125#issuecomment-37379933 @pwendell thoughts?

[GitHub] spark pull request: SPARK-1096, a space after comment start style ...

2014-03-12 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/124#discussion_r10552833 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMessages.scala --- @@ -35,9 +35,9 @@ private[storage] object BlockManagerMessages

[GitHub] spark pull request: SPARK-1096, a space after comment start style ...

2014-03-12 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/124#discussion_r10552982 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMessages.scala --- @@ -35,9 +35,9 @@ private[storage] object BlockManagerMessages

[GitHub] spark pull request: SPARK-1170 Added histogram(buckets) to pyspark...

2014-03-11 Thread ScrapCodes
GitHub user ScrapCodes opened a pull request: https://github.com/apache/spark/pull/121 SPARK-1170 Added histogram(buckets) to pyspark and not histogram(noOfBuckets). That can be part 2 of this PR. If we can have min and max functions on an RDD of doubles, that would be good. You
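As an illustration of what a histogram(buckets) call computes, here is a minimal local sketch in plain Python. It is not the pyspark implementation from the pull request. It assumes that `buckets` is a sorted list of boundaries, that every bucket is closed on the left and open on the right except the last, which is closed on both ends, and that out-of-range values are dropped. A histogram(noOfBuckets) variant would first need the min and max of the data to derive the boundaries, which is presumably why min and max functions are mentioned.

```python
from bisect import bisect_right


def histogram(values, buckets):
    """Count values per bucket; buckets is a sorted list of boundaries.

    Assumed convention: [b0, b1), [b1, b2), ..., [b(k-1), bk].
    Values outside [b0, bk] are ignored.
    """
    if len(buckets) < 2 or list(buckets) != sorted(buckets):
        raise ValueError("buckets must be sorted and have at least 2 boundaries")
    counts = [0] * (len(buckets) - 1)
    lo, hi = buckets[0], buckets[-1]
    for v in values:
        if v < lo or v > hi:
            continue
        # Index of the bucket whose left boundary is the last one <= v.
        i = bisect_right(buckets, v) - 1
        if i == len(counts):
            # v equals the final boundary, which belongs to the last bucket.
            i -= 1
        counts[i] += 1
    return counts
```

For example, `histogram([1, 5, 10, 15, 20, 25], [0, 10, 20])` counts 1 and 5 in the first bucket, 10, 15 and 20 in the second, and ignores 25.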

[GitHub] spark pull request: SPARK-1162 Added top in python.

2014-03-11 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/93#issuecomment-37272574 Hey Matei, thanks!

[GitHub] spark pull request: SPARK-1096, a space after comment style checke...

2014-03-11 Thread ScrapCodes
GitHub user ScrapCodes opened a pull request: https://github.com/apache/spark/pull/124 SPARK-1096, a space after comment style checker. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ScrapCodes/spark-1 SPARK-1096/scalastyle

[GitHub] spark pull request: SPARK-1168, Added foldByKey to pyspark.

2014-03-10 Thread ScrapCodes
GitHub user ScrapCodes opened a pull request: https://github.com/apache/spark/pull/115 SPARK-1168, Added foldByKey to pyspark. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ScrapCodes/spark-1 SPARK-1168/pyspark-foldByKey
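For readers unfamiliar with foldByKey, the following is a minimal local sketch of its semantics over a list of (key, value) pairs. The function name `fold_by_key` is hypothetical and this is not the pyspark code from the pull request: each key starts from a copy of the zero value and `func` folds that key's values into it.

```python
import copy


def fold_by_key(pairs, zero_value, func):
    """Fold the values of each key with func, starting from zero_value."""
    result = {}
    for key, value in pairs:
        # Copy the zero value so a mutable zero (e.g. a list) is not
        # shared between keys.
        acc = result[key] if key in result else copy.deepcopy(zero_value)
        result[key] = func(acc, value)
    return result
```

In a distributed setting the zero value may be applied more than once per key (for instance once per partition), so it should be neutral for `func`, such as 0 for addition or an empty list for concatenation.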

[GitHub] spark pull request: Spark 1162 Implemented takeOrdered in pyspark.

2014-03-10 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/97#issuecomment-37161692 Jenkins, test this please.

[GitHub] spark pull request: Update junitxml plugin to the latest version t...

2014-03-08 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/104#issuecomment-37096375 Very cool, finally we have this!

[GitHub] spark pull request: Spark 1162 Implemented takeOrdered in pyspark.

2014-03-07 Thread ScrapCodes
GitHub user ScrapCodes opened a pull request: https://github.com/apache/spark/pull/97 Spark 1162 Implemented takeOrdered in pyspark. Python does not have a library for a max heap, and the usual tricks such as inverting values do not work for all cases, so the best thing I could

[GitHub] spark pull request: SPARK-1162 Added top in python.

2014-03-06 Thread ScrapCodes
GitHub user ScrapCodes opened a pull request: https://github.com/apache/spark/pull/93 SPARK-1162 Added top in python. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ScrapCodes/spark-1 SPARK-1162/pyspark-top-takeOrdered

[GitHub] spark pull request: SPARK-1162 Added top in python.

2014-03-06 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/93#discussion_r10370555 --- Diff: python/pyspark/rdd.py --- @@ -628,6 +669,26 @@ def mergeMaps(m1, m2): m1[k] += v return m1

[GitHub] spark pull request: SPARK-1162 Added top in python.

2014-03-06 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/93#issuecomment-36971911 Jenkins, test this please.

[GitHub] spark pull request: Spark 1165 rdd.intersection in python and java

2014-03-05 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/80#issuecomment-36729592 Jenkins, test this please.

[GitHub] spark pull request: SPARK-1164 Deprecated reduceByKeyToDriver as i...

2014-03-04 Thread ScrapCodes
GitHub user ScrapCodes opened a pull request: https://github.com/apache/spark/pull/72 SPARK-1164 Deprecated reduceByKeyToDriver as it is an alias for reduceByKeyLocally You can merge this pull request into a Git repository by running: $ git pull https://github.com/ScrapCodes

[GitHub] spark pull request: SPARK-1109 wrong API docs for pyspark map func...

2014-03-04 Thread ScrapCodes
GitHub user ScrapCodes opened a pull request: https://github.com/apache/spark/pull/73 SPARK-1109 wrong API docs for pyspark map function You can merge this pull request into a Git repository by running: $ git pull https://github.com/ScrapCodes/spark-1 SPARK-1109/wrong-API

[GitHub] spark pull request: SPARK-964 Fix for -java-home note.

2014-03-04 Thread ScrapCodes
Github user ScrapCodes closed the pull request at: https://github.com/apache/spark/pull/71

[GitHub] spark pull request: [WIP] SPARK-964 Fix for -java-home note.

2014-03-03 Thread ScrapCodes
GitHub user ScrapCodes opened a pull request: https://github.com/apache/spark/pull/71 [WIP] SPARK-964 Fix for -java-home note. I just tested this manually: with -java-home jdk, with just JAVA_HOME set, and with both. Hope it covers all cases. It is work

[GitHub] spark pull request: [WIP] SPARK-964 Fix for -java-home note.

2014-03-03 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/71#issuecomment-36597306 @pwendell Hey Patrick, it might be good to have Jenkins skip testing PRs whose titles start with [WIP] or WIP, or something like that?

[GitHub] spark pull request: [WIP] SPARK-964 Fix for -java-home note.

2014-03-03 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/71#issuecomment-36598821 It does not cover the case where JAVA_HOME points to an invalid directory; it will simply take the alternate path instead of failing nicely.
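The "failing nicely" behaviour described here could look roughly like the sketch below. It is written in Python purely for illustration (the launcher script discussed in the PR is not Python), the function name `resolve_java` is hypothetical, and it assumes a Unix-style layout in which the binary lives at `bin/java` under JAVA_HOME.

```python
import os


def resolve_java(java_home):
    """Return the java executable to use, or raise if java_home is invalid.

    An unset or empty java_home falls back to "java" on the PATH; a set
    but invalid java_home is an error rather than a silent fallback.
    """
    if not java_home:
        return "java"
    candidate = os.path.join(java_home, "bin", "java")
    if not (os.path.isfile(candidate) and os.access(candidate, os.X_OK)):
        raise ValueError(
            "JAVA_HOME is set to %r but %r is not an executable file"
            % (java_home, candidate)
        )
    return candidate
```

A caller might pass `os.environ.get("JAVA_HOME")` and print the error message before exiting with a non-zero status.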

[GitHub] spark pull request: SPARK-1121 Only add avro if the build is for H...

2014-02-26 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/6#issuecomment-36217654 Rebased!!