[GitHub] spark pull request: remove staging dir when app quiting for yarn-c...

2014-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/154#issuecomment-37750476 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your proj

[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

2014-03-15 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37749785 For things like n-grams, isn't it okay to do them just per-partition and not worry about doing stuff across partitions? I agree that both this approach and the one in https
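A minimal sketch of the per-partition idea raised in the comment above, using plain Scala collections rather than the Spark API. The object `NGramSketch` and its methods are hypothetical names for illustration only; each inner `Seq` stands in for one partition. It suggests what is lost when windows are not carried across partition boundaries.

```scala
// Hypothetical model: each inner Seq stands in for one RDD partition.
object NGramSketch {
  // n-grams computed independently inside each partition;
  // windows shorter than n are dropped.
  def perPartition[T](partitions: Seq[Seq[T]], n: Int): Seq[Seq[T]] =
    partitions.flatMap(p => p.sliding(n).filter(_.size == n).toList)

  // n-grams over the concatenated data, for comparison.
  def global[T](partitions: Seq[Seq[T]], n: Int): Seq[Seq[T]] =
    partitions.flatten.sliding(n).filter(_.size == n).toList
}
```

With partitions `Seq(Seq(1, 2, 3), Seq(4, 5))` and `n = 2`, the per-partition version omits only the window `(3, 4)` that straddles the boundary, which may be acceptable for n-gram style statistics.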

[GitHub] spark pull request: remove staging dir when app quiting for yarn-c...

2014-03-15 Thread gzm55
GitHub user gzm55 opened a pull request: https://github.com/apache/spark/pull/154 remove staging dir when app quiting for yarn-cluster mode In yarn-cluster, the driver is actually running as the 'yarn' user. When posting jobs from other users, we need to give stagingDir a full path, so

[GitHub] spark pull request: fix compile error for hadoop CDH 4.4+

2014-03-15 Thread gzm55
Github user gzm55 commented on a diff in the pull request: https://github.com/apache/spark/pull/151#discussion_r10638121 --- Diff: core/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandlerMacro.scala --- @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: fix compile error for hadoop CDH 4.4+

2014-03-15 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/151#discussion_r10638011 --- Diff: core/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandlerMacro.scala --- @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Softwa

[GitHub] spark pull request: fix compile error for hadoop CDH 4.4+

2014-03-15 Thread gzm55
Github user gzm55 commented on a diff in the pull request: https://github.com/apache/spark/pull/151#discussion_r10638000 --- Diff: core/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandlerMacro.scala --- @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: fix compile error of streaming project

2014-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/153#issuecomment-37748734 Can one of the admins verify this patch?

[GitHub] spark pull request: fix compile error for hadoop CDH 4.4+

2014-03-15 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/151#discussion_r10637951 --- Diff: core/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandlerMacro.scala --- @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Softwa

[GitHub] spark pull request: fix compile error of streaming project

2014-03-15 Thread gzm55
GitHub user gzm55 opened a pull request: https://github.com/apache/spark/pull/153 fix compile error of streaming project explicit return type for implicit function You can merge this pull request into a Git repository by running: $ git pull https://github.com/gzm55/spark work/s
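The snippet does not show the actual change, so the following is only a sketch of the general technique named in the description: giving an implicit conversion an explicit return type. The type `Meters` and the method `doubleToMeters` are hypothetical and not taken from the Spark streaming code. One common reason for the annotation in Scala 2 is that an implicit method without an explicit result type is not applicable at use sites that appear before its definition in the same file.

```scala
import scala.language.implicitConversions

object ImplicitReturnTypeSketch {
  // Uses the conversion *before* it is defined below; in Scala 2 this
  // relies on the conversion having an explicit result type.
  def demo: Double = 1500.0.toKilometers

  // Hypothetical wrapper type, not from the Spark code base.
  final class Meters(val value: Double) {
    def toKilometers: Double = value / 1000.0
  }

  // Explicit return type `Meters` instead of leaving it to inference.
  implicit def doubleToMeters(d: Double): Meters = new Meters(d)
}
```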

[GitHub] spark pull request: SPARK-1255: Allow user to pass Serializer obje...

2014-03-15 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/149#discussion_r10637947 --- Diff: core/src/main/scala/org/apache/spark/Dependency.scala --- @@ -43,12 +44,13 @@ abstract class NarrowDependency[T](rdd: RDD[T]) extends Dependency(rdd)

[GitHub] spark pull request: SPARK-1252. On YARN, use container-log4j.prope...

2014-03-15 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/148#issuecomment-37748627 To clarify, I am not saying we should not be configuring what is in container-log4j.properties - but we should be trying to do that while preserving the ability to configu

[GitHub] spark pull request: SPARK-1252. On YARN, use container-log4j.prope...

2014-03-15 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/148#issuecomment-37748592 But that would be primarily for debugging yarn/hadoop APIs - with no easy way to inject spark specific logging levels. I am curious why this was required actually. Cur

[GitHub] spark pull request: fix compile error for hadoop CDH 4.4+

2014-03-15 Thread gzm55
Github user gzm55 commented on a diff in the pull request: https://github.com/apache/spark/pull/151#discussion_r10637918 --- Diff: core/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandlerMacro.scala --- @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

2014-03-15 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37748284 @pwendell I was referring not to the actual implementation, but expectation when using the exposed API.

[GitHub] spark pull request: fix compile error for hadoop CDH 4.4+

2014-03-15 Thread gzm55
Github user gzm55 commented on a diff in the pull request: https://github.com/apache/spark/pull/151#discussion_r10637844 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaPairDStream.scala --- @@ -736,7 +736,7 @@ class JavaPairDStream[K, V](val dstream: DS

[GitHub] spark pull request: SPARK-1255: Allow user to pass Serializer obje...

2014-03-15 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/149#discussion_r10637809 --- Diff: core/src/main/scala/org/apache/spark/Dependency.scala --- @@ -43,12 +44,13 @@ abstract class NarrowDependency[T](rdd: RDD[T]) extends Dependency(rdd

[GitHub] spark pull request: SPARK-1244: Throw exception if map output stat...

2014-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/152#issuecomment-37745299 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13200/

[GitHub] spark pull request: SPARK-1244: Throw exception if map output stat...

2014-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/152#issuecomment-37745298 Merged build finished.

[GitHub] spark pull request: SPARK-1255: Allow user to pass Serializer obje...

2014-03-15 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/149#discussion_r10637576 --- Diff: core/src/main/scala/org/apache/spark/Dependency.scala --- @@ -43,12 +44,13 @@ abstract class NarrowDependency[T](rdd: RDD[T]) extends Dependency(rdd) {

Re: Code documentation

2014-03-15 Thread Reynold Xin
Take a look at https://cwiki.apache.org/confluence/display/SPARK/Spark+Internals On Sat, Mar 15, 2014 at 6:19 PM, David Thomas wrote: > Is there any documentation available that explains the code architecture > that can help a new Spark framework developer? >

Code documentation

2014-03-15 Thread David Thomas
Is there any documentation available that explains the code architecture that can help a new Spark framework developer?

[GitHub] spark pull request: SPARK-1244: Throw exception if map output stat...

2014-03-15 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/152#discussion_r10637523 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -35,13 +35,21 @@ private[spark] case class GetMapOutputStatuses(shuffleId: Int)

[GitHub] spark pull request: SPARK-1244: Throw exception if map output stat...

2014-03-15 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/152#discussion_r10637519 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -35,13 +35,21 @@ private[spark] case class GetMapOutputStatuses(shuffleId: Int)

[GitHub] spark pull request: SPARK-1244: Throw exception if map output stat...

2014-03-15 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/152#discussion_r10637484 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -35,13 +35,21 @@ private[spark] case class GetMapOutputStatuses(shuffleId: Int

[GitHub] spark pull request: SPARK-1244: Throw exception if map output stat...

2014-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/152#issuecomment-37744245 Merged build started.

[GitHub] spark pull request: SPARK-1244: Throw exception if map output stat...

2014-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/152#issuecomment-37744244 Merged build triggered.

[GitHub] spark pull request: [SPARK-1244] Throw exception if map output sta...

2014-03-15 Thread andrewor14
Github user andrewor14 closed the pull request at: https://github.com/apache/spark/pull/147

[GitHub] spark pull request: [SPARK-1244] Throw exception if map output sta...

2014-03-15 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/147#issuecomment-37744167 Continued at #152. Closing.

[GitHub] spark pull request: SPARK-1244: Throw exception if map output stat...

2014-03-15 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/152#issuecomment-37744154 Jenkins, test this please.

[GitHub] spark pull request: SPARK-1254. Consolidate, order, and harmonize ...

2014-03-15 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/145

[GitHub] spark pull request: [SPARK-1244] Throw exception if map output sta...

2014-03-15 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/147#issuecomment-37744049 Hey @andrewor14 I submitted some small changes on top of this while you were working on it over at #152.

[GitHub] spark pull request: [SPARK-1244] Throw exception if map output sta...

2014-03-15 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/147#issuecomment-37743968 This should be ready to merge unless other people have more to add.

[GitHub] spark pull request: [SPARK-1244] Throw exception if map output sta...

2014-03-15 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/147#discussion_r10637306 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -35,13 +35,21 @@ private[spark] case class GetMapOutputStatuses(shuffleId: I

[GitHub] spark pull request: Akka frame

2014-03-15 Thread pwendell
GitHub user pwendell opened a pull request: https://github.com/apache/spark/pull/152 Akka frame This is a very small change on top of @andrewor14's patch in #147. You can merge this pull request into a Git repository by running: $ git pull https://github.com/pwendell/spark akka

[GitHub] spark pull request: SPARK-1252. On YARN, use container-log4j.prope...

2014-03-15 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/148#issuecomment-37742890 @mridulm I think in YARN environments cluster operators can set a logging file on all of the machines to be shared across applications (e.g. Spark, MapReduce, etc). So th

[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

2014-03-15 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37742731 @mridulm I think the RDD definition is actually `private[spark]` and it's just intended to be used internally for higher level algorithms.

[GitHub] spark pull request: SPARK-1254. Consolidate, order, and harmonize ...

2014-03-15 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/145#issuecomment-37742370 https://github.com/sbt/sbt/blob/0.13/ivy/src/main/scala/sbt/Resolver.scala?source=c#L289

[GitHub] spark pull request: SPARK-1254. Consolidate, order, and harmonize ...

2014-03-15 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/145#issuecomment-37742365 Thanks I've merged this. One small change I added is to use `Resolver.mavenLocal` that sbt provides for you instead of hard coding it.
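For readers unfamiliar with the resolver mentioned in this comment, a build definition fragment along these lines would use sbt's built-in local Maven resolver. This is a sketch only: the hard-coded form shown in the comment is an illustrative assumption about what "hard coding it" might look like, not the line that was actually replaced in the PR.

```scala
// build.sbt fragment (illustrative; not the actual Spark build definition)

// Built-in resolver for the local Maven repository (~/.m2/repository).
resolvers += Resolver.mavenLocal

// An assumed hard-coded equivalent that the built-in resolver makes unnecessary:
// resolvers += "Local Maven Repository" at "file://" + Path.userHome.absolutePath + "/.m2/repository"
```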

[GitHub] spark pull request: Spark 615 map partitions with index callable f...

2014-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/16#issuecomment-37742137 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13199/

[GitHub] spark pull request: Spark 615 map partitions with index callable f...

2014-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/16#issuecomment-37742136 Merged build finished.

[GitHub] spark pull request: SPARK-1255: Allow user to pass Serializer obje...

2014-03-15 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/149#discussion_r10637181 --- Diff: core/src/main/scala/org/apache/spark/Dependency.scala --- @@ -43,12 +44,13 @@ abstract class NarrowDependency[T](rdd: RDD[T]) extends Dependency(rdd)

[GitHub] spark pull request: SPARK-1255: Allow user to pass Serializer obje...

2014-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/149#issuecomment-37738874 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13198/

[GitHub] spark pull request: Spark 615 map partitions with index callable f...

2014-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/16#issuecomment-37740255 Merged build triggered.

[GitHub] spark pull request: SPARK-1255: Allow user to pass Serializer obje...

2014-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/149#issuecomment-37738873 Merged build finished.

[GitHub] spark pull request: Spark 615 map partitions with index callable f...

2014-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/16#issuecomment-37740256 Merged build started.

[GitHub] spark pull request: SPARK-1255: Allow user to pass Serializer obje...

2014-03-15 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/149#discussion_r10636602 --- Diff: core/src/main/scala/org/apache/spark/Dependency.scala --- @@ -43,12 +44,13 @@ abstract class NarrowDependency[T](rdd: RDD[T]) extends Dependency(rdd) {

[GitHub] spark pull request: SPARK-1255: Allow user to pass Serializer obje...

2014-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/149#issuecomment-37737398 Merged build triggered.

[GitHub] spark pull request: SPARK-1255: Allow user to pass Serializer obje...

2014-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/149#issuecomment-37737399 Merged build started.

[GitHub] spark pull request: [SPARK-1244] Throw exception if map output sta...

2014-03-15 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/147#issuecomment-37736922 LGTM pending a minor comment about unifying the code path with the Executor thing that reads the frame size.

[GitHub] spark pull request: [SPARK-1244] Throw exception if map output sta...

2014-03-15 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/147#discussion_r10636474 --- Diff: core/src/main/scala/org/apache/spark/util/AkkaUtils.scala --- @@ -121,4 +121,9 @@ private[spark] object AkkaUtils extends Logging { def lookup

[GitHub] spark pull request: [SPARK-1244] Throw exception if map output sta...

2014-03-15 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/147#discussion_r10636463 --- Diff: core/src/main/scala/org/apache/spark/util/AkkaUtils.scala --- @@ -121,4 +121,9 @@ private[spark] object AkkaUtils extends Logging { def lookup

[GitHub] spark pull request: SPARK-1255: Allow user to pass Serializer obje...

2014-03-15 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/149#discussion_r10636424 --- Diff: core/src/main/scala/org/apache/spark/Dependency.scala --- @@ -43,12 +44,13 @@ abstract class NarrowDependency[T](rdd: RDD[T]) extends Dependency(rdd)

[GitHub] spark pull request: fix compile error for hadoop CDH 4.4+

2014-03-15 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/151#discussion_r10636411 --- Diff: core/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandlerMacro.scala --- @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Softwa

[GitHub] spark pull request: fix compile error for hadoop CDH 4.4+

2014-03-15 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/151#discussion_r10636404 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaPairDStream.scala --- @@ -736,7 +736,7 @@ class JavaPairDStream[K, V](val dstream:

[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

2014-03-15 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37736435 To take a step back, given how niche this seems to be and how it violates the "usual" expectations of how our users use spark (lazy execution, etc as mentioned above) - d

[GitHub] spark pull request: SPARK-1252. On YARN, use container-log4j.prope...

2014-03-15 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/148#issuecomment-37736391 I am not sure what the intent of this PR is. log config for workers should pretty much mirror what is in master. Also, the hardcoding of the config file, root l

[GitHub] spark pull request: SPARK-1255: Allow user to pass Serializer obje...

2014-03-15 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/149#discussion_r10636356 --- Diff: core/src/main/scala/org/apache/spark/Dependency.scala --- @@ -43,12 +44,13 @@ abstract class NarrowDependency[T](rdd: RDD[T]) extends Dependency(rdd) {

[GitHub] spark pull request: SPARK-1144 Added license and RAT to check lice...

2014-03-15 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/125#discussion_r10636342 --- Diff: dev/rat.bash --- @@ -0,0 +1,49 @@ +#!/usr/bin/env bash --- End diff -- could you remove the `.bash` extension here?

[GitHub] spark pull request: SPARK-1144 Added license and RAT to check lice...

2014-03-15 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/125#issuecomment-37736072 @ScrapCodes this is a good start but right now it doesn't actually fail the build if RAT doesn't succeed. Also, RAT reports a bunch of failures for python files that I th

[GitHub] spark pull request: fix compile error for hadoop CDH 4.4+

2014-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/151#issuecomment-37735857 Can one of the admins verify this patch?

[GitHub] spark pull request: [SPARK-1244] Throw exception if map output sta...

2014-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/147#issuecomment-37735832 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13196/

[GitHub] spark pull request: [SPARK-1244] Throw exception if map output sta...

2014-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/147#issuecomment-37735830 Merged build finished.

[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

2014-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37735835 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13197/

[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

2014-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37735834 Merged build finished.

[GitHub] spark pull request: fix compile error for hadoop CDH 4.4+

2014-03-15 Thread gzm55
GitHub user gzm55 opened a pull request: https://github.com/apache/spark/pull/151 fix compile error for hadoop CDH 4.4+ Fix the compilation error when SPARK_HADOOP_VERSION is set to 2.0.0-cdh4.4.0. That is, the yarn-alpha project should work with hadoop CDH 4.4.0 and later. You can me

[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

2014-03-15 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37734634 It is hard to say what threshold to use. I couldn't think of a use case that requires a large window size, but I cannot say there is none. Another possible approach

[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

2014-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37734242 Merged build started.

[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

2014-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37734241 Merged build triggered.

[GitHub] spark pull request: [SPARK-1244] Throw exception if map output sta...

2014-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/147#issuecomment-37734238 Merged build triggered.

[GitHub] spark pull request: [SPARK-1244] Throw exception if map output sta...

2014-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/147#issuecomment-37734239 Merged build started.

[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

2014-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37734195 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13195/

[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

2014-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37734193 Merged build finished.

[GitHub] spark pull request: [SPARK-1244] Throw exception if map output sta...

2014-03-15 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/147#discussion_r10635968 --- Diff: core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala --- @@ -136,4 +142,30 @@ class MapOutputTrackerSuite extends FunSuite with Local

[GitHub] spark pull request: [SPARK-1244] Throw exception if map output sta...

2014-03-15 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/147#discussion_r10635967 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -35,13 +35,21 @@ private[spark] case class GetMapOutputStatuses(shuffleId: Int)

[GitHub] spark pull request: [SPARK-1244] Throw exception if map output sta...

2014-03-15 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/147#discussion_r10635964 --- Diff: core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala --- @@ -136,4 +142,30 @@ class MapOutputTrackerSuite extends FunSuite with LocalSp

[GitHub] spark pull request: [SPARK-1244] Throw exception if map output sta...

2014-03-15 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/147#discussion_r10635962 --- Diff: core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala --- @@ -136,4 +123,47 @@ class MapOutputTrackerSuite extends FunSuite with Local

[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

2014-03-15 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37733908 Even if it's private we can end up with cases where users have, e.g., a 10,000-partition RDD with only a few items in each partition. Do we know a priori when calling this

[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

2014-03-15 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37733845 Ah I see - so this isn't going to be externally a user-visible class (I didn't notice it was `private[spark]`)? Would it make sense to throw an assertion error if the sli

[GitHub] spark pull request: SPARK-1252. On YARN, use container-log4j.prope...

2014-03-15 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/148#issuecomment-37733755 Seems reasonable to me. You still working on this or is it good to go?

Re: [re-cont] map and flatMap

2014-03-15 Thread andy petrella
[Thanks a *lot* for your answers!] That's CoOl, a possible example would be to simply write a for-comprehension that would do this: > > val allEvents = for { > deviceId <- rddFromHdfsOfDeviceId > deviceEvent <- rddFromHdfsOfDeviceEvent(deviceId) > } deviceEvent > val hist = computeHistOf(

Re: [re-cont] map and flatMap

2014-03-15 Thread Koert Kuipers
just going head first without any thinking, I changed flatMap to flatMapData and added a flatMap. for FlatMappedRDD my compute is: firstParent[T].iterator(split, context).flatMap(f andThen (_.compute(split, context))) scala> val x = sc.parallelize(1 to 100) scala> x.flatMap _ res0: (Int => org.

[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

2014-03-15 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37732906 @pwendell, the limit case is not a practical example. In that case, we need to re-partition for most operations to be efficient. Also, this is really for small window sizes l

[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

2014-03-15 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/136#discussion_r10635646 --- Diff: core/src/main/scala/org/apache/spark/rdd/SlidedRDD.scala --- @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

2014-03-15 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/136#discussion_r10635644 --- Diff: core/src/main/scala/org/apache/spark/rdd/SlidedRDD.scala --- @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

2014-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37732586 Merged build triggered.

[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

2014-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37732587 Merged build started.

[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

2014-03-15 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37732233 I don't think we typically run jobs inside of getPartitions - so this changes some semantics of calling that function. For instance a lot of the other RDD constructors im

[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

2014-03-15 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/136#discussion_r10635557 --- Diff: core/src/main/scala/org/apache/spark/rdd/SlidedRDD.scala --- @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

Re: [re-cont] map and flatMap

2014-03-15 Thread Koert Kuipers
MappedRDD does: firstParent[T].iterator(split, context).map(f) and FlatMappedRDD: firstParent[T].iterator(split, context).flatMap(f) so yeah, seems like it's a map or flatMap over the iterator inside, not the RDD itself, sort of... On Sat, Mar 15, 2014 at 9:08 AM, andy petrella wrote: > Yep, > R
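The observation above can be sketched without Spark. The snippet below is a simplified model, not the Spark source: a dataset is assumed to be a `Vector` of partitions, and `computeMap`, `computeFlatMap` and `flatMapAll` are hypothetical names that mirror the two quoted `compute` bodies.

```scala
object PartitionSketch {
  // Analogue of the quoted MappedRDD compute: map over one partition's iterator.
  def computeMap[T, U](partition: Vector[T])(f: T => U): Iterator[U] =
    partition.iterator.map(f)

  // Analogue of the quoted FlatMappedRDD compute: flatMap over one partition's
  // iterator. Note that f returns a local collection, not another dataset.
  def computeFlatMap[T, U](partition: Vector[T])(f: T => Seq[U]): Iterator[U] =
    partition.iterator.flatMap(f)

  // Applying the per-partition computation to every partition keeps the
  // partition boundaries unchanged.
  def flatMapAll[T, U](partitions: Vector[Vector[T]])(f: T => Seq[U]): Vector[Vector[U]] =
    partitions.map(p => computeFlatMap(p)(f).toVector)
}
```

In this model the function passed to `flatMapAll` has type `T => Seq[U]`, so the operation is a flatMap over the elements inside each partition rather than a flatMap in which the function returns another dataset.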

[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

2014-03-15 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/136#discussion_r10635447 --- Diff: core/src/main/scala/org/apache/spark/rdd/SlidedRDD.scala --- @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

2014-03-15 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/136#discussion_r10635444 --- Diff: core/src/main/scala/org/apache/spark/rdd/SlidedRDD.scala --- @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

2014-03-15 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/136#discussion_r10635413 --- Diff: core/src/main/scala/org/apache/spark/rdd/SlidedRDD.scala --- @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: SPARK-1255: Allow user to pass Serializer obje...

2014-03-15 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/149#discussion_r10635398 --- Diff: core/src/main/scala/org/apache/spark/Dependency.scala --- @@ -43,12 +44,13 @@ abstract class NarrowDependency[T](rdd: RDD[T]) extends Dependency(rdd

Re: [re-cont] map and flatMap

2014-03-15 Thread andy petrella
Yep, Regarding flatMap, an implicit parameter might work, as in Scala's Future for instance: https://github.com/scala/scala/blob/master/src/library/scala/concurrent/Future.scala#L246 Dunno, still waiting for some insights from the team ^^ andy On Wed, Mar 12, 2014 at 3:23 PM, Pascal Voitot D

[GitHub] spark pull request: Fix SPARK-1256: Master web UI and Worker web U...

2014-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/150#issuecomment-37723748 Can one of the admins verify this patch?

[GitHub] spark pull request: Fix SPARK-1256: Master web UI and Worker web U...

2014-03-15 Thread witgo
GitHub user witgo opened a pull request: https://github.com/apache/spark/pull/150 Fix SPARK-1256: Master web UI and Worker web UI returns a 404 error You can merge this pull request into a Git repository by running: $ git pull https://github.com/witgo/spark SPARK-1256 Alterna

[GitHub] spark pull request: SPARK-1255: Allow user to pass Serializer obje...

2014-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/149#issuecomment-37722634 Merged build finished.

[GitHub] spark pull request: SPARK-1255: Allow user to pass Serializer obje...

2014-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/149#issuecomment-37722638 One or more automated tests failed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13193/

[GitHub] spark pull request: SPARK-1252. On YARN, use container-log4j.prope...

2014-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/148#issuecomment-37722637 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13194/

[GitHub] spark pull request: SPARK-1252. On YARN, use container-log4j.prope...

2014-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/148#issuecomment-37722635 Merged build finished.
