[GitHub] spark pull request: [SPARK-2410][SQL] Merging Hive Thrift/JDBC ser...

2014-07-26 Thread liancheng
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/1600#issuecomment-50224484 @marmbrus The build failure was caused by PySpark, please help re-test this, thanks! --- If your project is set up for it, you can reply to this email and have your

[GitHub] spark pull request: [SPARK-2410][SQL] Merging Hive Thrift/JDBC ser...

2014-07-26 Thread concretevitamin
Github user concretevitamin commented on the pull request: https://github.com/apache/spark/pull/1600#issuecomment-50224519 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-2410][SQL] Merging Hive Thrift/JDBC ser...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1600#issuecomment-50224608 QA tests have started for PR 1600. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17217/consoleFull

[GitHub] spark pull request: [SPARK-2024] Add saveAsSequenceFile to PySpark

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1338#issuecomment-50225130 QA tests have started for PR 1338. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17218/consoleFull

[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...

2014-07-26 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/1418#issuecomment-50225517 Thanks for commenting. How about the review?

[GitHub] spark pull request: [SPARK-2674] [SQL] [PySpark] support datetime ...

2014-07-26 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/1601 [SPARK-2674] [SQL] [PySpark] support datetime type for SchemaRDD Datetime and time in Python will be converted into java.util.Calendar after serialization, it will be converted into

[GitHub] spark pull request: [SPARK-2674] [SQL] [PySpark] support datetime ...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1601#issuecomment-50225682 QA tests have started for PR 1601. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17219/consoleFull

[GitHub] spark pull request: [SPARK-2410][SQL] Merging Hive Thrift/JDBC ser...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1600#issuecomment-50225967 QA results for PR 1600: This patch PASSES unit tests. This patch merges cleanly. This patch adds the following public classes (experimental): class

[GitHub] spark pull request: [SPARK-2671] BlockObjectWriter should create p...

2014-07-26 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/1580#issuecomment-50226383 IMHO, it would be better to let it fail when this situation happens; failing fast is better than trying to recover, I think :).

[GitHub] spark pull request: [SPARK-1458] [PySpark] Expose sc.version in Ja...

2014-07-26 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1596

[GitHub] spark pull request: [SPARK-2696] Reduce default value of spark.ser...

2014-07-26 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1595#issuecomment-50226681 Oh sorry, I missed that! Merging it.

[GitHub] spark pull request: [SPARK-2652] [PySpark] Turning some default co...

2014-07-26 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1568

[GitHub] spark pull request: [SPARK-2563] Make connection retries configura...

2014-07-26 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1471#issuecomment-50226768 Sure, you can modify the existing one.

[GitHub] spark pull request: [SPARK-2674] [SQL] [PySpark] support datetime ...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1601#issuecomment-50228401 QA results for PR 1601: This patch PASSES unit tests. This patch merges cleanly. This patch adds no public classes. For more information see test

[GitHub] spark pull request: [SPARK-1630] Turn Null of Java/Scala into None...

2014-07-26 Thread matei
Github user matei commented on the pull request: https://github.com/apache/spark/pull/1551#issuecomment-50230241 Hi guys, could you please use the full username (e.g. @mateixx instead of @matei) when referring to someone? I keep getting subscribed to various conversations under

[GitHub] spark pull request: [SPARK-2671] BlockObjectWriter should create p...

2014-07-26 Thread sarutak
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/1580#issuecomment-50230269 I think it sometimes depends on the kind of error.

[GitHub] spark pull request: [SPARK-2670] FetchFailedException should be th...

2014-07-26 Thread witgo
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/1578#issuecomment-50232324 Should a `FetchFailedException` also be thrown here? ```scala override def next(): (BlockId, Option[Iterator[Any]]) = { resultsGotten += 1 val

[GitHub] spark pull request: SPARK-2638 MapOutputTracker concurrency improv...

2014-07-26 Thread zsxwing
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/1542#issuecomment-50236563 @javadba, sorry if my previous comment was not clear. I'm opposed to using `synchronized` on `val monitor = shuffleId.toString.intern`. I see you have to

[GitHub] spark pull request: add Kafka stream feature in according to speci...

2014-07-26 Thread pengyanhong
GitHub user pengyanhong opened a pull request: https://github.com/apache/spark/pull/1602 add Kafka stream feature in according to specified starting offset position to fetch messages create Kafka input DStream to fetch batch messages from specified starting position You can

[GitHub] spark pull request: add Kafka stream feature in according to speci...

2014-07-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1602#issuecomment-50237001 Can one of the admins verify this patch?

[GitHub] spark pull request: ConnectionManager throws out of Could not fin...

2014-07-26 Thread witgo
GitHub user witgo opened a pull request: https://github.com/apache/spark/pull/1603 ConnectionManager throws out of Could not find reference for received ack message xxx exception. You can merge this pull request into a Git repository by running: $ git pull

[GitHub] spark pull request: Build should not run hive tests by default.

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1565#issuecomment-50237468 QA tests have started for PR 1565. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17222/consoleFull

[GitHub] spark pull request: ConnectionManager throws out of Could not fin...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1603#issuecomment-50237469 QA tests have started for PR 1603. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17221/consoleFull

[GitHub] spark pull request: Build should not run hive tests by default.

2014-07-26 Thread witgo
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/1565#issuecomment-50237538 @ScrapCodes I think that this solution is simple and effective, and is the better one.

[GitHub] spark pull request: ConnectionManager throws out of Could not fin...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1603#issuecomment-50237601 QA results for PR 1603: This patch FAILED unit tests. This patch merges cleanly. This patch adds no public classes. For more information see test

[GitHub] spark pull request: ConnectionManager throws out of Could not fin...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1603#issuecomment-50237979 QA tests have started for PR 1603. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17223/consoleFull

[GitHub] spark pull request: [SPARK-2024] Add saveAsSequenceFile to PySpark

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1338#issuecomment-50238113 QA tests have started for PR 1338. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17224/consoleFull

[GitHub] spark pull request: ConnectionManager throws out of Could not fin...

2014-07-26 Thread aarondav
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/1603#issuecomment-50238872 Does the movement of the status modification into the synchronized block change anything? It seems the only effective change here is downgrading an exception to a log

[GitHub] spark pull request: ConnectionManager throws out of Could not fin...

2014-07-26 Thread witgo
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/1603#issuecomment-50239511 Throwing an exception here causes `System.exit(ExecutorExitCode.UNCAUGHT_EXCEPTION)` to be called. This is not necessary.

[GitHub] spark pull request: ConnectionManager throws out of Could not fin...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1603#issuecomment-50240022 QA results for PR 1603: This patch PASSES unit tests. This patch merges cleanly. This patch adds no public classes. For more information see test

[GitHub] spark pull request: [SPARK-2024] Add saveAsSequenceFile to PySpark

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1338#issuecomment-50240323 QA results for PR 1338: This patch PASSES unit tests. This patch merges cleanly. This patch adds no public classes. For more information see test

[GitHub] spark pull request: Build should not run hive tests by default.

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1565#issuecomment-50241933 QA results for PR 1565: This patch PASSES unit tests. This patch merges cleanly. This patch adds no public classes. For more information see test

[GitHub] spark pull request: [SPARK-2279] Added emptyRDD method to Java API

2014-07-26 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1597#issuecomment-50242097 I've merged this, thanks!

[GitHub] spark pull request: [SPARK-2279] Added emptyRDD method to Java API

2014-07-26 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1597

[GitHub] spark pull request: [SPARK-1630] Turn Null of Java/Scala into None...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1551#issuecomment-50242563 QA tests have started for PR 1551. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17225/consoleFull

[GitHub] spark pull request: [SPARK-2674] [SQL] [PySpark] support datetime ...

2014-07-26 Thread concretevitamin
Github user concretevitamin commented on a diff in the pull request: https://github.com/apache/spark/pull/1601#discussion_r15434945 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SchemaRDD.scala --- @@ -395,6 +395,11 @@ class SchemaRDD(

[GitHub] spark pull request: [SPARK-1726] [SPARK-2567] Eliminate zombie sta...

2014-07-26 Thread kayousterhout
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/1566#issuecomment-50246187 Yeah that seems fine to me -- thanks Matei!

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-07-26 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1165#discussion_r15435454 --- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala --- @@ -141,6 +193,104 @@ private class MemoryStore(blockManager: BlockManager,

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-07-26 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1165#discussion_r15435457 --- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala --- @@ -141,6 +193,104 @@ private class MemoryStore(blockManager: BlockManager,

[GitHub] spark pull request: [SPARK-2671] BlockObjectWriter should create p...

2014-07-26 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1580#issuecomment-50246267 Actually we have also seen this happen multiple times. A few of them have been fixed, but not all have been identified. For example, there is incorrect DCL

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-07-26 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1165#discussion_r15435485 --- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala --- @@ -141,6 +193,104 @@ private class MemoryStore(blockManager: BlockManager,

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-07-26 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1165#issuecomment-50246395 @andrewor14 I looked through this and it looks good to me. Made a few very small comments to clarify the algorithm. Thanks for adding the test with multiple blocks being

[GitHub] spark pull request: [SPARK-2704] Name threads in ConnectionManager...

2014-07-26 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1604#issuecomment-50246511 @tgravescs I think you created this. Please take a look.

[GitHub] spark pull request: [SPARK-2704] Name threads in ConnectionManager...

2014-07-26 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/1604 [SPARK-2704] Name threads in ConnectionManager and mark them as daemon. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark daemon

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15435513 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -0,0 +1,458 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15435514 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateProjection.scala --- @@ -0,0 +1,218 @@ +/* + *

[GitHub] spark pull request: [SPARK-2674] [SQL] [PySpark] support datetime ...

2014-07-26 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1601#discussion_r15435519 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -357,16 +357,52 @@ class SQLContext(@transient val sparkContext: SparkContext)

[GitHub] spark pull request: [SPARK-2704] Name threads in ConnectionManager...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1604#issuecomment-50246561 QA tests have started for PR 1604. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17226/consoleFull

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15435527 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -0,0 +1,458 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15435534 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -0,0 +1,458 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15435573 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/GeneratedEvaluationSuite.scala --- @@ -0,0 +1,108 @@ +/* + *

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15435619 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala --- @@ -51,8 +82,46 @@ abstract class SparkPlan extends QueryPlan[SparkPlan]

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15435630 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/GeneratedAggregate.scala --- @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: PEP8 compliance

2014-07-26 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1540#issuecomment-50247285 @bigsnarfdude Do you mind closing this pull request? Thanks!

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/993#issuecomment-50247318 QA tests have started for PR 993. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17227/consoleFull

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-26 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/993#issuecomment-50247741 Hey @rxin, thanks for the careful review! I think I've addressed most of your comments. Regarding the GeneratedAggregate code, I'm happy to sit down and explain in

[GitHub] spark pull request: [SPARK-2704] Name threads in ConnectionManager...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1604#issuecomment-50247796 QA results for PR 1604: This patch PASSES unit tests. This patch merges cleanly. This patch adds no public classes. For more information see test

[GitHub] spark pull request: [SPARK-2547]:The clustering documentaion examp...

2014-07-26 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1590#issuecomment-50248537 I've merged this. Thanks! Do you mind closing this pull request, since it doesn't look like GitHub will do it automatically?

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-26 Thread yhuai
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/1439#discussion_r15435898 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala --- @@ -241,4 +251,37 @@ private[hive] object HadoopTableReader { val

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-26 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-50248847 Also, can you delete `[WIP]` from the PR title?

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-26 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1313#discussion_r15436031 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -738,6 +771,8 @@ private[spark] class TaskSetManager( /**

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-26 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-50249494 Hey Nan, sorry for the delay in getting to this. IMO this design is still too complicated -- we are passing so much state to resourceOffer and it's not super clear how

[GitHub] spark pull request: [SPARK-2704] Name threads in ConnectionManager...

2014-07-26 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1604#issuecomment-50249860 Looks good, merged this.

[GitHub] spark pull request: [SPARK-2704] Name threads in ConnectionManager...

2014-07-26 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1604#issuecomment-50249875 Oh sorry, I didn't see you wanted Tom to take a look at it too. Would be good to get his feedback. I just looked at the patch...

[GitHub] spark pull request: mesos executor ids now consist of the slave id...

2014-07-26 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1358#issuecomment-50250085 Jenkins, test this please

[GitHub] spark pull request: Streaming mllib [SPARK-2438][MLLIB]

2014-07-26 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1361#discussion_r15436152 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingRegression.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-2461. Add a toString method to Generaliz...

2014-07-26 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1388#issuecomment-50250159 @sryza mind adding this in Python as well? I think you need to add a `def __str__(self)` on the LinearModel class in mllib/regression.py.
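A minimal sketch of what the suggested `__str__` might look like. The class below is a hypothetical stand-in, not the actual LinearModel in mllib/regression.py; the attribute names `weights` and `intercept` and the output format are assumptions made for illustration only.

```python
class LinearModelSketch(object):
    """Hypothetical stand-in for a linear model; not the real PySpark class."""

    def __init__(self, weights, intercept):
        # Assumed attributes: a list of coefficients and a scalar intercept.
        self.weights = list(weights)
        self.intercept = intercept

    def __str__(self):
        # The output format here is an illustrative choice, not PySpark's.
        return "(weights=%s, intercept=%s)" % (self.weights, self.intercept)


model = LinearModelSketch([0.5, -1.0], 2.0)
summary = str(model)
print(summary)
```

With a method like this in place, `print(model)` would show the model parameters instead of the default object representation.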

[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...

2014-07-26 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1418#discussion_r15436168 --- Diff: examples/src/main/scala/org/apache/spark/examples/SparkPageRank.scala --- @@ -51,6 +55,11 @@ object SparkPageRank { urls.map(url =

[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1418#issuecomment-50250201 QA tests have started for PR 1418. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17228/consoleFull

[GitHub] spark pull request: mesos executor ids now consist of the slave id...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1358#issuecomment-50250198 QA tests have started for PR 1358. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17229/consoleFull

[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...

2014-07-26 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1418#issuecomment-50250223 Hey @viirya, instead of modifying the PageRank example, what do you think of leaving it as-is until we have automatic checkpointing of long lineage chains? I think that

[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1418#issuecomment-50250213 QA results for PR 1418: This patch FAILED unit tests. This patch merges cleanly. This patch adds no public classes. For more information see test

[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...

2014-07-26 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1418#issuecomment-50250242 BTW the Jenkins failure is due to a code style issue: an if block without braces. Jenkins, this is ok to test

[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...

2014-07-26 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1418#discussion_r15436180 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -275,18 +286,48 @@ class DAGScheduler( case shufDep:

[GitHub] spark pull request: [SPARK-1981] Add AWS Kinesis streaming support

2014-07-26 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1434#issuecomment-50250282 Jenkins, this is ok to test

[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...

2014-07-26 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1418#issuecomment-50250276 BTW you can run sbt scalastyle to check these style things locally

[GitHub] spark pull request: [SPARK-1981] Add AWS Kinesis streaming support

2014-07-26 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1434#issuecomment-50250292 Chris, is this new code from scratch, or is it based on Parviz's old pull request?

[GitHub] spark pull request: [SPARK-1981] Add AWS Kinesis streaming support

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1434#issuecomment-50250374 QA tests have started for PR 1434. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17230/consoleFull

[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...

2014-07-26 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1281#issuecomment-50250447 When did this come up? I'm actually not sure this is a good behavior, because doing this means that a user might completely miss a misconfigured directory. With the

[GitHub] spark pull request: [SPARK-2601] [PySpark] Fix Py4J error when tra...

2014-07-26 Thread JoshRosen
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/1605 [SPARK-2601] [PySpark] Fix Py4J error when transforming pickleFiles Similar to SPARK-1034, the problem was that Py4J didn’t cope well with the fake ClassTags used in the Java API. It doesn’t

[GitHub] spark pull request: mesos executor ids now consist of the slave id...

2014-07-26 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1358#discussion_r15436238 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala --- @@ -250,7 +252,7 @@ private[spark] class

[GitHub] spark pull request: mesos executor ids now consist of the slave id...

2014-07-26 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1358#issuecomment-50250554 So I don't quite understand, how can multiple executors be launched for the same Spark application on the same node right now? I thought we always reuse our executor

[GitHub] spark pull request: [SPARK-2601] [PySpark] Fix Py4J error when tra...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1605#issuecomment-50250570 QA tests have started for PR 1605. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17231/consoleFull

[GitHub] spark pull request: [SPARK-2547]:The clustering documentaion examp...

2014-07-26 Thread yu-iskw
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/1590#issuecomment-50250978 Thank you, merged it.

[GitHub] spark pull request: [SPARK-2601] [PySpark] Fix Py4J error when tra...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1605#issuecomment-50251333 QA results for PR 1605: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test

[GitHub] spark pull request: [SPARK-1550] [PySpark] Allow SparkContext crea...

2014-07-26 Thread JoshRosen
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/1606 [SPARK-1550] [PySpark] Allow SparkContext creation after failed attempts This addresses a PySpark issue where a failed attempt to construct SparkContext would prevent any future SparkContext

[GitHub] spark pull request: [SPARK-1550] [PySpark] Allow SparkContext crea...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1606#issuecomment-50252309 QA tests have started for PR 1606. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17232/consoleFull

[GitHub] spark pull request: [SPARK-1550] [PySpark] Allow SparkContext crea...

2014-07-26 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1606#discussion_r15436589 --- Diff: python/pyspark/context.py --- @@ -249,17 +258,14 @@ def defaultMinPartitions(self): return

[GitHub] spark pull request: [SPARK-2601] [PySpark] Fix Py4J error when tra...

2014-07-26 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1605#issuecomment-50252365 Thanks Josh; merged this.

[GitHub] spark pull request: [SPARK-2141] Adding getPersistentRddIds and un...

2014-07-26 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1082#issuecomment-50252530 What's the use-case for just retrieving only the ids of the persistent RDDs? If you want to check whether a particular RDD has been persisted, you can use the

[GitHub] spark pull request: [SPARK-2141] Adding getPersistentRddIds and un...

2014-07-26 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1082#discussion_r15436718 --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala --- @@ -559,6 +559,19 @@ class JavaSparkContext(val sc: SparkContext)

[GitHub] spark pull request: PEP8 compliance

2014-07-26 Thread bigsnarfdude
Github user bigsnarfdude closed the pull request at: https://github.com/apache/spark/pull/1540

[GitHub] spark pull request: [SPARK-1550] [PySpark] Allow SparkContext crea...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1606#issuecomment-50252948 QA results for PR 1606: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-26 Thread CodingCat
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-50253029 Hi @mateiz, thanks for the comments. If we just add a NO_PREF level, it can avoid the unnecessary waiting when we only have no-pref tasks, however,

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-50253135 QA tests have started for PR 1313. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17233/consoleFull

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-26 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/1313#discussion_r15436880 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -738,6 +771,8 @@ private[spark] class TaskSetManager( /**

[GitHub] spark pull request: SPARK-2684: Update ExternalAppendOnlyMap to ta...

2014-07-26 Thread mateiz
GitHub user mateiz opened a pull request: https://github.com/apache/spark/pull/1607 SPARK-2684: Update ExternalAppendOnlyMap to take an iterator as input This will decrease object allocation from the update closure used in map.changeValue. You can merge this pull request into a

[GitHub] spark pull request: SPARK-2684: Update ExternalAppendOnlyMap to ta...

2014-07-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1607#issuecomment-50253396 QA tests have started for PR 1607. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17235/consoleFull

[GitHub] spark pull request: [SPARK-1981] Add AWS Kinesis streaming support

2014-07-26 Thread cfregly
Github user cfregly commented on the pull request: https://github.com/apache/spark/pull/1434#issuecomment-50253400 @mateiz - this is a completely brand-new, from-scratch implementation. parviz's old code was actually a Scala port of the Java-based Kinesis sample

[GitHub] spark pull request: [SPARK-1981] Add AWS Kinesis streaming support

2014-07-26 Thread cfregly
Github user cfregly commented on the pull request: https://github.com/apache/spark/pull/1434#issuecomment-50253422 also, can someone address the questions i have here regarding the ec2 scripts and other peripheral aspects of this PR:
