[GitHub] spark pull request: [SPARK-2467] Revert SparkBuild to publish-loca...

2014-07-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1398#issuecomment-48869958 QA tests have started for PR 1398. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16616/consoleFull --- If

[GitHub] spark pull request: [SPARK-2410][SQL] Cherry picked Hive Thrift/JD...

2014-07-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1399#issuecomment-48869966 QA results for PR 1399: - This patch FAILED unit tests. - This patch merges cleanly. - This patch adds the following public classes (experimental): class HiveThriftServer2(hiv

[GitHub] spark pull request: [SPARK-2410][SQL] Cherry picked Hive Thrift/JD...

2014-07-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1399#issuecomment-48869960 QA tests have started for PR 1399. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16615/consoleFull

[GitHub] spark pull request: [SPARK-1946] Submit tasks after (configured ra...

2014-07-13 Thread li-zhihui
Github user li-zhihui commented on the pull request: https://github.com/apache/spark/pull/900#issuecomment-48869936 @tgravescs I added a commit according to your comments.

[GitHub] spark pull request: [SPARK-2410][SQL] Cherry picked Hive Thrift/JD...

2014-07-13 Thread liancheng
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/1399 [SPARK-2410][SQL] Cherry picked Hive Thrift/JDBC server JIRA issue: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410) Cherry picked the Hive Thrift/JDBC server from [branch-1

[GitHub] spark pull request: [SPARK-2467] Revert SparkBuild to publish-loca...

2014-07-13 Thread ueshin
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/1398 [SPARK-2467] Revert SparkBuild to publish-local to both .m2 and .ivy2. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ueshin/apache-spark issues

[GitHub] spark pull request: remove not used test in src/main

2014-07-13 Thread adrian-wang
Github user adrian-wang closed the pull request at: https://github.com/apache/spark/pull/1397

[GitHub] spark pull request: remove not used test in src/main

2014-07-13 Thread adrian-wang
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/1397#issuecomment-48869638 OK, I'll close this. Thank you Reynold!

[GitHub] spark pull request: remove not used test in src/main

2014-07-13 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1397#issuecomment-48869513 That one already has a main method. Perhaps best to leave this here since it just starts a connection manager.

[GitHub] spark pull request: remove not used test in src/main

2014-07-13 Thread adrian-wang
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/1397#issuecomment-48867685 There's an object in `ConnectionManagerTest.scala` in this package; maybe move this code there?

[GitHub] spark pull request: remove not used test in src/main

2014-07-13 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1397#issuecomment-48867494 Maybe a better change would be to just add some inline comments explaining that they are used for benchmarks.

[GitHub] spark pull request: remove not used test in src/main

2014-07-13 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1397#issuecomment-48867486 I think those are used to do manual testing for performance benchmarks, so probably best to leave them there.

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-13 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/1056#issuecomment-48867225 Hi @sryza, I left a couple of comments. In general, I think this patch can be simplified by using akka only for the driver-executor heartbeats. We should also clarify

[GitHub] spark pull request: SPARK-1536: multiclass classification support ...

2014-07-13 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/886#issuecomment-48867199 Thanks, Evan. I have compared against scikit-learn on the covertype dataset, and the results looked similar.

[GitHub] spark pull request: SPARK-1536: multiclass classification support ...

2014-07-13 Thread manishamde
Github user manishamde commented on a diff in the pull request: https://github.com/apache/spark/pull/886#discussion_r14865144 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -768,104 +973,157 @@ object DecisionTree extends Serializable with Log

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-13 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r14865082 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -341,4 +345,47 @@ private[spark] class Executor( } }

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-13 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r14865059 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala --- @@ -197,33 +189,71 @@ class JobProgressListener(conf: SparkConf) exten

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-13 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r14864999 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala --- @@ -197,33 +189,71 @@ class JobProgressListener(conf: SparkConf) exten

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-13 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r14864962 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala --- @@ -230,6 +229,10 @@ class BlockManagerMasterActor(val isLocal: Bo

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-13 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r14864951 --- Diff: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala --- @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-13 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r14864946 --- Diff: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala --- @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-13 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r14864917 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala --- @@ -129,7 +128,7 @@ class BlockManagerMasterActor(val isLocal: Boo

[GitHub] spark pull request: [SPARK-2460] Optimize SparkContext.hadoopFile ...

2014-07-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1385#discussion_r14864765 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -552,17 +552,10 @@ class SparkContext(config: SparkConf) extends Logging { va

[GitHub] spark pull request: [SPARK-2460] Optimize SparkContext.hadoopFile ...

2014-07-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1385#discussion_r14864755 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala --- @@ -206,17 +202,10 @@ class HadoopTableReader(@transient _tableDesc: TableDesc

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-13 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r14864751 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala --- @@ -52,25 +52,24 @@ class BlockManagerMasterActor(val isLocal: Boo

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-13 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r14864612 --- Diff: core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala --- @@ -158,6 +161,11 @@ trait SparkListener { * Called when the appli

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-13 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r14864590 --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala --- @@ -81,13 +81,16 @@ private[spark] class EventLoggingListener(

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-13 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r14864551 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -37,8 +36,15 @@ import org.apache.spark._ import org.apache.spark.e

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-13 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r14864526 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -341,4 +345,47 @@ private[spark] class Executor( } }

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-13 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r14864470 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -341,4 +345,47 @@ private[spark] class Executor( } }

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-13 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r14864464 --- Diff: core/src/main/scala/org/apache/spark/executor/ExecutorBackend.scala --- @@ -20,6 +20,7 @@ package org.apache.spark.executor import java.nio.

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-13 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r14864449 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -341,4 +345,47 @@ private[spark] class Executor( } }

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-13 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r14864427 --- Diff: core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala --- @@ -32,6 +32,9 @@ import org.apache.spark.deploy.worker.Wor

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-13 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r14864424 --- Diff: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala --- @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-13 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/1056#issuecomment-48865504 Jenkins, test this please

[GitHub] spark pull request: remove not used test in src/main

2014-07-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1397#issuecomment-48864486 QA results for PR 1397: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test output: https://amplab.c

[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

2014-07-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1387#issuecomment-48863186 QA results for PR 1387: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test output: https://amplab.c

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-07-13 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-48863082 Jenkins, add to whitelist.

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

2014-07-13 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-48863087 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-1945][MLLIB] Documentation Improvements...

2014-07-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1311#discussion_r14863059 --- Diff: docs/mllib-linear-methods.md --- @@ -242,7 +242,96 @@ Similarly, you can use replace `SVMWithSGD` by All of MLlib's methods use Java-friendly typ

[GitHub] spark pull request: [SPARK-1945][MLLIB] Documentation Improvements...

2014-07-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1311#discussion_r14863048 --- Diff: docs/mllib-optimization.md --- @@ -263,7 +267,110 @@ println("Loss of each step in training process") loss.foreach(println) println("Area un

[GitHub] spark pull request: [SPARK-1945][MLLIB] Documentation Improvements...

2014-07-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1311#discussion_r14863037 --- Diff: docs/mllib-optimization.md --- @@ -263,7 +267,110 @@ println("Loss of each step in training process") loss.foreach(println) println("Area un

[GitHub] spark pull request: [SPARK-1945][MLLIB] Documentation Improvements...

2014-07-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1311#discussion_r14863023 --- Diff: docs/mllib-optimization.md --- @@ -263,7 +267,110 @@ println("Loss of each step in training process") loss.foreach(println) println("Area un

[GitHub] spark pull request: [SPARK-1945][MLLIB] Documentation Improvements...

2014-07-13 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1311#issuecomment-48862778 @miccagiann I made one pass through the example code. Besides the inline comments: 1. We moved MLlib's data to `data/mllib` in #1394. Could you please update the pat

[GitHub] spark pull request: [SPARK-2460] Optimize SparkContext.hadoopFile ...

2014-07-13 Thread aarondav
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1385#discussion_r14862987 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -552,17 +552,10 @@ class SparkContext(config: SparkConf) extends Logging {

[GitHub] spark pull request: [SPARK-2460] Optimize SparkContext.hadoopFile ...

2014-07-13 Thread aarondav
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1385#discussion_r14862963 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -128,25 +123,13 @@ class HadoopRDD[K, V]( // Returns a JobConf that wil

[GitHub] spark pull request: [SPARK-1945][MLLIB] Documentation Improvements...

2014-07-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1311#discussion_r14862952 --- Diff: docs/mllib-optimization.md --- @@ -263,7 +267,110 @@ println("Loss of each step in training process") loss.foreach(println) println("Area un

[GitHub] spark pull request: [SPARK-1945][MLLIB] Documentation Improvements...

2014-07-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1311#discussion_r14862947 --- Diff: docs/mllib-linear-methods.md --- @@ -338,7 +427,74 @@ and [`LassoWithSGD`](api/scala/index.html#org.apache.spark.mllib.regression.Lass All of ML

[GitHub] spark pull request: [SPARK-2443][SQL] Fix slow read from partition...

2014-07-13 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/1390#discussion_r14862941 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala --- @@ -157,21 +161,60 @@ class HadoopTableReader(@transient _tableDesc:

[GitHub] spark pull request: [SPARK-1945][MLLIB] Documentation Improvements...

2014-07-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1311#discussion_r14862939 --- Diff: docs/mllib-linear-methods.md --- @@ -338,7 +427,74 @@ and [`LassoWithSGD`](api/scala/index.html#org.apache.spark.mllib.regression.Lass All of ML

[GitHub] spark pull request: [SPARK-1945][MLLIB] Documentation Improvements...

2014-07-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1311#discussion_r14862917 --- Diff: docs/mllib-dimensionality-reduction.md --- @@ -57,10 +57,57 @@ val U: RowMatrix = svd.U // The U factor is a RowMatrix. val s: Vector = svd.s //

[GitHub] spark pull request: [SPARK-1945][MLLIB] Documentation Improvements...

2014-07-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1311#discussion_r14862920 --- Diff: docs/mllib-linear-methods.md --- @@ -242,7 +242,96 @@ Similarly, you can use replace `SVMWithSGD` by All of MLlib's methods use Java-friendly typ

[GitHub] spark pull request: [SPARK-1945][MLLIB] Documentation Improvements...

2014-07-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1311#discussion_r14862918 --- Diff: docs/mllib-dimensionality-reduction.md --- @@ -91,4 +138,51 @@ val pc: Matrix = mat.computePrincipalComponents(10) // Principal components are v

[GitHub] spark pull request: [SPARK-1945][MLLIB] Documentation Improvements...

2014-07-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1311#discussion_r14862921 --- Diff: docs/mllib-linear-methods.md --- @@ -242,7 +242,96 @@ Similarly, you can use replace `SVMWithSGD` by All of MLlib's methods use Java-friendly typ

[GitHub] spark pull request: [SPARK-1945][MLLIB] Documentation Improvements...

2014-07-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1311#discussion_r14862916 --- Diff: docs/mllib-collaborative-filtering.md --- @@ -99,7 +99,88 @@ val model = ALS.trainImplicit(ratings, rank, numIterations, alpha) All of MLlib's m

[GitHub] spark pull request: [SPARK-1945][MLLIB] Documentation Improvements...

2014-07-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1311#discussion_r14862914 --- Diff: docs/mllib-collaborative-filtering.md --- @@ -99,7 +99,88 @@ val model = ALS.trainImplicit(ratings, rank, numIterations, alpha) All of MLlib's m

[GitHub] spark pull request: [SPARK-1945][MLLIB] Documentation Improvements...

2014-07-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1311#discussion_r14862915 --- Diff: docs/mllib-collaborative-filtering.md --- @@ -99,7 +99,88 @@ val model = ALS.trainImplicit(ratings, rank, numIterations, alpha) All of MLlib's m

[GitHub] spark pull request: [SPARK-1945][MLLIB] Documentation Improvements...

2014-07-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1311#discussion_r14862912 --- Diff: docs/mllib-collaborative-filtering.md --- @@ -99,7 +99,88 @@ val model = ALS.trainImplicit(ratings, rank, numIterations, alpha) All of MLlib's m

[GitHub] spark pull request: [SPARK-1945][MLLIB] Documentation Improvements...

2014-07-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1311#discussion_r14862910 --- Diff: docs/mllib-collaborative-filtering.md --- @@ -99,7 +99,88 @@ val model = ALS.trainImplicit(ratings, rank, numIterations, alpha) All of MLlib's m

[GitHub] spark pull request: [SPARK-1945][MLLIB] Documentation Improvements...

2014-07-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1311#discussion_r14862911 --- Diff: docs/mllib-collaborative-filtering.md --- @@ -99,7 +99,88 @@ val model = ALS.trainImplicit(ratings, rank, numIterations, alpha) All of MLlib's m

[GitHub] spark pull request: [SPARK-1945][MLLIB] Documentation Improvements...

2014-07-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1311#discussion_r14862909 --- Diff: docs/mllib-clustering.md --- @@ -69,7 +69,54 @@ println("Within Set Sum of Squared Errors = " + WSSSE) All of MLlib's methods use Java-friendly t

[GitHub] spark pull request: [SPARK-1945][MLLIB] Documentation Improvements...

2014-07-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1311#discussion_r14862906 --- Diff: docs/mllib-clustering.md --- @@ -69,7 +69,54 @@ println("Within Set Sum of Squared Errors = " + WSSSE) All of MLlib's methods use Java-friendly t

[GitHub] spark pull request: [SPARK-1945][MLLIB] Documentation Improvements...

2014-07-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1311#discussion_r14862907 --- Diff: docs/mllib-clustering.md --- @@ -69,7 +69,54 @@ println("Within Set Sum of Squared Errors = " + WSSSE) All of MLlib's methods use Java-friendly t

[GitHub] spark pull request: [SPARK-2460] Optimize SparkContext.hadoopFile ...

2014-07-13 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/1385#issuecomment-48861711 @rxin and @aarondav, yeah, the master branch deadlocks. It seems the locks from #1273 and HADOOP-10456 lead to the problem when running a Hive SQL self-join: hql("SELECT t1.a, t

[GitHub] spark pull request: remove not used test in src/main

2014-07-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1397#issuecomment-48861087 QA tests have started for PR 1397. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16612/consoleFull

[GitHub] spark pull request: remove not used test in src/main

2014-07-13 Thread adrian-wang
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/1397 remove not used test in src/main Maybe I should put that back in some test suite? You can merge this pull request into a Git repository by running: $ git pull https://github.com/adrian-wang

[GitHub] spark pull request: [SQL][CORE] SPARK-2102

2014-07-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1377#issuecomment-48860618 QA results for PR 1377: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test output: https://amplab.c

[GitHub] spark pull request: SPARK-2363. Clean MLlib's sample data files

2014-07-13 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1394

[GitHub] spark pull request: [SPARK-2443][SQL] Fix slow read from partition...

2014-07-13 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/1390#issuecomment-48860420 In general, I suggest adding more comments to explain what we are doing here, because this part of the code is pretty Hive-specific.

[GitHub] spark pull request: SPARK-2363. Clean MLlib's sample data files

2014-07-13 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1394#issuecomment-48860407 @srowen This looks good to me and thank you for updating the docs as well!

[GitHub] spark pull request: [SPARK-2443][SQL] Fix slow read from partition...

2014-07-13 Thread yhuai
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/1390#discussion_r14862338 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala --- @@ -157,21 +161,60 @@ class HadoopTableReader(@transient _tableDesc: TableDes

[GitHub] spark pull request: [SPARK-2443][SQL] Fix slow read from partition...

2014-07-13 Thread yhuai
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/1390#discussion_r14862300 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala --- @@ -157,21 +161,60 @@ class HadoopTableReader(@transient _tableDesc: TableDes

[GitHub] spark pull request: [SPARK-2443][SQL] Fix slow read from partition...

2014-07-13 Thread yhuai
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/1390#discussion_r14862289 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala --- @@ -157,21 +161,60 @@ class HadoopTableReader(@transient _tableDesc: TableDes

[GitHub] spark pull request: [SPARK-2443][SQL] Fix slow read from partition...

2014-07-13 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/1390#issuecomment-48860018 @chenghao-intel I am not sure I understand your comment on column pruning. I think for a Hive table, we should use `ColumnProjectionUtils` to set needed columns. So, RCFile

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-13 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-48859854 This looks good to me :) Just a reminder that when TaskSchedulerImpl calls TaskSetManager.resourceOffer, the maxLocality (changed to preferredLocality in this PR)

[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

2014-07-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1387#issuecomment-48859861 QA tests have started for PR 1387. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16611/consoleFull

[GitHub] spark pull request: [SPARK-2443][SQL] Fix slow read from partition...

2014-07-13 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/1390#issuecomment-48859842 And since the Hive SerDe actually provides `lazy` parsing, when converting the `raw object` to a `Row`, we need to support the column pruning

[GitHub] spark pull request: [SPARK-2443][SQL] Fix slow read from partition...

2014-07-13 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/1390#issuecomment-48859675 The code looks good to me. However, I think we can avoid the workaround (deserializing (with the partition SerDe) and then serializing (with the table SerDe) agai

[GitHub] spark pull request: [SPARK-2317] Improve task logging.

2014-07-13 Thread aarondav
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/1259#issuecomment-48859674 If we actually want people to get information out of all those numbers, can we consider using a human-readable format such as `Task(stageId = 1, taskId = 5, attempt = 0)

[GitHub] spark pull request: [SPARK-2125] Add sort flag and move sort into ...

2014-07-13 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/1210#issuecomment-48859519 Hi Matei, thanks a lot for your review. I will change the code according to your comments.

[GitHub] spark pull request: [SQL][CORE] SPARK-2102

2014-07-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1377#issuecomment-48857093 QA tests have started for PR 1377. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16610/consoleFull

[GitHub] spark pull request: [SQL][CORE] SPARK-2102

2014-07-13 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/1377#issuecomment-48857036 test this please

[GitHub] spark pull request: [SPARK-1945][MLLIB] Documentation Improvements...

2014-07-13 Thread miccagiann
Github user miccagiann commented on the pull request: https://github.com/apache/spark/pull/1311#issuecomment-48855958 Hello guys, I have provided Java examples for the following documentation files: mllib-clustering.md mllib-collaborative-filtering.md mllib-dimensio

[GitHub] spark pull request: [SPARK-2290] Worker should directly use its ow...

2014-07-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1392#issuecomment-48855862 QA results for PR 1392: - This patch FAILED unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test output: https://amplab.c

[GitHub] spark pull request: [SPARK-2290] Worker should directly use its ow...

2014-07-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1392#issuecomment-48854002 QA tests have started for PR 1392. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16609/consoleFull

[GitHub] spark pull request: [SPARK-2290] Worker should directly use its ow...

2014-07-13 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/1392#issuecomment-48853888 Jenkins, test this please

[GitHub] spark pull request: [SQL] Whitelist more Hive tests.

2014-07-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1396#issuecomment-48853489 QA results for PR 1396: - This patch FAILED unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test output: https://amplab.c

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-48852700 QA results for PR 1393: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds the following public classes (experimental): case class Rating(user: Lon

[GitHub] spark pull request: [SQL] Whitelist more Hive tests.

2014-07-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1396#issuecomment-48851429 QA tests have started for PR 1396. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16608/consoleFull

[GitHub] spark pull request: [SQL] Whitelist more Hive tests.

2014-07-13 Thread marmbrus
GitHub user marmbrus opened a pull request: https://github.com/apache/spark/pull/1396 [SQL] Whitelist more Hive tests. You can merge this pull request into a Git repository by running: `git pull https://github.com/marmbrus/spark moreTests`. Alternatively you can review and app

[GitHub] spark pull request: [WIP] SPARK-2360: CSV import to SchemaRDDs

2014-07-13 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1351#issuecomment-48851059 Note that there are multiple problems. We can solve the out-of-memory problem by simply limiting the length of a record. Ideally, `csvRDD(RDD[String])` should just be one e

[GitHub] spark pull request: [SPARK-546] Add full outer join to RDD and DSt...

2014-07-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1395#issuecomment-48851025 Can one of the admins verify this patch?

[GitHub] spark pull request: [SPARK-546] Add full outer join to RDD and DSt...

2014-07-13 Thread staple
GitHub user staple opened a pull request: https://github.com/apache/spark/pull/1395 [SPARK-546] Add full outer join to RDD and DStream. You can merge this pull request into a Git repository by running: `git pull https://github.com/staple/spark SPARK-546`. Alternatively you can

[GitHub] spark pull request: [WIP] SPARK-2360: CSV import to SchemaRDDs

2014-07-13 Thread falaki
Github user falaki commented on the pull request: https://github.com/apache/spark/pull/1351#issuecomment-48850882 This is not a bad idea, especially considering that a file can be split across partitions. @marmbrus, you suggested this feature. What do you think about Reynold's suggesti

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-48850704 QA tests have started for PR 1393. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16607/consoleFull

[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-07-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1360#issuecomment-48850708 QA results for PR 1360: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test output: https://amplab.c

[GitHub] spark pull request: SPARK-2363. Clean MLlib's sample data files

2014-07-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1394#issuecomment-48849070 QA results for PR 1394: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test output: https://amplab.c

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-13 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-48849034 Hi @CodingCat, looks good to me. My only doubt, which we discussed last, was whether we want to differentiate between tasks which have no locations at all vs tasks whic

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-48848503 QA results for PR 1393: - This patch FAILED unit tests. - This patch merges cleanly. - This patch adds the following public classes (experimental): case class Rating(user: Lon

[GitHub] spark pull request: [SPARK-2317] Improve task logging.

2014-07-13 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/1259#issuecomment-48848292 Hi @rxin, I took a pass over the patch and the changes mostly look good. On a higher-level point, I notice that we log this pattern `0.0:4.0 (TID 4 ...)` quite often,
