[GitHub] spark pull request: [SPARK-2572] Delete the local dir on executor ...

2014-07-28 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1480#issuecomment-50442106 But why is that? The JVM should always call shutdown hooks when it exits. Is Mesos killing the process? I'm curious because we might have other behavior that depe

[GitHub] spark pull request: [SPARK-2174][MLLIB] treeReduce and treeAggrega...

2014-07-28 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1110#discussion_r15508428 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/RDDFunctions.scala --- @@ -44,6 +47,65 @@ class RDDFunctions[T: ClassTag](self: RDD[T]) { ne

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-28 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-50442021 This looks a lot better, thanks. Still made a few comments throughout it. I think we can get rid of the fine-grained tracking of which nodes have node-only tasks, that is

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-28 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1313#discussion_r15508376 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -341,20 +346,31 @@ private[spark] class TaskSetManager( *

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-28 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1313#discussion_r15508357 --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala --- @@ -80,7 +80,7 @@ class FakeTaskSetManager( override def resou

[GitHub] spark pull request: Example pyspark-inputformat for Avro file form...

2014-07-28 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1536#discussion_r15508369 --- Diff: examples/src/main/scala/org/apache/spark/examples/pythonconverters/AvroGenericConverter.scala --- @@ -0,0 +1,80 @@ +/* + * Licensed to th

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-28 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1313#discussion_r15508347 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -751,20 +787,7 @@ private[spark] class TaskSetManager( levels.toA

[GitHub] spark pull request: [SPARK-2174][MLLIB] treeReduce and treeAggrega...

2014-07-28 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1110#discussion_r15508331 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/RDDFunctions.scala --- @@ -44,6 +47,65 @@ class RDDFunctions[T: ClassTag](self: RDD[T]) { ne

[GitHub] spark pull request: [SPARK-2174][MLLIB] treeReduce and treeAggrega...

2014-07-28 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1110#discussion_r15508334 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/RDDFunctions.scala --- @@ -44,6 +47,65 @@ class RDDFunctions[T: ClassTag](self: RDD[T]) { ne

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-28 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1313#discussion_r15508338 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -435,6 +460,13 @@ private[spark] class TaskSetManager(

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-28 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1313#discussion_r15508323 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -363,38 +379,44 @@ private[spark] class TaskSetManager( }

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-28 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1313#discussion_r15508307 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -341,20 +346,31 @@ private[spark] class TaskSetManager( *

[GitHub] spark pull request: [SPARK-2174][MLLIB] treeReduce and treeAggrega...

2014-07-28 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1110#discussion_r15508295 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/RDDFunctions.scala --- @@ -44,6 +47,65 @@ class RDDFunctions[T: ClassTag](self: RDD[T]) { ne

[GitHub] spark pull request: Example pyspark-inputformat for Avro file form...

2014-07-28 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1536#discussion_r15508267 --- Diff: examples/src/main/scala/org/apache/spark/examples/pythonconverters/AvroGenericConverter.scala --- @@ -0,0 +1,80 @@ +/* + * Licensed to th

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-28 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1313#discussion_r15508258 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -113,6 +114,10 @@ private[spark] class TaskSetManager( // but at ho

[GitHub] spark pull request: Fix maven test bug

2014-07-28 Thread hzw19900416
Github user hzw19900416 closed the pull request at: https://github.com/apache/spark/pull/1529 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[GitHub] spark pull request: [SPARK-2580] [PySpark] keep silent in worker i...

2014-07-28 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1625#issuecomment-50441519 @JoshRosen

[GitHub] spark pull request: [SPARK-2677] BasicBlockFetchIterator#next can ...

2014-07-28 Thread witgo
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/1632#discussion_r15508224 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockFetcherIterator.scala --- @@ -117,31 +121,45 @@ object BlockFetcherIterator { })

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-28 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1313#discussion_r15508220 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -246,28 +246,36 @@ private[spark] class TaskSchedulerImpl(

[GitHub] spark pull request: [SPARK-2505][MLlib] Weighted Regularizer for G...

2014-07-28 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1518#issuecomment-50441485 @dbtsai I thought of another way to do this and would like your opinion. We can add an optional argument to `appendBias`: `appendBias(bias: Double = 1.0)`. If this is used

[GitHub] spark pull request: Fix maven test bug

2014-07-28 Thread hzw19900416
Github user hzw19900416 commented on the pull request: https://github.com/apache/spark/pull/1529#issuecomment-50441495 This error is due to my own environment, so I am closing it. In addition, using "mvn package" to run the unit tests while compiling is better than using "mvn test"

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-28 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1313#discussion_r15508212 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -246,28 +246,36 @@ private[spark] class TaskSchedulerImpl(

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-28 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1313#discussion_r15508186 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocality.scala --- @@ -22,7 +22,7 @@ import org.apache.spark.annotation.DeveloperApi @Develop

[GitHub] spark pull request: [SPARK-2726] and [SPARK-2727] Remove SortOrder...

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1631#issuecomment-50441457 QA tests have started for PR 1631. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17342/consoleFull

[GitHub] spark pull request: fix a mistaken type of "if" in description of ...

2014-07-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1633#issuecomment-50441359 Can one of the admins verify this patch?

[GitHub] spark pull request: [SPARK-2174][MLLIB] treeReduce and treeAggrega...

2014-07-28 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1110#discussion_r15508170 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/RDDFunctions.scala --- @@ -44,6 +47,65 @@ class RDDFunctions[T: ClassTag](self: RDD[T]) { ne

[GitHub] spark pull request: SPARK-1813. Add a utility to SparkConf that ma...

2014-07-28 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/789#issuecomment-50441305 Ah, I see. In that case I'd prefer something like this: ``` MLlibUtils.registerKryoClasses(conf) GraphXUtils.registerKryoClasses(conf) conf.registerKryoClasse

[GitHub] spark pull request: fix a mistaken type of "if" in description of ...

2014-07-28 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1633#issuecomment-50441322 Thanks for submitting the pull request. I don't think it is a typo. "iff" means "if and only if".

[GitHub] spark pull request: [SPARK-2726] and [SPARK-2727] Remove SortOrder...

2014-07-28 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1631#issuecomment-50441277 Jenkins, retest this please.

[GitHub] spark pull request: fix a mistaken type of "if" in description of ...

2014-07-28 Thread hzw19900416
GitHub user hzw19900416 opened a pull request: https://github.com/apache/spark/pull/1633 fix a mistaken type of "if" in description of trait Partitioning You can merge this pull request into a Git repository by running: $ git pull https://github.com/hzw19900416/spark CoderRead

[GitHub] spark pull request: Minor indentation and comment typo fixes.

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1630#issuecomment-50441147 QA tests have started for PR 1630. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17341/consoleFull

[GitHub] spark pull request: Added SparkGCE Script for Version 0.9.1

2014-07-28 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/681#issuecomment-50441094 Libcloud looks good actually, and it's nice that it's another Apache project. Would be worth a try if you guys want to investigate it. It would be awesome if we also get Op

[GitHub] spark pull request: [SPARK-2726] and [SPARK-2727] Remove SortOrder...

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1631#issuecomment-50441055 QA results for PR 1631: - This patch FAILED unit tests. - This patch merges cleanly. - This patch adds the following public classes (experimental): class SparkSQLOperationMana

[GitHub] spark pull request: SPARK-2045 Sort-based shuffle

2014-07-28 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15508024 --- Diff: core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala --- @@ -43,10 +44,10 @@ import org.apache.spark.{Logging, RangePartitioner}

[GitHub] spark pull request: SPARK-1630: Make PythonRDD handle NULL element...

2014-07-28 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/554#issuecomment-50440910 Hi @kalpit, Since this PR has been superseded by #644, do you mind closing it? Thanks!

[GitHub] spark pull request: [yarn] delete useless variables

2014-07-28 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1614#issuecomment-50440880 @XuTingjun mind creating a JIRA issue on https://issues.apache.org/jira/browse/SPARK so we can track this? When you do, update the pull request's title with the JIRA numbe

[GitHub] spark pull request: Minor indentation and comment typo fixes.

2014-07-28 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1630#issuecomment-50440788 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-2677] BasicBlockFetchIterator#next can ...

2014-07-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1632#issuecomment-50440763 Can one of the admins verify this patch?

[GitHub] spark pull request: [SPARK-2677] BasicBlockFetchIterator#next can ...

2014-07-28 Thread sarutak
GitHub user sarutak opened a pull request: https://github.com/apache/spark/pull/1632 [SPARK-2677] BasicBlockFetchIterator#next can wait forever You can merge this pull request into a Git repository by running: $ git pull https://github.com/sarutak/spark SPARK-2677 Alternative

[GitHub] spark pull request: Let pyspark execute files even when IPYTHON=1

2014-07-28 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/515#issuecomment-50440531 If you can figure out a way to retain backwards-compatibility with IPython < 2, I'd be happy to merge this. Maybe you can do something like parsing `ipython --version`
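The version check suggested in the comment above could be sketched roughly as follows. This is only an illustration, not code from the pull request: the function names `ipython_major_version` and `uses_ipython2_path` are hypothetical, and the sketch assumes the version text begins with a dotted number such as `1.2.1`.

```python
import re


def ipython_major_version(version_output):
    """Return the leading major version number from text such as the
    output of `ipython --version` (hypothetical helper, for illustration)."""
    match = re.match(r"\s*(\d+)", version_output)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version_output)
    return int(match.group(1))


def uses_ipython2_path(version_output):
    # Assumption: IPython 2 and later would take the new code path,
    # while IPython < 2 keeps the existing behavior.
    return ipython_major_version(version_output) >= 2
```

A caller would pass in whatever text the `ipython --version` command printed and branch on the result, so that older installations keep working unchanged.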

[GitHub] spark pull request: [SPARK-1550] Fixed - Successive creation of sp...

2014-07-28 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/478#issuecomment-50440084 Hi @prabinb, Thanks for submitting this PR. This issue has been fixed by #1606, so do you mind closing this? Thanks!

[GitHub] spark pull request: [SPARK-1812][wip]

2014-07-28 Thread avati
Github user avati commented on the pull request: https://github.com/apache/spark/pull/996#issuecomment-50439713 @ScrapCodes @mateiz looks like there are some parallel efforts here (github.com/avati/spark/commits/scala-2.11). It is true some upstream artifacts are pending (from other pr

[GitHub] spark pull request: [SPARK-2583] ConnectionManager cannot distingu...

2014-07-28 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1490#issuecomment-50439564 I looked at this (with some confusion). Yes, I agree it would be great to just signal failure using the promise when an error occurs. @sarutak do you think you can d

[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread javadba
Github user javadba commented on the pull request: https://github.com/apache/spark/pull/1586#issuecomment-50439423 @ueshin The length applies to any datatype, as I described in a prior comment. As far as getBytes, I am following the recommendation of @chenghao-intel : I thi

[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1624#issuecomment-50438015 QA results for PR 1624: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test output: https://amplab.c

[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread ueshin
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/1586#issuecomment-50437674 I'm sorry, but now I am confused. `Length` and `Strlen` look like they are becoming almost the same implementation. What do you intend the difference between them to be?

[GitHub] spark pull request: Example pyspark-inputformat for Avro file form...

2014-07-28 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1536#discussion_r15507202 --- Diff: examples/src/main/scala/org/apache/spark/examples/pythonconverters/AvroGenericConverter.scala --- @@ -0,0 +1,80 @@ +/* + * Licensed to th

[GitHub] spark pull request: [SPARK-2726] and [SPARK-2727] Remove SortOrder...

2014-07-28 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/1631#issuecomment-50437417 Cool, much cleaner than the previous code, looks good to me :)

[GitHub] spark pull request: SPARK-2380: Support displaying accumulator val...

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1309#issuecomment-50437366 QA tests have started for PR 1309. This patch DID NOT merge cleanly! View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17340/consoleFull

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-07-28 Thread bgreeven
Github user bgreeven commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-50436968 Jenkins, retest this please.

[GitHub] spark pull request: SPARK-2380: Support displaying accumulator val...

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1309#issuecomment-50436940 QA results for PR 1309: - This patch FAILED unit tests. For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17339/consol

[GitHub] spark pull request: SPARK-2380: Support displaying accumulator val...

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1309#issuecomment-50436890 QA tests have started for PR 1309. This patch DID NOT merge cleanly! View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17339/consoleFull

[GitHub] spark pull request: SPARK-2380: Support displaying accumulator val...

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1309#issuecomment-50436641 QA results for PR 1309: - This patch FAILED unit tests. For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17338/consol

[GitHub] spark pull request: SPARK-2380: Support displaying accumulator val...

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1309#issuecomment-50436636 QA tests have started for PR 1309. This patch DID NOT merge cleanly! View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17338/consoleFull

[GitHub] spark pull request: [SPARK-2521] Broadcast RDD object (instead of ...

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1498#issuecomment-50435813 QA results for PR 1498: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds the following public classes (experimental): abstract class Dependency[T

[GitHub] spark pull request: [SPARK-2726] and [SPARK-2727] Remove SortOrder...

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1631#issuecomment-50435646 QA tests have started for PR 1631. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17335/consoleFull

[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1624#issuecomment-50435652 QA tests have started for PR 1624. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17336/consoleFull

[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...

2014-07-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1624#discussion_r15506490 --- Diff: python/pyspark/mllib/regression.py --- @@ -120,6 +120,23 @@ def train(cls, data, iterations=100, step=1.0, d._jrdd, iterations, step,

[GitHub] spark pull request: [SPARK-2726] and [SPARK-2727] Remove SortOrder...

2014-07-28 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1631#issuecomment-50435549 @jerryshao @mateiz

[GitHub] spark pull request: [SPARK-2726] and [SPARK-2727] Remove SortOrder...

2014-07-28 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1631#issuecomment-50435539 The diff is screwed up because I'm basing the pull request off the real ASF master. Diff here: https://github.com/rxin/spark/commit/c9d37e1bacaff2be9ee9174a2965fdc2

[GitHub] spark pull request: [SPARK-2726] and [SPARK-2727] Remove SortOrder...

2014-07-28 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/1631 [SPARK-2726] and [SPARK-2727] Remove SortOrder and do in-place sort. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark sortOrder Alternat

[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...

2014-07-28 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1624#issuecomment-50435465 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-07-28 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/1612#issuecomment-50435389 Actually, I am not sure about the motivation for this PR; it seems that fixing the bug in line 109 of `JdbcRDD.scala` is enough. Creating the SchemaRDD from a normal RDD is easy w

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-28 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1056#issuecomment-50435407 Hey @sryza I think the overall architecture here is good, but I did a pass with various comments. I do have a few questions throughout that are about nontrivial things,

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-28 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r15506372 --- Diff: core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala --- @@ -1,5 +1,4 @@ /* - * Licensed to the Apache Software Foundation

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-28 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r15506371 --- Diff: core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala --- @@ -341,24 +339,6 @@ class BlockManagerSuite extends FunSuite with Matcher

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-28 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r15506267 --- Diff: docs/configuration.md --- @@ -524,6 +524,13 @@ Apart from these, the following properties are also available, and may be useful output dir

[GitHub] spark pull request: [WIP] [SPARK-2010] [PySpark] [SQL] support nes...

2014-07-28 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1598#issuecomment-50435088 A StructType is presented as a namedtuple in Python, which is called Row. The Row is generated according to the schema; there is no predefined Row class, so it's better

[GitHub] spark pull request: [SPARK-2724] Python version of RandomRDDGenera...

2014-07-28 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1628#issuecomment-50435025 LGTM except minor inline comments. For the file name, it should be possible to have a package named `random`, for example, `numpy.random`: http://docs.scipy.org/doc/numpy/

[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-07-28 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/1612#discussion_r15506243 --- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala --- @@ -81,8 +105,10 @@ class JdbcRDD[T: ClassTag]( logInfo("statement fetch

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-28 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r15506235 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -155,6 +156,23 @@ class DAGScheduler( eventProcessActor ! Complet

[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-07-28 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/1612#discussion_r15506119 --- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala --- @@ -67,6 +69,28 @@ class JdbcRDD[T: ClassTag]( }).toArray }

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-28 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r15506106 --- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala --- @@ -23,6 +23,14 @@ import org.apache.spark.storage.{BlockId, BlockStatus}

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-28 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r15506095 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -348,4 +353,48 @@ private[spark] class Executor( } }

[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-07-28 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/1612#discussion_r15506092 --- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala --- @@ -67,6 +69,28 @@ class JdbcRDD[T: ClassTag]( }).toArray }

[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-07-28 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/1612#discussion_r15506096 --- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala --- @@ -67,6 +69,28 @@ class JdbcRDD[T: ClassTag]( }).toArray }

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-28 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r15506065 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -48,6 +48,8 @@ private[spark] class Executor( private val EMPTY_BYT

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-28 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r15506056 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -348,4 +353,48 @@ private[spark] class Executor( } }

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-28 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r15506045 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -348,4 +353,48 @@ private[spark] class Executor( } }

[GitHub] spark pull request: [SPARK-2724] Python version of RandomRDDGenera...

2014-07-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1628#discussion_r15506031 --- Diff: python/pyspark/mllib/randomRDD.py --- @@ -0,0 +1,213 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contri

[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-07-28 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/1612#discussion_r15506043 --- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala --- @@ -81,8 +105,10 @@ class JdbcRDD[T: ClassTag]( logInfo("statement fetch

[GitHub] spark pull request: [SPARK-2724] Python version of RandomRDDGenera...

2014-07-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1628#discussion_r15506030 --- Diff: python/pyspark/mllib/randomRDD.py --- @@ -0,0 +1,213 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contri

[GitHub] spark pull request: [SPARK-2724] Python version of RandomRDDGenera...

2014-07-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1628#discussion_r15506026 --- Diff: python/pyspark/mllib/randomRDD.py --- @@ -0,0 +1,213 @@ +# --- End diff -- Should the file name match Scala's?

[GitHub] spark pull request: [SPARK-2724] Python version of RandomRDDGenera...

2014-07-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1628#discussion_r15506022 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala --- @@ -453,4 +454,74 @@ class PythonMLLibAPI extends Serializable {

[GitHub] spark pull request: [SPARK-2724] Python version of RandomRDDGenera...

2014-07-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1628#discussion_r15506028 --- Diff: python/pyspark/mllib/randomRDD.py --- @@ -0,0 +1,213 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contri

[GitHub] spark pull request: [SPARK-2724] Python version of RandomRDDGenera...

2014-07-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1628#discussion_r15506024 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/random/RandomRDDGenerators.scala --- @@ -35,6 +35,9 @@ object RandomRDDGenerators { * :: Experime

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-28 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r15506011 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -186,6 +191,7 @@ private[spark] class Executor( // Run the ac

[GitHub] spark pull request: Minor indentation and comment typo fixes.

2014-07-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1630#issuecomment-50434173 Can one of the admins verify this patch?

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-28 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r15505852 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -186,6 +191,7 @@ private[spark] class Executor( // Run the ac

[GitHub] spark pull request: Minor indentation and comment typo fixes.

2014-07-28 Thread staple
GitHub user staple opened a pull request: https://github.com/apache/spark/pull/1630 Minor indentation and comment typo fixes. You can merge this pull request into a Git repository by running: $ git pull https://github.com/staple/spark minor Alternatively you can review and ap

[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-07-28 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/1612#discussion_r15505789 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcResultSetRDD.scala --- @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-07-28 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/1612#discussion_r15505774 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcResultSetRDD.scala --- @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-28 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r15505792 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -186,6 +191,7 @@ private[spark] class Executor( // Run the ac

[GitHub] spark pull request: [WIP] [SPARK-2010] [PySpark] [SQL] support nes...

2014-07-28 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/1598#issuecomment-50433935 With this PR, what does a `StructType` represent? namedtuple or array? Do we still keep the Row class in PySpark?

[GitHub] spark pull request: [SPARK-2521] Broadcast RDD object (instead of ...

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1498#issuecomment-50433817 QA tests have started for PR 1498. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17334/consoleFull

[GitHub] spark pull request: Excess judgment

2014-07-28 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1629#issuecomment-50433822 Merging this in master. Thanks!

[GitHub] spark pull request: [SPARK-2521] Broadcast RDD object (instead of ...

2014-07-28 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1498#issuecomment-50433805 OK, I pushed a new version that also broadcasts the final task closure and the shuffle dependency. This one should be good to go (pending Jenkins happiness).

[GitHub] spark pull request: Excess judgment

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1629#issuecomment-50433748 QA results for PR 1629: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test output: https://amplab.c

[GitHub] spark pull request: [SPARK-2521] Broadcast RDD object (instead of ...

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1498#issuecomment-50433662 QA results for PR 1498: - This patch FAILED unit tests. - This patch merges cleanly. - This patch adds the following public classes (experimental): abstract class Dependency[T

[GitHub] spark pull request: [SPARK-2521] Broadcast RDD object (instead of ...

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1498#issuecomment-50433630 QA tests have started for PR 1498. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17333/consoleFull
