[GitHub] spark pull request: [SPARK-2062][GraphX] VertexRDD.apply does not ...

2014-09-18 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1903 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enab

[GitHub] spark pull request: [SPARK-2062][GraphX] VertexRDD.apply does not ...

2014-09-18 Thread ankurdave
Github user ankurdave commented on the pull request: https://github.com/apache/spark/pull/1903#issuecomment-56140430 Thanks! Merged into master and branch-1.1.

[GitHub] spark pull request: [SPARK-3536][SQL] SELECT on empty parquet tabl...

2014-09-18 Thread ravipesala
GitHub user ravipesala opened a pull request: https://github.com/apache/spark/pull/2456 [SPARK-3536][SQL] SELECT on empty parquet table throws exception It returns null metadata from parquet when querying an empty parquet file while calculating splits. So added null check and returns th

[GitHub] spark pull request: [SPARK-3584] sbin/slaves doesn't work when we ...

2014-09-18 Thread sarutak
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/2444#discussion_r17769855 --- Diff: .gitignore --- @@ -19,6 +19,7 @@ conf/*.sh conf/*.properties conf/*.conf conf/*.xml +conf/slaves --- End diff -- So,

[GitHub] spark pull request: [SPARK-3584] sbin/slaves doesn't work when we ...

2014-09-18 Thread sarutak
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/2444#discussion_r17769822 --- Diff: sbin/slaves.sh --- @@ -67,20 +69,26 @@ fi if [ "$HOSTLIST" = "" ]; then if [ "$SPARK_SLAVES" = "" ]; then -export HOSTLIST=

[GitHub] spark pull request: [SPARK-3547]Using a special exit code instead ...

2014-09-18 Thread sarutak
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2421#issuecomment-56137301 @liancheng Ah, I saw sometimes Jenkins ignores us... but recently he is friendly :D

[GitHub] spark pull request: [SPARK-1987] EdgePartitionBuilder: More memory...

2014-09-18 Thread ankurdave
Github user ankurdave commented on the pull request: https://github.com/apache/spark/pull/2446#issuecomment-56137274 Thanks! ok to test

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-18 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2419#issuecomment-56136714 @derrickburns I cannot see the Jenkins log. Let's call Jenkins again. test this please

[GitHub] spark pull request: [MLLIB] fix a unresolved reference variable 'n...

2014-09-18 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2423#issuecomment-56136584 @OdinLin Thanks for catching the bug! As @davies mentioned, #2378 will completely replace the current SerDe. Could you close this PR?

[GitHub] spark pull request: [SPARK-3491] [MLlib] [PySpark] use pickle to s...

2014-09-18 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2378#issuecomment-56136476 test this please

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-18 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2294

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-18 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2294#issuecomment-56136224 LGTM. I'm merging this into master. (We might need to make slight changes to some methods before the 1.2 release, but let's not block the multi-model training PR for now.)

[GitHub] spark pull request: [SPARK-927] detect numpy at time of use

2014-09-18 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/2313#issuecomment-56135962 Philosophically, I agree with @erikerlandson about it being OK for random generators to be, well, random. If problems are caused by the output of a randomized process

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-09-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r17769391 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -43,66 +46,218 @@ trait RandomSampler[T, U] extends Pseudorandom with Cl

[GitHub] spark pull request: [SPARK-927] detect numpy at time of use

2014-09-18 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2313#issuecomment-56135525 @JoshRosen PySpark/MLlib requires NumPy to run, and I don't think we claimed that we support different versions of NumPy. `sample()` in core is different. Maybe we

[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

2014-09-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17769270 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala --- @@ -197,4 +201,452 @@ private[mllib] object BLAS extends Serializable {

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-18 Thread aarondav
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-56135114 I only had a few minor comments about documentation while trying to do a quick read-through of this patch. No substantive comments.

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-18 Thread aarondav
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17769069 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala --- @@ -22,13 +22,35 @@ package org.apache.spark.scheduler * In the latter

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-18 Thread aarondav
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17769055 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala --- @@ -22,13 +22,35 @@ package org.apache.spark.scheduler * In the latter

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-18 Thread aarondav
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17769053 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala --- @@ -22,13 +22,35 @@ package org.apache.spark.scheduler * In the latter

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-18 Thread aarondav
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17768989 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -309,4 +323,42 @@ private[spark] object HadoopRDD { f(inputSplit, firstPa

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-18 Thread aarondav
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17768962 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala --- @@ -22,13 +22,35 @@ package org.apache.spark.scheduler * In the latter

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-18 Thread aarondav
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17768928 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala --- @@ -22,13 +22,35 @@ package org.apache.spark.scheduler * In the latter

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-18 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-5619 Yes, this appears to be an issue with our checker and adding an exclusion is fine for now. The class is private. Just had really minor comments and I can address

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-18 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17768506 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala --- @@ -22,13 +22,35 @@ package org.apache.spark.scheduler * In the latter

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-09-18 Thread erikerlandson
GitHub user erikerlandson opened a pull request: https://github.com/apache/spark/pull/2455 [SPARK-3250] Implement Gap Sampling optimization for random sampling More efficient sampling, based on Gap Sampling optimization: http://erikerlandson.github.io/blog/2014/09/11/faster-rand

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-18 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17768479 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala --- @@ -22,13 +22,35 @@ package org.apache.spark.scheduler * In the latter

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-18 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17768467 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -309,4 +323,42 @@ private[spark] object HadoopRDD { f(inputSplit, firstPa

[GitHub] spark pull request: SPARK-3574. Shuffle finish time always reporte...

2014-09-18 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2440#issuecomment-56132627 Removing this sounds good to me too. Will upload a patch. I think a measure of how long a task spends in shuffle would be useful though, as it helps users understand whet

[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-18 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1391#issuecomment-56132497 These changes look good to me. This addresses what continues to be the #1 issue that we see in Cloudera customer YARN deployments. It's worth considering boosting this wh

[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-18 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1391#issuecomment-56132524 @nishkamravi2 mind resolving the merge conflicts?

[GitHub] spark pull request: [SPARK-1701] Clarify slice vs partition in the...

2014-09-18 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2305#issuecomment-56130009 > This patch fails unit tests. i'm getting HTTP 503 from jenkins, but i'm gonna go out on a limb and say this doc change didn't break the unit tests.

[GitHub] spark pull request: [YARN] SPARK-2668: Add variable of yarn log di...

2014-09-18 Thread renozhang
Github user renozhang commented on the pull request: https://github.com/apache/spark/pull/1573#issuecomment-56129984 Sorry @tgravescs, I've been very busy these days; I'll address them this weekend.

[GitHub] spark pull request: [SPARK-927] detect numpy at time of use

2014-09-18 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2313#issuecomment-56129891 that's a very good point, especially about how it's an unsolved problem in general, at least on our existing operating systems. iirc, systems like plan9 tried to address co

[GitHub] spark pull request: [SPARK-3599]Avoid loaing properties file frequ...

2014-09-18 Thread WangTaoTheTonic
GitHub user WangTaoTheTonic opened a pull request: https://github.com/apache/spark/pull/2454 [SPARK-3599]Avoid loaing properties file frequently https://issues.apache.org/jira/browse/SPARK-3599 You can merge this pull request into a Git repository by running: $ git pull https:/

[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs

2014-09-18 Thread viper-kun
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/2391#issuecomment-56129613 @andrewor14 i have checked Hadoop's JobHistoryServer. it is JobHistoryServer's responsibility to delete the application logs.

[GitHub] spark pull request: [SPARK-3589][Minor]remove redundant code

2014-09-18 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on a diff in the pull request: https://github.com/apache/spark/pull/2445#discussion_r17766944 --- Diff: bin/spark-class --- @@ -169,7 +169,6 @@ if [ -n "$SPARK_SUBMIT_BOOTSTRAP_DRIVER" ]; then # This is used only if the properties file ac

[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs

2014-09-18 Thread viper-kun
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/2391#discussion_r17766911 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -100,6 +125,12 @@ private[history] class FsHistoryProvider(conf

[GitHub] spark pull request: [SPARK-1701] Clarify slice vs partition in the...

2014-09-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2305#issuecomment-56128662 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20567/consoleFull) for PR 2305 at commit [`c0af05d`](https://github.com/a

[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2014-09-18 Thread derrickburns
Github user derrickburns commented on the pull request: https://github.com/apache/spark/pull/2419#issuecomment-56127880 I don't understand the test failure. Can someone help me? > On Sep 16, 2014, at 6:59 PM, Nicholas Chammas wrote:

[GitHub] spark pull request: [SPARK-3407][SQL]Add Date type support

2014-09-18 Thread adrian-wang
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/2344#discussion_r17766373 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/dataTypes.scala --- @@ -170,6 +170,18 @@ case object TimestampType extends Nati

[GitHub] spark pull request: [SPARK-927] detect numpy at time of use

2014-09-18 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2313#issuecomment-56127611 This is a tricky issue. Exact reproducibility / determinism crops up in two different senses here: re-running an entire job and re-computing a lost partition.

[GitHub] spark pull request: [SPARK-3597][Mesos] Implement `killTask`.

2014-09-18 Thread brndnmtthws
GitHub user brndnmtthws opened a pull request: https://github.com/apache/spark/pull/2453 [SPARK-3597][Mesos] Implement `killTask`. The MesosSchedulerBackend did not previously implement `killTask`, resulting in an exception. You can merge this pull request into a Git repository

[GitHub] spark pull request: [SPARK-3529] [SQL] Delete the temp files after...

2014-09-18 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2393#issuecomment-56127248 +1 lgtm fyi, i checked, deleteOnExit isn't an option because it cannot recursively delete

[GitHub] spark pull request: [SPARK-2098] All Spark processes should suppor...

2014-09-18 Thread witgo
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/2379#discussion_r17766151 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServerArguments.scala --- @@ -44,30 +50,19 @@ private[spark] class HistoryServerArguments(

[GitHub] spark pull request: [SPARK-3535][Mesos] Add 15% task memory overhe...

2014-09-18 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-56126568 I've cleaned up the patch again. I spent about an hour trying to apply this to the YARN code, but it was pretty difficult to follow so I gave up.

[GitHub] spark pull request: [SPARK-3407][SQL]Add Date type support

2014-09-18 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/2344#discussion_r17765726 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/dataTypes.scala --- @@ -170,6 +170,18 @@ case object TimestampType extends N

[GitHub] spark pull request: [SPARK-3407][SQL]Add Date type support

2014-09-18 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/2344#discussion_r17765733 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/dataTypes.scala --- @@ -170,6 +170,18 @@ case object TimestampType extends N

[GitHub] spark pull request: [SPARK-1701] Clarify slice vs partition in the...

2014-09-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2305#issuecomment-56126150 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20567/consoleFull) for PR 2305 at commit [`c0af05d`](https://github.com/ap

[GitHub] spark pull request: [SPARK-2713] Executors of same application in ...

2014-09-18 Thread li-zhihui
Github user li-zhihui commented on the pull request: https://github.com/apache/spark/pull/1616#issuecomment-56126098 @andrewor14 any more comments?

[GitHub] spark pull request: [SPARK-3589][Minor]remove redundant code

2014-09-18 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/2445#discussion_r17765764 --- Diff: bin/spark-class --- @@ -169,7 +169,6 @@ if [ -n "$SPARK_SUBMIT_BOOTSTRAP_DRIVER" ]; then # This is used only if the properties file actuall

[GitHub] spark pull request: [SPARK-3407][SQL]Add Date type support

2014-09-18 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/2344#discussion_r17765755 --- Diff: sql/core/src/main/java/org/apache/spark/sql/api/java/DateType.java --- @@ -0,0 +1,27 @@ +/* + * Licensed to the Apache Software Found

[GitHub] spark pull request: [SPARK-1701] Clarify slice vs partition in the...

2014-09-18 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2305#issuecomment-56125812 thanks for the feedback. i've changed the language to be more in line with your suggestion.

[GitHub] spark pull request: [SPARK-3535][Mesos] Add 15% task memory overhe...

2014-09-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-56125354 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20564/consoleFull) for PR 2401 at commit [`d51b74f`](https://github.com/a

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-56125277 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20563/consoleFull) for PR 1486 at commit [`d1f9fe3`](https://github.com/a

[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

2014-09-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17765442 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/BLASSuite.scala --- @@ -126,4 +126,142 @@ class BLASSuite extends FunSuite { }

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-18 Thread chouqin
Github user chouqin commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17765427 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DTStatsAggregator.scala --- @@ -189,6 +160,230 @@ private[tree] class DTStatsAggregator(

[GitHub] spark pull request: [SPARK-1701] Clarify slice vs partition in the...

2014-09-18 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2305#discussion_r17765356 --- Diff: docs/programming-guide.md --- @@ -286,7 +286,7 @@ We describe operations on distributed datasets later on. -One important par

[GitHub] spark pull request: [SPARK-1701] Clarify slice vs partition in the...

2014-09-18 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2305#issuecomment-56124864 Sorry for not reviewing this until now; it sort of fell off my radar.

[GitHub] spark pull request: [SPARK-3589][Minor]remove redundant code

2014-09-18 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on a diff in the pull request: https://github.com/apache/spark/pull/2445#discussion_r17765294 --- Diff: bin/spark-class --- @@ -169,7 +169,6 @@ if [ -n "$SPARK_SUBMIT_BOOTSTRAP_DRIVER" ]; then # This is used only if the properties file ac

[GitHub] spark pull request: [SPARK-3529] [SQL] Delete the temp files after...

2014-09-18 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/2393#issuecomment-56124852 Thank you all, I've removed the `Signal` and now use `Utils.deleteRecursively` instead.

[GitHub] spark pull request: [SPARK-3554] [PySpark] use broadcast automatic...

2014-09-18 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2417

[GitHub] spark pull request: [SPARK-3554] [PySpark] use broadcast automatic...

2014-09-18 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2417#issuecomment-56124617 LGTM. Surprising that the broadcast variable removal code was never triggered in the test suite before; thanks for fixing that!

[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

2014-09-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17765188 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala --- @@ -197,4 +201,452 @@ private[mllib] object BLAS extends Serializable {

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-18 Thread chouqin
Github user chouqin commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17765183 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DTStatsAggregator.scala --- @@ -189,6 +160,230 @@ private[tree] class DTStatsAggregator(

[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

2014-09-18 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17765187 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala --- @@ -197,4 +201,452 @@ private[mllib] object BLAS extends Serializable {

[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

2014-09-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17765178 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala --- @@ -197,4 +201,452 @@ private[mllib] object BLAS extends Serializable {

[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

2014-09-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17765175 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala --- @@ -197,4 +201,452 @@ private[mllib] object BLAS extends Serializable {

[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

2014-09-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17765173 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala --- @@ -197,4 +201,452 @@ private[mllib] object BLAS extends Serializable {

[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

2014-09-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17765167 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala --- @@ -197,4 +201,452 @@ private[mllib] object BLAS extends Serializable {

[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

2014-09-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17765077 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala --- @@ -197,4 +201,452 @@ private[mllib] object BLAS extends Serializable {

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-18 Thread chouqin
Github user chouqin commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17765048 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DTStatsAggregator.scala --- @@ -189,6 +160,230 @@ private[tree] class DTStatsAggregator(

[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

2014-09-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17765001 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala --- @@ -197,4 +201,452 @@ private[mllib] object BLAS extends Serializable {

[GitHub] spark pull request: [SPARK-1987] EdgePartitionBuilder: More memory...

2014-09-18 Thread larryxiao
Github user larryxiao commented on the pull request: https://github.com/apache/spark/pull/2446#issuecomment-56123977 Thanks Ankur! You are really efficient!

[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

2014-09-18 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17764905 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala --- @@ -197,4 +201,452 @@ private[mllib] object BLAS extends Serializable {

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-18 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/2294#issuecomment-56123894 @ScrapCodes THANKS A LOT! That fixed it! I didn't realize I didn't update my local repo for such a long time.

[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

2014-09-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17764836 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala --- @@ -197,4 +201,452 @@ private[mllib] object BLAS extends Serializable {

[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

2014-09-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2451#discussion_r17764833 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala --- @@ -197,4 +201,452 @@ private[mllib] object BLAS extends Serializable {

[GitHub] spark pull request: [SPARK-3535][Mesos] Add 15% task memory overhe...

2014-09-18 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-56123833 I can emulate the YARN behaviour, but it seems better to just do the same thing with both Mesos and YARN. Thoughts? I can refactor this (including the YARN code) to

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2294#issuecomment-56123782 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20562/consoleFull) for PR 2294 at commit [`88814ed`](https://github.com/a

[GitHub] spark pull request: [SPARK-3535][Mesos] Add 15% task memory overhe...

2014-09-18 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-56123690 Yes @willb I have the same concern. I think in other contexts "overhead" refers to the additional space, not the total space, so an "overhead fraction" of 0.15 means w

[GitHub] spark pull request: [Minor Hot Fix] Move a line in SparkSubmit to ...

2014-09-18 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2452

[GitHub] spark pull request: [Minor Hot Fix] Move a line in SparkSubmit to ...

2014-09-18 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/2452#issuecomment-56123334 Thanks @andrewor14, I'm merging this into master and 1.1!

[GitHub] spark pull request: [SPARK-927] detect numpy at time of use

2014-09-18 Thread mattf
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2313#issuecomment-56123348 @JoshRosen it looks like @davies and i are on the same page. how would you like to proceed?

[GitHub] spark pull request: [Minor Hot Fix] Move a line in SparkSubmit to ...

2014-09-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2452#issuecomment-56122936 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20561/consoleFull) for PR 2452 at commit [`d5190ca`](https://github.com/a

[GitHub] spark pull request: [WIP][SPARK-1486][MLlib] Multi Model Training ...

2014-09-18 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/2451#issuecomment-56122908 @brkyvz Just wondering: Which reference library are you using to determine the order of arguments for BLAS routines? E.g., it's different from [Netlib LAPACK](http://

[GitHub] spark pull request: [SPARK-3491] [MLlib] [PySpark] use pickle to s...

2014-09-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2378#issuecomment-56122608 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20560/consoleFull) for PR 2378 at commit [`810f97f`](https://github.com/a

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-18 Thread chouqin
Github user chouqin commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17764228 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DTStatsAggregator.scala --- @@ -189,6 +160,230 @@ private[tree] class DTStatsAggregator(

[GitHub] spark pull request: [SPARK-3535][Mesos] Add 15% task memory overhe...

2014-09-18 Thread vanzin
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-56122505 @willb in the case of Yarn, the parameter is called "overhead" because it actually sets the amount of overhead you want to add to the requested heap memory. The PR being r

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17764224 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DTStatsAggregator.scala --- @@ -189,6 +160,230 @@ private[tree] class DTStatsAggregator(

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-18 Thread chouqin
Github user chouqin commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17764131 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DTStatsAggregator.scala --- @@ -189,6 +160,230 @@ private[tree] class DTStatsAggregator(

[GitHub] spark pull request: [Minor Hot Fix] Move a line in SparkSubmit to ...

2014-09-18 Thread vanzin
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/2452#issuecomment-56122218 lgtm

[GitHub] spark pull request: [WIP][SPARK-2816][SQL] Type-safe SQL Queries

2014-09-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1759#issuecomment-56121439 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20558/consoleFull) for PR 1759 at commit [`677fa3d`](https://github.com/a

[GitHub] spark pull request: [SPARK-3535][Mesos] Add 15% task memory overhe...

2014-09-18 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-56121260 @andrewor14 This bothers me too, but in a slightly different way: calling the parameter “overhead” when it really refers to how to scale requested memory to accommodat

[GitHub] spark pull request: [SPARK-3535][Mesos] Add 15% task memory overhe...

2014-09-18 Thread brndnmtthws
Github user brndnmtthws commented on a diff in the pull request: https://github.com/apache/spark/pull/2401#discussion_r17763516 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CommonProps.scala --- @@ -0,0 +1,27 @@ +/* + * Licensed to the Apache Soft

[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-18 Thread nishkamravi2
Github user nishkamravi2 commented on the pull request: https://github.com/apache/spark/pull/1391#issuecomment-56120931 Updated as per @andrewor14 's comments

[GitHub] spark pull request: [SPARK-3535][Mesos] Add 15% task memory overhe...

2014-09-18 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/2401#discussion_r17763285 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CommonProps.scala --- @@ -0,0 +1,27 @@ +/* + * Licensed to the Apache Softw

[GitHub] spark pull request: [SPARK-3535][Mesos] Add 15% task memory overhe...

2014-09-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-56120575 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20564/consoleFull) for PR 2401 at commit [`d51b74f`](https://github.com/ap

[GitHub] spark pull request: [SPARK-3535][Mesos] Add 15% task memory overhe...

2014-09-18 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-56120347 Forgot to mention: I also set the executor CPUs correctly.

[GitHub] spark pull request: [SPARK-3535][Mesos] Add 15% task memory overhe...

2014-09-18 Thread brndnmtthws
Github user brndnmtthws commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-56120329 Updated as per @andrewor14's suggestions.
