[GitHub] spark pull request: [SQL] SPARK-5309: Add support for dictionaries...

2015-01-24 Thread MickDavies
Github user MickDavies commented on the pull request: https://github.com/apache/spark/pull/4187#issuecomment-71326437 The dictionary already exists, the change will cause an additional array to be created to hold the converted values, but I do not think this is very significant. I

[GitHub] spark pull request: SPARK-4430 [STREAMING] [TEST] Apache RAT Check...

2015-01-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4189#issuecomment-71330879 [Test build #26051 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26051/consoleFull) for PR 4189 at commit

[GitHub] spark pull request: SPARK-2285 [CORE] Give various TaskEndReason s...

2015-01-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4191#issuecomment-71333194 [Test build #26053 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26053/consoleFull) for PR 4191 at commit

[GitHub] spark pull request: SPARK-4430 [STREAMING] [TEST] Apache RAT Check...

2015-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4189#issuecomment-71333620 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: SPARK-4430 [STREAMING] [TEST] Apache RAT Check...

2015-01-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4189#issuecomment-71333617 [Test build #26051 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26051/consoleFull) for PR 4189 at commit

[GitHub] spark pull request: [SPARK-4786][SQL]: Parquet filter pushdown for...

2015-01-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4156#issuecomment-71334179 [Test build #26055 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26055/consoleFull) for PR 4156 at commit

[GitHub] spark pull request: [SQL] SPARK-5309: Add support for dictionaries...

2015-01-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4187#issuecomment-71334169 [Test build #26054 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26054/consoleFull) for PR 4187 at commit

[GitHub] spark pull request: SPARK-5393. Flood of util.RackResolver log mes...

2015-01-24 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/4192 SPARK-5393. Flood of util.RackResolver log messages after SPARK-1714 You can merge this pull request into a Git repository by running: $ git pull https://github.com/sryza/spark sandy-spark-5393

[GitHub] spark pull request: SPARK-5393. Flood of util.RackResolver log mes...

2015-01-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4192#issuecomment-71340711 [Test build #26058 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26058/consoleFull) for PR 4192 at commit

[GitHub] spark pull request: SPARK-2285 [CORE] Give various TaskEndReason s...

2015-01-24 Thread srowen
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/4191 SPARK-2285 [CORE] Give various TaskEndReason subclass more descriptive names Was this all that you had in mind @rxin -- just a rename? Or did I miss the point? The other subclasses look fairly

[GitHub] spark pull request: SPARK-2285 [CORE] Give various TaskEndReason s...

2015-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4191#issuecomment-71335121 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: SPARK-2285 [CORE] Give various TaskEndReason s...

2015-01-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4191#issuecomment-71335117 [Test build #26053 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26053/consoleFull) for PR 4191 at commit

[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...

2015-01-24 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-71335512 ping @jkbradley Could you please have a final look? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If

[GitHub] spark pull request: [SPARK-5352][GraphX] Add getPartitionStrategy ...

2015-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4138#issuecomment-71337315 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-5352][GraphX] Add getPartitionStrategy ...

2015-01-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4138#issuecomment-71337309 [Test build #26056 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26056/consoleFull) for PR 4138 at commit

[GitHub] spark pull request: [SPARK-5366][EC2] Check the mode of private ke...

2015-01-24 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/4162#discussion_r23499848 --- Diff: ec2/spark_ec2.py --- @@ -349,6 +350,15 @@ def launch_cluster(conn, opts, cluster_name): if opts.identity_file is None: print

[GitHub] spark pull request: [SPARK-5214][Test] Add a test to demonstrate E...

2015-01-24 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4174#issuecomment-71332320 Thanks. Merging in master.

[GitHub] spark pull request: [SPARK-5352][GraphX] Add getPartitionStrategy ...

2015-01-24 Thread ankurdave
Github user ankurdave commented on the pull request: https://github.com/apache/spark/pull/4138#issuecomment-71334354 Jenkins, retest this please.

[GitHub] spark pull request: [SQL] SPARK-5309: Add support for dictionaries...

2015-01-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4187#issuecomment-71336962 [Test build #26054 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26054/consoleFull) for PR 4187 at commit

[GitHub] spark pull request: [SPARK-4789] [SPARK-4942] [SPARK-5031] [mllib]...

2015-01-24 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3637#issuecomment-71338140 Ping

[GitHub] spark pull request: SPARK-4147 [CORE] Reduce log4j dependency

2015-01-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4190#issuecomment-71331330 [Test build #26052 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26052/consoleFull) for PR 4190 at commit

[GitHub] spark pull request: SPARK-2285 [CORE] Give various TaskEndReason s...

2015-01-24 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4191#issuecomment-71333463 This is what I had in mind. @andrewor14 can you comment on whether this will break the event log stuff?

[GitHub] spark pull request: SPARK-4267 [CORE] Failing to launch jobs on Sp...

2015-01-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4188#issuecomment-71330214 [Test build #26050 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26050/consoleFull) for PR 4188 at commit

[GitHub] spark pull request: [SPARK-4786][SQL]: Parquet filter pushdown for...

2015-01-24 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/4156#issuecomment-71334132 ok to test

[GitHub] spark pull request: SPARK-2285 [CORE] Give various TaskEndReason s...

2015-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4191#issuecomment-71338349 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-5366][EC2] Check the mode of private ke...

2015-01-24 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/4162#discussion_r23499744 --- Diff: ec2/spark_ec2.py --- @@ -349,6 +350,15 @@ def launch_cluster(conn, opts, cluster_name): if opts.identity_file is None: print

[GitHub] spark pull request: [SPARK-5214][Test] Add a test to demonstrate E...

2015-01-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4174#issuecomment-71325114 [Test build #26049 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26049/consoleFull) for PR 4174 at commit

[GitHub] spark pull request: [SPARK-5214][Test] Add a test to demonstrate E...

2015-01-24 Thread zsxwing
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/4174#discussion_r23497320 --- Diff: core/src/test/scala/org/apache/spark/util/EventLoopSuite.scala --- @@ -185,4 +185,22 @@ class EventLoopSuite extends FunSuite with Timeouts {

[GitHub] spark pull request: SPARK-4147 [CORE] Reduce log4j dependency

2015-01-24 Thread srowen
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/4190 SPARK-4147 [CORE] Reduce log4j dependency Defer use of log4j class until it's known that log4j 1.2 is being used. This may avoid dealing with log4j dependencies for callers that reroute slf4j to

[GitHub] spark pull request: [SPARK-5352][GraphX] Add getPartitionStrategy ...

2015-01-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4138#issuecomment-71334562 [Test build #26056 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26056/consoleFull) for PR 4138 at commit

[GitHub] spark pull request: SPARK-2285 [CORE] Give various TaskEndReason s...

2015-01-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4191#issuecomment-71338347 [Test build #26057 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26057/consoleFull) for PR 4191 at commit

[GitHub] spark pull request: [SPARK-5366][EC2] Check the mode of private ke...

2015-01-24 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/4162#discussion_r23499969 --- Diff: ec2/spark_ec2.py --- @@ -349,6 +350,15 @@ def launch_cluster(conn, opts, cluster_name): if opts.identity_file is None: print
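
The three review comments above sit on the `opts.identity_file` branch of `launch_cluster` in `ec2/spark_ec2.py`. The thread title is cut off in this listing, so the following is only a rough sketch of what a "mode of the private key" check could look like. It assumes the concern is the usual SSH one, that the identity file must not be accessible by group or others. The helper name `key_file_is_private` is hypothetical and is not taken from the pull request.

```python
import os
import stat


def key_file_is_private(path):
    """Return True when `path` grants no permissions to group or others.

    Hypothetical helper for illustration only; not code from spark_ec2.py.
    """
    # Keep only the permission bits of the file mode.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # S_IRWXG and S_IRWXO cover read/write/execute for group and others.
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

A caller could run this on the identity file path before launching and print a warning when it returns False. Whether the actual change warns or aborts is not visible from the truncated previews.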

[GitHub] spark pull request: [SPARK-5214][Test] Add a test to demonstrate E...

2015-01-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4174#issuecomment-71328345 [Test build #26049 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26049/consoleFull) for PR 4174 at commit

[GitHub] spark pull request: SPARK-4267 [CORE] Failing to launch jobs on Sp...

2015-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4188#issuecomment-71333105 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: SPARK-4267 [CORE] Failing to launch jobs on Sp...

2015-01-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4188#issuecomment-71333101 [Test build #26050 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26050/consoleFull) for PR 4188 at commit

[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-24 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/4047#issuecomment-71337678 @EntilZha Thanks for sharing your code! I like that it doesn't really change the model API, but I'm not quite clear on what will be public/private in the learning

[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...

2015-01-24 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-71338606 @MechCoder This is an addition instead of a correction, but I just realized that Strategy.assertValid() does not check subsamplingRate. Would you mind adding that

[GitHub] spark pull request: [SQL] SPARK-5309: Add support for dictionaries...

2015-01-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/4187#discussion_r23500894 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetConverter.scala --- @@ -426,6 +423,33 @@ private[parquet] class

[GitHub] spark pull request: [SQL] SPARK-5309: Add support for dictionaries...

2015-01-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/4187#discussion_r23500891 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetConverter.scala --- @@ -426,6 +423,33 @@ private[parquet] class

[GitHub] spark pull request: SPARK-3359 [CORE] [DOCS] `sbt/sbt unidoc` does...

2015-01-24 Thread srowen
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/4193 SPARK-3359 [CORE] [DOCS] `sbt/sbt unidoc` doesn't work with Java 8 These are more `javadoc` 8-related changes I spotted while investigating. These should be helpful in any event, but this does not

[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-24 Thread EntilZha
Github user EntilZha commented on a diff in the pull request: https://github.com/apache/spark/pull/4047#discussion_r23501071 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala --- @@ -0,0 +1,472 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SQL] SPARK-5309: Add support for dictionaries...

2015-01-24 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/4187#discussion_r23501217 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetConverter.scala --- @@ -426,6 +423,33 @@ private[parquet] class

[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...

2015-01-24 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-71349920 @jkbradley Fixed. I can haz merge?

[GitHub] spark pull request: [SPARK-5383][SQL] Support alias for udtfs

2015-01-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4186#issuecomment-71352726 [Test build #26062 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26062/consoleFull) for PR 4186 at commit

[GitHub] spark pull request: [SPARK-5401] set executor ID before creating M...

2015-01-24 Thread ryan-williams
GitHub user ryan-williams opened a pull request: https://github.com/apache/spark/pull/4194 [SPARK-5401] set executor ID before creating MetricsSystem You can merge this pull request into a Git repository by running: $ git pull https://github.com/ryan-williams/spark metrics

[GitHub] spark pull request: [SPARK-5402] log executor ID at executor-const...

2015-01-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4195#issuecomment-71357329 [Test build #26064 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26064/consoleFull) for PR 4195 at commit

[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-24 Thread EntilZha
Github user EntilZha commented on a diff in the pull request: https://github.com/apache/spark/pull/4047#discussion_r23502411 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala --- @@ -0,0 +1,472 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-5401] set executor ID before creating M...

2015-01-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4194#issuecomment-71357143 [Test build #26063 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26063/consoleFull) for PR 4194 at commit

[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-24 Thread EntilZha
Github user EntilZha commented on a diff in the pull request: https://github.com/apache/spark/pull/4047#discussion_r23502479 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala --- @@ -0,0 +1,472 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-5402] log executor ID at executor-const...

2015-01-24 Thread ryan-williams
GitHub user ryan-williams opened a pull request: https://github.com/apache/spark/pull/4195 [SPARK-5402] log executor ID at executor-construction time also rename slaveHostname to executorHostname You can merge this pull request into a Git repository by running: $ git pull

[GitHub] spark pull request: [SPARK-5355] make SparkConf thread-safe

2015-01-24 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4143#issuecomment-71345609 Actually we should stick to j.u.c. I'm not sure if we should trust the less commonly used TrieMap. Also make sure we don't use the Scala implicit conversions on

[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-24 Thread EntilZha
Github user EntilZha commented on a diff in the pull request: https://github.com/apache/spark/pull/4047#discussion_r23501018 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala --- @@ -0,0 +1,472 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-24 Thread EntilZha
Github user EntilZha commented on a diff in the pull request: https://github.com/apache/spark/pull/4047#discussion_r23501086 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala --- @@ -0,0 +1,472 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: SPARK-3359 [CORE] [DOCS] `sbt/sbt unidoc` does...

2015-01-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4193#issuecomment-71346272 [Test build #26060 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26060/consoleFull) for PR 4193 at commit

[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/4047#discussion_r23501384 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala --- @@ -0,0 +1,472 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/4047#discussion_r23501379 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala --- @@ -0,0 +1,472 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/4047#discussion_r23501438 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala --- @@ -0,0 +1,472 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-24 Thread akopich
Github user akopich commented on a diff in the pull request: https://github.com/apache/spark/pull/4047#discussion_r23501440 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala --- @@ -0,0 +1,472 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-3298][SQL] Add flag control overwrite r...

2015-01-24 Thread squito
Github user squito commented on the pull request: https://github.com/apache/spark/pull/4175#issuecomment-71348013 Jenkins this is OK to test

[GitHub] spark pull request: SPARK-5199. Input metrics should show up for I...

2015-01-24 Thread shenh062326
Github user shenh062326 commented on the pull request: https://github.com/apache/spark/pull/4050#issuecomment-71347965 If we use an inputFormat that isn't an instance of org.apache.hadoop.mapreduce.lib.input.{CombineFileSplit, FileSplit}, then we can't get information of input

[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/4047#discussion_r23501523 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala --- @@ -0,0 +1,472 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: SPARK-4136. Under dynamic allocation, cancel o...

2015-01-24 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/4168#discussion_r23502021 --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala --- @@ -199,14 +199,31 @@ private[spark] class ExecutorAllocationManager(

[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-24 Thread EntilZha
Github user EntilZha commented on a diff in the pull request: https://github.com/apache/spark/pull/4047#discussion_r23501107 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala --- @@ -0,0 +1,265 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-24 Thread EntilZha
Github user EntilZha commented on a diff in the pull request: https://github.com/apache/spark/pull/4047#discussion_r23501109 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala --- @@ -0,0 +1,265 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-5347][CORE] Change FileSplit to InputSp...

2015-01-24 Thread shenh062326
Github user shenh062326 commented on the pull request: https://github.com/apache/spark/pull/4150#issuecomment-71347933 If we use an inputFormat that isn't an instance of org.apache.hadoop.mapreduce.lib.input.{CombineFileSplit, FileSplit}, then we can't get information of input

[GitHub] spark pull request: [SPARK-5332][Core] Efficient way to deal with ...

2015-01-24 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/4118#issuecomment-71348459 Not yet. But as the original method dealing with it looks very inefficient, I think it is worth replacing it with a small refactor. As you can see, this refactor just mainly

[GitHub] spark pull request: [SPARK-5355] make SparkConf thread-safe

2015-01-24 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/4143#issuecomment-71355225 @rxin I will send another PR to change to j.u.c.ConcurrentHashMap.

[GitHub] spark pull request: SPARK-3359 [CORE] [DOCS] `sbt/sbt unidoc` does...

2015-01-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4193#issuecomment-71346292 [Test build #26060 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26060/consoleFull) for PR 4193 at commit

[GitHub] spark pull request: SPARK-3359 [CORE] [DOCS] `sbt/sbt unidoc` does...

2015-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4193#issuecomment-71346294 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-24 Thread EntilZha
Github user EntilZha commented on a diff in the pull request: https://github.com/apache/spark/pull/4047#discussion_r23501230 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala --- @@ -0,0 +1,472 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SQL] SPARK-5309: Add support for dictionaries...

2015-01-24 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/4187#discussion_r23501242 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetConverter.scala --- @@ -426,6 +423,33 @@ private[parquet] class

[GitHub] spark pull request: [SPARK-3298][SQL] Add flag control overwrite r...

2015-01-24 Thread squito
Github user squito commented on the pull request: https://github.com/apache/spark/pull/4175#issuecomment-71348345 let's try this again ... Jenkins this is OK to test

[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-24 Thread akopich
Github user akopich commented on a diff in the pull request: https://github.com/apache/spark/pull/4047#discussion_r23501548 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala --- @@ -0,0 +1,472 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: SPARK-4136. Under dynamic allocation, cancel o...

2015-01-24 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/4168#discussion_r23502091 --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala --- @@ -226,8 +243,11 @@ private[spark] class ExecutorAllocationManager(

[GitHub] spark pull request: SPARK-4136. Under dynamic allocation, cancel o...

2015-01-24 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/4168#discussion_r23502082 --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala --- @@ -237,39 +257,18 @@ private[spark] class ExecutorAllocationManager(

[GitHub] spark pull request: [SPARK-5383][SQL] Support alias for udtfs

2015-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4186#issuecomment-71355052 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-5383][SQL] Support alias for udtfs

2015-01-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4186#issuecomment-71355050 [Test build #26062 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26062/consoleFull) for PR 4186 at commit

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-24 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3951#issuecomment-71344973 @kazk1018 Thanks for the updates; sorry for the delayed response. Please ping me when further updates are ready for review. The 2 other items which would be good

[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-24 Thread EntilZha
Github user EntilZha commented on a diff in the pull request: https://github.com/apache/spark/pull/4047#discussion_r23501143 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala --- @@ -0,0 +1,472 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-24 Thread EntilZha
Github user EntilZha commented on a diff in the pull request: https://github.com/apache/spark/pull/4047#discussion_r23501399 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala --- @@ -0,0 +1,472 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/4047#discussion_r23501394 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala --- @@ -0,0 +1,472 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-24 Thread EntilZha
Github user EntilZha commented on a diff in the pull request: https://github.com/apache/spark/pull/4047#discussion_r23501392 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala --- @@ -0,0 +1,472 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-4934][CORE] Print remote address in Con...

2015-01-24 Thread shenh062326
Github user shenh062326 commented on the pull request: https://github.com/apache/spark/pull/4157#issuecomment-71348155 I think you are right, there's no need to change.

[GitHub] spark pull request: [MLLIB] SPARK-5362 (4526, 2372) Gradient and O...

2015-01-24 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/4152#issuecomment-71348152 @avulanov @mengxr Do you know how much of a hit we would take if we used a type parameter for the type of data? I'm imagining a Datum type which would be ```Datum =

[GitHub] spark pull request: [SPARK-3298][SQL] Add flag control overwrite r...

2015-01-24 Thread squito
Github user squito commented on the pull request: https://github.com/apache/spark/pull/4175#issuecomment-71348093 this is mentioned in the jira, but it's worth noting again here that this changes the behavior slightly, since it wouldn't throw an exception before.

[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...

2015-01-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-71349963 [Test build #26061 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26061/consoleFull) for PR 4073 at commit

[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...

2015-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-71352330 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-3726] [MLlib] Allow sampling_rate not e...

2015-01-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4073#issuecomment-71352326 [Test build #26061 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26061/consoleFull) for PR 4073 at commit

[GitHub] spark pull request: SPARK-4337. [YARN] Add ability to cancel pendi...

2015-01-24 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/4141#issuecomment-71355510 LGTM. @tdas, I think you can take a look at this PR.

[GitHub] spark pull request: SPARK-5393. Flood of util.RackResolver log mes...

2015-01-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4192#issuecomment-71343059 [Test build #26059 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26059/consoleFull) for PR 4192 at commit

[GitHub] spark pull request: [SPARK-5332][Core] Efficient way to deal with ...

2015-01-24 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4118#issuecomment-71344552 Since executor losses are very infrequent, is it worth bothering to optimize this? Have you seen this be a performance issue in practice?

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3951#discussion_r23500731 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala --- @@ -529,6 +530,35 @@ class PythonMLLibAPI extends Serializable {

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3951#discussion_r23500733 --- Diff: python/pyspark/mllib/tree.py --- @@ -24,7 +24,41 @@ from pyspark.mllib.linalg import _convert_to_vector from pyspark.mllib.regression

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3951#discussion_r23500735 --- Diff: python/pyspark/mllib/tree.py --- @@ -383,6 +387,129 @@ def trainRegressor(cls, data, categoricalFeaturesInfo, numTrees, featureSubsetSt

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3951#discussion_r23500737 --- Diff: python/pyspark/mllib/tree.py --- @@ -383,6 +387,129 @@ def trainRegressor(cls, data, categoricalFeaturesInfo, numTrees, featureSubsetSt

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3951#discussion_r23500734 --- Diff: python/pyspark/mllib/tree.py --- @@ -24,7 +24,41 @@ from pyspark.mllib.linalg import _convert_to_vector from pyspark.mllib.regression

[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3951#discussion_r23500732 --- Diff: python/pyspark/mllib/tree.py --- @@ -24,7 +24,41 @@ from pyspark.mllib.linalg import _convert_to_vector from pyspark.mllib.regression

[GitHub] spark pull request: SPARK-5393. Flood of util.RackResolver log mes...

2015-01-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4192#issuecomment-71345047 [Test build #26059 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26059/consoleFull) for PR 4192 at commit

[GitHub] spark pull request: SPARK-5393. Flood of util.RackResolver log mes...

2015-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4192#issuecomment-71345048 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-24 Thread EntilZha
Github user EntilZha commented on a diff in the pull request: https://github.com/apache/spark/pull/4047#discussion_r23500980 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala --- @@ -0,0 +1,472 @@ +/* + * Licensed to the Apache Software Foundation
