[GitHub] spark pull request: Removed some HashMaps from DAGScheduler by sto...

2014-07-24 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/1561#discussion_r15331043 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -355,14 +351,13 @@ class DAGScheduler(

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-07-24 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1165#discussion_r15331051 --- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala --- @@ -225,9 +374,18 @@ private class MemoryStore(blockManager: BlockManager,

[GitHub] spark pull request: SPARK-2250: show stage RDDs in UI

2014-07-24 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1188#discussion_r15331036 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StageTable.scala --- @@ -99,19 +99,30 @@ private[ui] class StageTableBase( {s.name}

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-07-24 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1165#discussion_r15331070 --- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala --- @@ -215,7 +361,10 @@ private class MemoryStore(blockManager: BlockManager,

[GitHub] spark pull request: SPARK-2250: show stage RDDs in UI

2014-07-24 Thread nevillelyh
Github user nevillelyh commented on a diff in the pull request: https://github.com/apache/spark/pull/1188#discussion_r15331106 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StageTable.scala --- @@ -99,19 +99,30 @@ private[ui] class StageTableBase( {s.name}

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-07-24 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1165#discussion_r15331108 --- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala --- @@ -215,7 +361,10 @@ private class MemoryStore(blockManager: BlockManager,

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15331132 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUdf.scala --- @@ -29,6 +29,7 @@ case class ScalaUdf(function: AnyRef,

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15331160 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -0,0 +1,458 @@ +/* + * Licensed to

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-24 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1056#issuecomment-49970751 I don't entirely understand the advantage of having a separate PartialTaskMetrics. Ultimately every field of TaskMetrics except for maybe shuffleFinishTime will be able

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-07-24 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1165#discussion_r15331171 --- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala --- @@ -141,6 +189,104 @@ private class MemoryStore(blockManager: BlockManager,

[GitHub] spark pull request: Part of [SPARK-2456] Removed some HashMaps fro...

2014-07-24 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/1561#discussion_r15331172 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -395,16 +389,9 @@ class DAGScheduler( activeJobs -= job

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-07-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1165#issuecomment-49970876 QA results for PR 1165: - This patch FAILED unit tests. - This patch merges cleanly. - This patch adds the following public classes (experimental): case class

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-07-24 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1165#discussion_r15331224 --- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala --- @@ -141,6 +189,104 @@ private class MemoryStore(blockManager: BlockManager,

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-07-24 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1165#discussion_r15331245 --- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala --- @@ -141,6 +189,104 @@ private class MemoryStore(blockManager: BlockManager,

[GitHub] spark pull request: Part of [SPARK-2456] Removed some HashMaps fro...

2014-07-24 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/1561#discussion_r15331280 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -395,16 +389,9 @@ class DAGScheduler( activeJobs -= job

[GitHub] spark pull request: Part of [SPARK-2456] Removed some HashMaps fro...

2014-07-24 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/1561#discussion_r15331310 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -355,14 +351,13 @@ class DAGScheduler(

[GitHub] spark pull request: [SPARK-2014] Make PySpark store RDDs in MEMORY...

2014-07-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1051#issuecomment-49971143 QA tests have started for PR 1051. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17104/consoleFull

[GitHub] spark pull request: [SPARK-2410][SQL] Cherry picked Hive Thrift/JD...

2014-07-24 Thread liancheng
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/1399#issuecomment-49971137 Removed the WIP tag. It's ready to go. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project

[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

2014-07-24 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1165#discussion_r15331347 --- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala --- @@ -87,9 +97,47 @@ private class MemoryStore(blockManager: BlockManager,

[GitHub] spark pull request: SPARK-2310. Support arbitrary Spark properties...

2014-07-24 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1253

[GitHub] spark pull request: (WIP) SPARK-2045 Sort-based shuffle

2014-07-24 Thread colorant
Github user colorant commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15331341 --- Diff: core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala --- @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15331362 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -0,0 +1,458 @@ +/* + * Licensed to

[GitHub] spark pull request: [SPARK-2260] Fix standalone-cluster mode, whic...

2014-07-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1538#issuecomment-49971356 QA results for PR 1538: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test

[GitHub] spark pull request: Part of [SPARK-2456] Removed some HashMaps fro...

2014-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1561#discussion_r15331481 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Stage.scala --- @@ -22,6 +22,8 @@ import org.apache.spark.rdd.RDD import

[GitHub] spark pull request: Part of [SPARK-2456] Removed some HashMaps fro...

2014-07-24 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/1561#discussion_r15331465 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Stage.scala --- @@ -22,6 +22,8 @@ import org.apache.spark.rdd.RDD import

[GitHub] spark pull request: Part of [SPARK-2456] Removed some HashMaps fro...

2014-07-24 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/1561#discussion_r15331488 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Stage.scala --- @@ -56,6 +58,16 @@ private[spark] class Stage( val numPartitions =

[GitHub] spark pull request: Part of [SPARK-2456] Removed some HashMaps fro...

2014-07-24 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/1561#discussion_r15331519 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Stage.scala --- @@ -56,6 +58,16 @@ private[spark] class Stage( val numPartitions =

[GitHub] spark pull request: Part of [SPARK-2456] Removed some HashMaps fro...

2014-07-24 Thread kayousterhout
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/1561#issuecomment-49971683 Love the cleanup here!!!

[GitHub] spark pull request: [SPARK-2410][SQL] Cherry picked Hive Thrift/JD...

2014-07-24 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1399#issuecomment-49971698 @liancheng I just merged a patch related to SparkSubmit... can you rebase?

[GitHub] spark pull request: SPARK-2310. Support arbitrary Spark properties...

2014-07-24 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1253#issuecomment-49971888 @sryza I merged this just now because another patch was going to change this code and I wanted to avoid you having to rebase again. That said, I found an issue with

[GitHub] spark pull request: Part of [SPARK-2456] Removed some HashMaps fro...

2014-07-24 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1561#discussion_r15331775 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -355,14 +351,13 @@ class DAGScheduler(

[GitHub] spark pull request: Part of [SPARK-2456] Removed some HashMaps fro...

2014-07-24 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1561#discussion_r15331819 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Stage.scala --- @@ -22,6 +22,8 @@ import org.apache.spark.rdd.RDD import

[GitHub] spark pull request: Build should not run hive tests by default.

2014-07-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1565#issuecomment-49972683 QA results for PR 1565: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test

[GitHub] spark pull request: Part of [SPARK-2456] Removed some HashMaps fro...

2014-07-24 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1561#discussion_r15332026 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Stage.scala --- @@ -56,6 +58,16 @@ private[spark] class Stage( val numPartitions =

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15332107 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateProjection.scala --- @@ -0,0 +1,218 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15332175 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -0,0 +1,458 @@ +/* + * Licensed to

[GitHub] spark pull request: SPARK-2250: show stage RDDs in UI

2014-07-24 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1188#discussion_r15332185 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StageTable.scala --- @@ -99,19 +99,30 @@ private[ui] class StageTableBase( {s.name}

[GitHub] spark pull request: SPARK-2250: show stage RDDs in UI

2014-07-24 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1188#discussion_r15332227 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StageTable.scala --- @@ -99,19 +99,30 @@ private[ui] class StageTableBase( {s.name}

[GitHub] spark pull request: SPARK-2250: show stage RDDs in UI

2014-07-24 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1188#issuecomment-49973457 Built and tested it locally. I was actually proposing pulling the Persisted RDD part outside of the toggle component. I think it's fine either way, but if we decide to

[GitHub] spark pull request: [SPARK-2014] Make PySpark store RDDs in MEMORY...

2014-07-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1051#issuecomment-49973607 QA results for PR 1051: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test

[GitHub] spark pull request: (WIP) SPARK-2045 Sort-based shuffle

2014-07-24 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15332387 --- Diff: core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala --- @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2250: show stage RDDs in UI

2014-07-24 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1188#issuecomment-49973700 One more thing - after playing in my browser I think just saying RDD: XX is sufficient. Not sure Persistent adds much, and it will save space. Still prefer it non-bold.

[GitHub] spark pull request: (WIP) SPARK-2045 Sort-based shuffle

2014-07-24 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1499#issuecomment-49973755 Added one more commit that fixes the type of ShuffledRDD, because in this new shuffle it's not possible to return a custom Product2 the way it's written now, and in the

[GitHub] spark pull request: [SPARK-2652] [PySpark] Turning some default co...

2014-07-24 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/1568 [SPARK-2652] [PySpark] Turning some default configs for PySpark Add several default configs for PySpark, related to serialization in JVM. spark.serializer =

[GitHub] spark pull request: [SPARK-2125] Add sort flag and move sort into ...

2014-07-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1210#issuecomment-49973939 QA results for PR 1210: - This patch PASSES unit tests. For more information see test

[GitHub] spark pull request: Update HiveMetastoreCatalog.scala

2014-07-24 Thread baishuo
GitHub user baishuo opened a pull request: https://github.com/apache/spark/pull/1569 Update HiveMetastoreCatalog.scala I think it's better to define hiveQlTable as a val. You can merge this pull request into a Git repository by running: $ git pull

[GitHub] spark pull request: [SPARK-2652] [PySpark] Turning some default co...

2014-07-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1568#issuecomment-49974134 QA tests have started for PR 1568. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17107/consoleFull

[GitHub] spark pull request: [SPARK-2260] Fix standalone-cluster mode, whic...

2014-07-24 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1538#discussion_r15332601 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/CommandUtils.scala --- @@ -62,20 +62,23 @@ object CommandUtils extends Logging {

[GitHub] spark pull request: Update HiveMetastoreCatalog.scala

2014-07-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1569#issuecomment-49974351 Can one of the admins verify this patch?

[GitHub] spark pull request: [SPARK-2125] Add sort flag and move sort into ...

2014-07-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1210#issuecomment-49974441 QA tests have started for PR 1210. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17108/consoleFull

[GitHub] spark pull request: [SPARK-2661][bagel]unpersist old processed rdd

2014-07-24 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1519

[GitHub] spark pull request: [SPARK-2661][bagel]unpersist old processed rdd

2014-07-24 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1519#issuecomment-49974606 Thanks Adrian, merged this.

[GitHub] spark pull request: [SPARK-2661][bagel]unpersist old processed rdd

2014-07-24 Thread adrian-wang
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/1519#issuecomment-49975077 Thanks Matei!

[GitHub] spark pull request: [SPARK-2014] Make PySpark store RDDs in MEMORY...

2014-07-24 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/1051#discussion_r15333083 --- Diff: python/pyspark/conf.py --- @@ -99,6 +99,12 @@ def set(self, key, value): self._jconf.set(key, unicode(value)) return

[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...

2014-07-24 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1460#discussion_r15333099 --- Diff: python/pyspark/shuffle.py --- @@ -0,0 +1,436 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-24 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1056#issuecomment-49975385 I was actually a bit confused why `updateShuffleReadMetrics` is synchronized. Can that be called from multiple threads as-is? I wasn't aware of cases where we had

[GitHub] spark pull request: (WIP) SPARK-2045 Sort-based shuffle

2014-07-24 Thread colorant
Github user colorant commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15333148 --- Diff: core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala --- @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-24 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r15333186 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -348,4 +352,46 @@ private[spark] class Executor( } } }

[GitHub] spark pull request: [SPARK-2492][Streaming] kafkaReceiver minor ch...

2014-07-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1420#issuecomment-49975753 QA tests have started for PR 1420. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17109/consoleFull

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r1512 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -0,0 +1,458 @@ +/* + * Licensed to

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r1552 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -0,0 +1,458 @@ +/* + * Licensed to

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r1579 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -0,0 +1,458 @@ +/* + * Licensed to

[GitHub] spark pull request: [SPARK-2014] Make PySpark store RDDs in MEMORY...

2014-07-24 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/1051#discussion_r15333403 --- Diff: python/pyspark/conf.py --- @@ -99,6 +99,12 @@ def set(self, key, value): self._jconf.set(key, unicode(value)) return self

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15333476 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -0,0 +1,458 @@ +/* + * Licensed to

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15333508 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -0,0 +1,458 @@ +/* + * Licensed to

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15333558 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -0,0 +1,458 @@ +/* + * Licensed to

[GitHub] spark pull request: Build should not run hive tests by default.

2014-07-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1565#issuecomment-49976502 QA results for PR 1565: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15333592 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala --- @@ -47,4 +47,30 @@ package org.apache.spark.sql.catalyst

[GitHub] spark pull request: [SPARK-2014] Make PySpark store RDDs in MEMORY...

2014-07-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1051#issuecomment-49976557 QA tests have started for PR 1051. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17111/consoleFull

[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...

2014-07-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1460#issuecomment-49976553 QA tests have started for PR 1460. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17110/consoleFull

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15333688 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/GeneratedEvaluationSuite.scala --- @@ -0,0 +1,108 @@ +/* + * Licensed to

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15333694 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/GeneratedEvaluationSuite.scala --- @@ -0,0 +1,108 @@ +/* + * Licensed to

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15333762 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Generate.scala --- @@ -47,23 +47,29 @@ case class Generate( } } -

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15333803 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/GeneratedAggregate.scala --- @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15333814 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/GeneratedAggregate.scala --- @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2652] [PySpark] Turning some default co...

2014-07-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1568#issuecomment-49977116 QA results for PR 1568: - This patch FAILED unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15333832 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/GeneratedAggregate.scala --- @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: Build should not run hive tests by default.

2014-07-24 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/1565#issuecomment-49977140 I feel that making those two examples compile optionally (by isolating them) is better than isolating all the Hive tests like this.

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15333860 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/GeneratedAggregate.scala --- @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15333900 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/GeneratedAggregate.scala --- @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15333888 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/GeneratedAggregate.scala --- @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15333936 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala --- @@ -18,22 +18,53 @@ package org.apache.spark.sql.execution

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15333966 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala --- @@ -51,8 +82,46 @@ abstract class SparkPlan extends QueryPlan[SparkPlan]

[GitHub] spark pull request: [SPARK-2125] Add sort flag and move sort into ...

2014-07-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1210#issuecomment-49977441 QA results for PR 1210: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15333978 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala --- @@ -192,9 +187,9 @@ private[sql] abstract class SparkStrategies

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15334051 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala --- @@ -300,8 +298,16 @@ case class LeftSemiJoinBNL( case class

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15334075 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/parquet/ParquetQuerySuite.scala --- @@ -17,6 +17,7 @@ package org.apache.spark.sql.parquet

[GitHub] spark pull request: Part of [SPARK-2456] Removed some HashMaps fro...

2014-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1561#discussion_r15334329 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -341,8 +336,9 @@ class DAGScheduler( if (registeredStages.isEmpty ||

[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-24 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/993#discussion_r15334407 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -0,0 +1,458 @@ +/* + *

[GitHub] spark pull request: Part of [SPARK-2456] Removed some HashMaps fro...

2014-07-24 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1561#issuecomment-49978964 Thanks. Pushed a version that should have addressed most of the comments. --- If your project is set up for it, you can reply to this email and have your reply appear on

[GitHub] spark pull request: Part of [SPARK-2456] Removed some HashMaps fro...

2014-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1561#discussion_r15334478 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -728,8 +697,6 @@ class DAGScheduler( private def

[GitHub] spark pull request: [SPARK-2492][Streaming] kafkaReceiver minor ch...

2014-07-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1420#issuecomment-49979126 QA results for PR 1420: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test

[GitHub] spark pull request: [SPARK-2665] [SQL] Add EqualNS Unit Tests

2014-07-24 Thread chenghao-intel
GitHub user chenghao-intel opened a pull request: https://github.com/apache/spark/pull/1570 [SPARK-2665] [SQL] Add EqualNS Unit Tests Hive supports the operator <=>, which returns the same result as the EQUAL(=) operator for non-null operands, but returns TRUE if both are NULL, FALSE if
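
The null-safe equality semantics described in this pull request can be sketched in a few lines of Scala. This is an illustrative model only, not the code from the pull request and not Spark's or Hive's implementation: SQL NULL is represented as `None`, and the names `NullSafeEquality`, `sqlEqual`, and `nullSafeEqual` are hypothetical. It assumes the standard behavior of Hive's `<=>` operator, which yields FALSE when exactly one operand is NULL.

```scala
// Illustrative sketch only: SQL NULL is modeled as None.
// All names here are hypothetical and do not come from Spark or Hive.
object NullSafeEquality {
  // Ordinary SQL equality (=): the result is NULL (None) if either operand is NULL.
  def sqlEqual[A](l: Option[A], r: Option[A]): Option[Boolean] =
    for (a <- l; b <- r) yield a == b

  // Null-safe equality (<=>): always yields TRUE or FALSE, never NULL.
  def nullSafeEqual[A](l: Option[A], r: Option[A]): Boolean = (l, r) match {
    case (None, None)       => true   // both NULL
    case (Some(a), Some(b)) => a == b // both non-null: same as ordinary equality
    case _                  => false  // exactly one operand is NULL
  }
}
```

For non-null operands the two functions agree, so `nullSafeEqual` differs from `sqlEqual` only in how NULL operands are treated.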

[GitHub] spark pull request: Part of [SPARK-2456] Removed some HashMaps fro...

2014-07-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1561#issuecomment-49979231 QA tests have started for PR 1561. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17112/consoleFull ---

[GitHub] spark pull request: [SPARK-2665] [SQL] Add EqualNS Unit Tests

2014-07-24 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1570#issuecomment-49979539 This is really nice - passes a lot more tests. I guess we will eventually run into the problem that the unit tests without parallelization will take too long...

[GitHub] spark pull request: [SPARK-2665] [SQL] Add EqualNS Unit Tests

2014-07-24 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1570#issuecomment-49979557 Do you mind looking into how we can parallelize the hive compatibility tests?

[GitHub] spark pull request: [SPARK-2665] [SQL] Add EqualNS Unit Tests

2014-07-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1570#issuecomment-49979649 QA tests have started for PR 1570. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17113/consoleFull ---

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

2014-07-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1567#issuecomment-49979679 QA tests have started for PR 1567. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17114/consoleFull ---

[GitHub] spark pull request: [SPARK-2665] [SQL] Add EqualNS Unit Tests

2014-07-24 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/1570#issuecomment-49979808 Yes, I will think about how to parallelize those tests.

[GitHub] spark pull request: SPARK-2657 Use more compact data structures th...

2014-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1555#discussion_r15334913 --- Diff: core/src/main/scala/org/apache/spark/util/collection/CompactBuffer.scala --- @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software
