[GitHub] spark pull request: [SPARK-3399][PySpark] Test for PySpark should ...

2014-09-05 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2270 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[GitHub] spark pull request: TEST ONLY DO NOT MERGE

2014-09-05 Thread shaneknapp
Github user shaneknapp commented on the pull request: https://github.com/apache/spark/pull/2289#issuecomment-54665233 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-3411]Optimize the schedule procedure in...

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1106#issuecomment-54665441 Can you give this PR a more descriptive title? "Optimize the schedule procedure in Master" sounds like it could describe many different changes, so it's kind of hard to

[GitHub] spark pull request: [SPARK-3411]Optimize the schedule procedure in...

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1106#issuecomment-54665758 I agree that this seems like a bit of a rare corner-case.

[GitHub] spark pull request: [SPARK-3377] [Metrics] Don't mix metrics from ...

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2250#issuecomment-54667475 Thanks for submitting this! I noticed that #1067 is an old PR addressing a similar issue. If your PR subsumes that one, which I think that it does, could you add

[GitHub] spark pull request: [SPARK-3421][SQL] Allows arbitrary character i...

2014-09-05 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2291#discussion_r17191419 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/dataTypes.scala --- @@ -65,7 +70,7 @@ object DataType extends RegexParsers {

[GitHub] spark pull request: [SPARK-3421][SQL] Allows arbitrary character i...

2014-09-05 Thread liancheng
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/2291 [SPARK-3421][SQL] Allows arbitrary character in StructField.name `StructField.toString` now quotes the `name` field and escapes backslashes and double quotes within the string. The `DataType`
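The quoting rule described in this pull request (wrap the field name in double quotes, escape backslashes and double quotes inside it) can be illustrated with a minimal Python sketch. This is not the Scala code from the PR; the function names `quote_field_name` and `unquote_field_name` are hypothetical and exist only for illustration.

```python
def quote_field_name(name):
    """Wrap a field name in double quotes, escaping backslashes and quotes.

    Hypothetical helper: it mirrors the rule stated in the PR description,
    not the actual StructField.toString implementation.
    """
    # Escape backslashes first so the backslashes added for quotes
    # are not escaped a second time.
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def unquote_field_name(quoted):
    """Inverse of quote_field_name (also hypothetical)."""
    if len(quoted) < 2 or quoted[0] != '"' or quoted[-1] != '"':
        raise ValueError("not a quoted field name")
    out = []
    chars = iter(quoted[1:-1])
    for ch in chars:
        if ch == "\\":
            # A backslash always introduces exactly one escaped character.
            nxt = next(chars, None)
            if nxt is None:
                raise ValueError("dangling escape character")
            out.append(nxt)
        else:
            out.append(ch)
    return "".join(out)
```

Because every backslash and double quote inside the name is escaped, a parser can find the closing quote unambiguously, which is what allows arbitrary characters in the name.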

[GitHub] spark pull request: [SPARK-3430] [PySpark] [Doc] generate PySpark ...

2014-09-05 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/2292 [SPARK-3430] [PySpark] [Doc] generate PySpark API docs using Sphinx Using Sphinx to generate API docs for PySpark. requirement: Sphinx ``` $ cd docs/api/python/ $ make html

[GitHub] spark pull request: [SPARK-3094] [PySpark] compatitable with PyPy

2014-09-05 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2144#issuecomment-54669121 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-3417] -Use of old-style classes in pysp...

2014-09-05 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2288#issuecomment-54669643 The title is a bit confusing; would you mind changing it to "Use new-style classes in PySpark"? The patch looks good to me, thanks!

[GitHub] spark pull request: [SPARK-3409][SQL] Avoid pulling in Exchange op...

2014-09-05 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/2282#issuecomment-54669948 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-3417] -Use new-style classes in PySpark

2014-09-05 Thread mrocklin
Github user mrocklin commented on the pull request: https://github.com/apache/spark/pull/2288#issuecomment-54670005 done

[GitHub] spark pull request: [SPARK-3377] [Metrics] Don't mix metrics from ...

2014-09-05 Thread sarutak
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2250#issuecomment-54669961 Thanks, I modified the description.

[GitHub] spark pull request: [SPARK-3415] [PySpark] removes SerializingAdap...

2014-09-05 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/2287#discussion_r17192524 --- Diff: python/pyspark/cloudpickle.py --- @@ -691,13 +699,13 @@ def save_file(self, obj): tmpfile.close() if tst != '':

[GitHub] spark pull request: [SPARK-3411]Optimize the schedule procedure in...

2014-09-05 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/1106#issuecomment-54670726 @WangTaoTheTonic I looked at this more and I think it will actually be slower with the new changes. Before this patch we shuffle all the workers only once, but here

[GitHub] spark pull request: [SPARK-3397] Bump pom.xml version number of ma...

2014-09-05 Thread tgravescs
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/2268#issuecomment-54671834 @pwendell do you just want to run the set version and commit it or do you want to do it through this jira?

[GitHub] spark pull request: [SPARK-2491]: Fix When an fatal error is throw...

2014-09-05 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/1482#issuecomment-54671967 Just so I understand, by the time we enter this code block we would have already logged an OOM message, so it's confusing if we log more messages. However, I think

[GitHub] spark pull request: [SPARK-3411]Optimize the schedule procedure in...

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1106#issuecomment-54672234 If the problem is all of the drivers landing on the same randomly-chosen worker, I suppose you could treat the randomized list as a circular buffer and go through it

[GitHub] spark pull request: [SPARK-3411]Optimize the schedule procedure in...

2014-09-05 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/1106#issuecomment-54672541 Oh I see. Wouldn't it be sufficient to just pop the head of `shuffledWorkers` after allocating each driver then?

[GitHub] spark pull request: [SPARK-3414][SQL] Stores analyzed logical plan...

2014-09-05 Thread liancheng
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/2293 [SPARK-3414][SQL] Stores analyzed logical plan when registering a temp table Case insensitivity breaks when unresolved relation contains attributes with uppercase letters in their names, because

[GitHub] spark pull request: [SPARK-2491]: Fix When an fatal error is throw...

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1482#issuecomment-54673163 Here's my understanding of the flow of control that produced the original problem: A task throws an uncaught exception (let's say OutOfMemoryError). This is

[GitHub] spark pull request: [SPARK-3411]Optimize the schedule procedure in...

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1106#issuecomment-54674029 Yeah, I suppose so, but there was one corner-case that I was concerned about (that is addressed by treating it as a circular buffer): Let's say we have a
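The circular-buffer idea discussed in this thread (shuffle the workers once, then walk the shuffled list round-robin so consecutive drivers do not all land on the same worker) can be sketched in Python as follows. This is an illustration under stated assumptions, not the Master's scheduling code: the function name `assign_drivers` is hypothetical, and modelling a worker as a single "free memory" number is a simplification.

```python
import random


def assign_drivers(workers, drivers, rng=None):
    """Assign drivers to workers by walking a shuffled list as a circular buffer.

    workers: dict mapping worker name -> free memory (assumed resource model)
    drivers: list of (driver_id, memory_needed) tuples
    Returns a dict mapping driver_id -> worker name for drivers that fit.
    """
    rng = rng or random.Random()
    order = list(workers)
    rng.shuffle(order)          # shuffle only once, up front
    free = dict(workers)        # copy so the caller's dict is not mutated
    assignments = {}
    pos = 0                     # where the next search starts
    for driver_id, needed in drivers:
        # Visit each worker at most once, starting from the current position.
        for step in range(len(order)):
            idx = (pos + step) % len(order)
            worker = order[idx]
            if free[worker] >= needed:
                free[worker] -= needed
                assignments[driver_id] = worker
                # Start the next search just after the worker we used.
                pos = (idx + 1) % len(order)
                break
    return assignments
```

Starting each search after the previously used worker spreads drivers across workers, while the wrap-around still lets a later driver reach a worker earlier in the list if that is the only one with enough room.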

[GitHub] spark pull request: TEST ONLY DO NOT MERGE

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2289#issuecomment-54674268 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-3430] [PySpark] [Doc] generate PySpark ...

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2292#issuecomment-54675476 When I tried this, I got a lot of warnings saying that Sphinx couldn't import the PySpark modules: ``` [joshrosen python (9081ead...)]$ make html

[GitHub] spark pull request: [SPARK-3430] [PySpark] [Doc] generate PySpark ...

2014-09-05 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2292#issuecomment-54675857 Thanks, I have updated the PR description.

[GitHub] spark pull request: [SPARK-3430] [PySpark] [Doc] generate PySpark ...

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2292#issuecomment-54676035 Maybe we can just stick that command inside the Makefile...

[GitHub] spark pull request: [SPARK-3430] [PySpark] [Doc] generate PySpark ...

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2292#issuecomment-54676182 For those unfamiliar with Sphinx, here's a screenshot of the new docs (which look great!):

[GitHub] spark pull request: [SPARK-1825] Fixes cross-platform submit probl...

2014-09-05 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/899#discussion_r17195370 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -309,7 +310,7 @@ trait ClientBase extends Logging { //

[GitHub] spark pull request: [SPARK-3430] [PySpark] [Doc] generate PySpark ...

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2292#issuecomment-54676304 It looks like there's some markup, like `C{(a, b)}`, that doesn't render properly since it's a holdover from the epydocs:

[GitHub] spark pull request: [SPARK-3430] [PySpark] [Doc] generate PySpark ...

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2292#issuecomment-54676698 There are also a few cases of docstrings that didn't get rendered properly:

[GitHub] spark pull request: [SPARK-3430] [PySpark] [Doc] generate PySpark ...

2014-09-05 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2292#issuecomment-54676716 Yes, I tried to convert most of the markup (but not all of it). What does this mean in epydoc?

[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

2014-09-05 Thread tdas
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/1723#issuecomment-54676763 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-1825] Fixes cross-platform submit probl...

2014-09-05 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/899#issuecomment-54676819 @zeodtr Thanks for updating the title. Just so I understand the issue, for HDP 2.1 on Windows we need these changes for Spark to run, is that correct? However, with

[GitHub] spark pull request: [SPARK-1825] Fixes cross-platform submit probl...

2014-09-05 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/899#issuecomment-54676894 Also, I notice that this is opened against branch-1.0. It would be better if you could open it against the master branch so the latest Spark releases will also benefit

[GitHub] spark pull request: [SPARK-3430] [PySpark] [Doc] generate PySpark ...

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2292#issuecomment-54676911 The markup like `C{}`, `I{}`, etc. is [epydoc's inline markup](http://epydoc.sourceforge.net/manual-epytext.html#basic-inline-markup). In [Sphinx's inline
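For the epydoc-to-Sphinx markup question raised here, a rough Python sketch of a converter is shown below. It relies on the standard meanings of epytext's `C{}` (code), `I{}` (italic) and `B{}` (bold) and their reStructuredText counterparts; the function name `epytext_inline_to_rest` is hypothetical, nested markup is deliberately not handled, and this is not the conversion actually applied in the PR.

```python
import re

# epytext inline tag -> reStructuredText template
_EPYTEXT_TO_REST = {
    "C": "``{}``",   # code / literal
    "I": "*{}*",     # italic / emphasis
    "B": "**{}**",   # bold / strong
}

# A tag letter at a word boundary, followed by a brace group without nested braces.
_INLINE = re.compile(r"\b([CIB])\{([^{}]*)\}")


def epytext_inline_to_rest(text):
    """Rewrite simple epytext inline markup as reStructuredText (sketch only)."""
    def repl(match):
        tag, body = match.group(1), match.group(2)
        return _EPYTEXT_TO_REST[tag].format(body)
    return _INLINE.sub(repl, text)
```

For example, `epytext_inline_to_rest("Returns C{(a, b)} pairs")` yields a string in which the tuple is wrapped in double backticks, which Sphinx renders as inline code.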

[GitHub] spark pull request: [SPARK-3086] [SPARK-3043] [SPARK-3156] [mllib]...

2014-09-05 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2125#issuecomment-54677299 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-3430] [PySpark] [Doc] generate PySpark ...

2014-09-05 Thread nchammas
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/2292#issuecomment-54677307 There are also a few cases of docstrings that didn't get rendered properly: FYI: Those ones are [broken in epydoc,

[GitHub] spark pull request: SPARK-2978. Transformation with MR shuffle sem...

2014-09-05 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2274#issuecomment-54677627 Updated patch adds Python back in and adds the 's' at the end.

[GitHub] spark pull request: [SPARK-3430] [PySpark] [Doc] generate PySpark ...

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2292#issuecomment-54677725 I guess we have [some choices](http://sphinx-doc.org/latest/ext/napoleon.html) for the markup language dialect that we use for our docstrings. I tend to prefer the

[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

2014-09-05 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/1723#discussion_r17196753 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -856,13 +859,27 @@ private[spark] object Utils extends Logging { * finding the

[GitHub] spark pull request: SPARK-2978. Transformation with MR shuffle sem...

2014-09-05 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/2274#issuecomment-54678621 Thanks, Sandy. Can you add a unit test in Java to make sure the thing is callable from Java?

[GitHub] spark pull request: [SPARK-3414][SQL] Stores analyzed logical plan...

2014-09-05 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2293#issuecomment-54679022 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19809/consoleFull) for PR 2293 at commit

[GitHub] spark pull request: [Spark-2381] stop the streaming application if...

2014-09-05 Thread tdas
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/1693#issuecomment-54679164 Jenkins, this is ok to test.

[GitHub] spark pull request: [Spark-2381] stop the streaming application if...

2014-09-05 Thread tdas
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/1693#issuecomment-54679375 @joyyoj Can you please add a unit test for this behavior in the StreamingContextSuite? This is a significant change in the program behavior and should have a unit test.

[GitHub] spark pull request: [Spark-2381] stop the streaming application if...

2014-09-05 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/1693#discussion_r17197213 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala --- @@ -272,7 +272,15 @@ class ReceiverTracker(ssc:

[GitHub] spark pull request: Spark-3406 add a default storage level to pyth...

2014-09-05 Thread holdenk
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/2280#issuecomment-54679472 @JoshRosen oh cool, I didn't notice that. I've updated that too.

[GitHub] spark pull request: [SPARK-3217] Add Guava to classpath when SPARK...

2014-09-05 Thread liancheng
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2141#issuecomment-54679659 test this please. LGFM

[GitHub] spark pull request: [Spark-2381] stop the streaming application if...

2014-09-05 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/1693#discussion_r17197419 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala --- @@ -272,7 +272,15 @@ class ReceiverTracker(ssc:

[GitHub] spark pull request: [SPARK-3411]Optimize the schedule procedure in...

2014-09-05 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/1106#issuecomment-54680145 Hm, it looks like `launchDriver` is asynchronous, so there seems to be no easy way to identify workers that have already been scheduled to launch a driver. This means

[GitHub] spark pull request: TEST ONLY DO NOT MERGE

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2289#issuecomment-54680559 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-05 Thread brkyvz
GitHub user brkyvz opened a pull request: https://github.com/apache/spark/pull/2294 [SPARK-3418] Sparse Matrix support (CCS) and additional native BLAS operations added Local `SparseMatrix` support added in Compressed Column Storage (CCS) format in addition to Level-2 and Level-3
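For readers unfamiliar with Compressed Column Storage, the format keeps three arrays: the non-zero values in column-major order, the row index of each value, and one pointer per column marking where that column's values start. The Python sketch below shows the layout and a matrix-vector product over it (a Level-2 BLAS style operation). The function names `to_ccs` and `ccs_matvec` are hypothetical and this is not the `SparseMatrix` code from the PR.

```python
def to_ccs(dense):
    """Convert a dense matrix (list of rows) to CCS arrays.

    Returns (values, row_indices, col_ptrs), where the entries of column j
    occupy positions col_ptrs[j] .. col_ptrs[j + 1] - 1 of the other two arrays.
    """
    num_rows = len(dense)
    num_cols = len(dense[0]) if dense else 0
    values, row_indices, col_ptrs = [], [], [0]
    for j in range(num_cols):
        for i in range(num_rows):
            v = dense[i][j]
            if v != 0:
                values.append(v)
                row_indices.append(i)
        # After finishing column j, record where column j + 1 begins.
        col_ptrs.append(len(values))
    return values, row_indices, col_ptrs


def ccs_matvec(num_rows, values, row_indices, col_ptrs, x):
    """Compute y = A * x for a matrix A stored in CCS form."""
    y = [0.0] * num_rows
    for j in range(len(col_ptrs) - 1):
        # Each stored entry of column j contributes values[k] * x[j] to its row.
        for k in range(col_ptrs[j], col_ptrs[j + 1]):
            y[row_indices[k]] += values[k] * x[j]
    return y
```

Only the non-zero entries are visited during the product, so the cost scales with the number of stored values rather than with rows times columns.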

[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

2014-09-05 Thread tdas
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/1723#issuecomment-54682054 Jenkins is currently having issues so cannot test it. But this looks pretty good except comment on the streaming regex - it should be in streaming/util/Utils.scala not in

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-05 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17198796 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -309,4 +322,44 @@ private[spark] object HadoopRDD { f(inputSplit,

[GitHub] spark pull request: [SPARK-3325] Add a parameter to the method pri...

2014-09-05 Thread tdas
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/2216#issuecomment-54682610 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-3325] Add a parameter to the method pri...

2014-09-05 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/2216#discussion_r17198878 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaDStreamLike.scala --- @@ -50,8 +50,8 @@ trait JavaDStreamLike[T, This :

[GitHub] spark pull request: [SPARK-3430] [PySpark] [Doc] generate PySpark ...

2014-09-05 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2292#issuecomment-54683008 @JoshRosen @nchammas I have addressed all the comments; please take another look.

[GitHub] spark pull request: [SPARK-3325] Add a parameter to the method pri...

2014-09-05 Thread tdas
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/2216#issuecomment-54683003 @watermen @srowen This should definitely be added to both APIs.

[GitHub] spark pull request: SPARK-2978. Transformation with MR shuffle sem...

2014-09-05 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/2274#discussion_r17199169 --- Diff: python/pyspark/tests.py --- @@ -405,22 +404,6 @@ def test_zip_with_different_number_of_items(self): self.assertEquals(a.count(),

[GitHub] spark pull request: [SPARK-3408] Fixed Limit operator so it works ...

2014-09-05 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/2281#issuecomment-54684308 I'm not sure if that belongs here. The reason is it is an implicit assumption of all APIs. For example, if you add the result of an iterator that reuses object into an
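The object-reuse hazard mentioned in this comment can be shown in a few lines of Python. This is a language-agnostic illustration, not Spark code: `reusing_iterator` is a hypothetical generator that yields the same mutable object on every step, the way an iterator that reuses its record buffer would.

```python
import copy


def reusing_iterator(n):
    """Yield the same list object each time, overwriting its contents."""
    record = [None]
    for i in range(n):
        record[0] = i
        yield record


# Collecting the results directly stores three references to one object,
# so every element shows the last value written.
aliased = list(reusing_iterator(3))

# Copying each element as it is produced keeps the individual values.
copied = [copy.copy(r) for r in reusing_iterator(3)]
```

After running this, `aliased` holds the same object three times while `copied` holds three distinct records, which is why any consumer that buffers elements from such an iterator has to copy them first.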

[GitHub] spark pull request: [SPARK-3430] [PySpark] [Doc] generate PySpark ...

2014-09-05 Thread nchammas
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/2292#issuecomment-54684422 Oh, sorry I didn't mean the badly formatted doc strings should be fixed in this PR. At least not the ones that are also bad in epydoc. That probably should be left for

[GitHub] spark pull request: [WIP][SQL] SPARK-2360: CSV import to SchemaRDD...

2014-09-05 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/1351#discussion_r17199889 --- Diff: python/pyspark/sql.py --- @@ -187,6 +187,56 @@ def func(split, iterator): jschema_rdd = self._ssql_ctx.jsonRDD(jrdd.rdd())

[GitHub] spark pull request: [WIP][SQL] SPARK-2360: CSV import to SchemaRDD...

2014-09-05 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/1351#discussion_r17199930 --- Diff: python/pyspark/sql.py --- @@ -187,6 +187,56 @@ def func(split, iterator): jschema_rdd = self._ssql_ctx.jsonRDD(jrdd.rdd())

[GitHub] spark pull request: [WIP][SQL] SPARK-2360: CSV import to SchemaRDD...

2014-09-05 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/1351#discussion_r17199977 --- Diff: python/pyspark/sql.py --- @@ -187,6 +187,56 @@ def func(split, iterator): jschema_rdd = self._ssql_ctx.jsonRDD(jrdd.rdd())

[GitHub] spark pull request: [WIP][SQL] SPARK-2360: CSV import to SchemaRDD...

2014-09-05 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/1351#discussion_r17200060 --- Diff: python/pyspark/sql.py --- @@ -187,6 +187,56 @@ def func(split, iterator): jschema_rdd = self._ssql_ctx.jsonRDD(jrdd.rdd())

[GitHub] spark pull request: [WIP][SQL] SPARK-2360: CSV import to SchemaRDD...

2014-09-05 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/1351#discussion_r17200139 --- Diff: python/pyspark/sql.py --- @@ -187,6 +187,56 @@ def func(split, iterator): jschema_rdd = self._ssql_ctx.jsonRDD(jrdd.rdd())

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-05 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2294#issuecomment-54685532 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19810/consoleFull) for PR 2294 at commit

[GitHub] spark pull request: [SPARK-3418] Sparse Matrix support (CCS) and a...

2014-09-05 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2294#issuecomment-54685631 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19810/consoleFull) for PR 2294 at commit

[GitHub] spark pull request: TEST ONLY DO NOT MERGE

2014-09-05 Thread shaneknapp
Github user shaneknapp commented on the pull request: https://github.com/apache/spark/pull/2289#issuecomment-54685853 Jenkins, retest this please.

[GitHub] spark pull request: [WIP][SQL] SPARK-2360: CSV import to SchemaRDD...

2014-09-05 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/1351#discussion_r17200250 --- Diff: python/pyspark/sql.py --- @@ -187,6 +187,56 @@ def func(split, iterator): jschema_rdd = self._ssql_ctx.jsonRDD(jrdd.rdd())

[GitHub] spark pull request: [SPARK-3430] [PySpark] [Doc] generate PySpark ...

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2292#issuecomment-54686027 Hmm, the Makefile PYTHONPATH didn't seem to work. Maybe we should add those directories to sys.path in `conf.py`, which has a section for this: ```python
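The snippet in the comment above is cut off by the archive, so the following is only a hedged sketch of the general technique it refers to: extending `sys.path` from a Sphinx `conf.py` so that autodoc can import the documented modules. The helper name `prepend_to_sys_path` and any directory names passed to it are assumptions for illustration, not the paths used in the PR.

```python
import os
import sys


def prepend_to_sys_path(directories, base_dir="."):
    """Put the given directories at the front of sys.path, preserving their order.

    Relative entries are resolved against base_dir. Directories already on
    sys.path are left where they are. Returns the absolute paths that were added.
    """
    added = []
    # Insert in reverse so the first listed directory ends up first on sys.path.
    for directory in reversed(directories):
        path = os.path.abspath(os.path.join(base_dir, directory))
        if path not in sys.path:
            sys.path.insert(0, path)
            added.append(path)
    return list(reversed(added))
```

In a `conf.py` this would be called near the top of the file, before the extensions that import project modules are loaded, with whatever directories contain the packages to document.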

[GitHub] spark pull request: [WIP][SQL] SPARK-2360: CSV import to SchemaRDD...

2014-09-05 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/1351#discussion_r17200330 --- Diff: python/pyspark/sql.py --- @@ -187,6 +187,56 @@ def func(split, iterator): jschema_rdd = self._ssql_ctx.jsonRDD(jrdd.rdd())

[GitHub] spark pull request: [SPARK-3408] Fixed Limit operator so it works ...

2014-09-05 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/2281#issuecomment-54686312 If anything, it should probably be documented in the repartition and ShuffledRDD documentation.

[GitHub] spark pull request: SPARK-1416: PySpark support for SequenceFile a...

2014-09-05 Thread freedafeng
Github user freedafeng commented on the pull request: https://github.com/apache/spark/pull/455#issuecomment-54686288 Is anyone still working on this patch? PySpark + HBase is the key to our data science application. I really hope it can work in the very near future.

[GitHub] spark pull request: SPARK-1416: PySpark support for SequenceFile a...

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/455#issuecomment-54686436 @freedafeng This PR was actually merged and will be available in Spark 1.1 (which should be released _very_ soon).

[GitHub] spark pull request: [SPARK-3181][MLLIB]: Add Robust Regression Alg...

2014-09-05 Thread fjiang6
Github user fjiang6 commented on the pull request: https://github.com/apache/spark/pull/2096#issuecomment-54686931 Jenkins, retest this please.

[GitHub] spark pull request: TEST ONLY DO NOT MERGE

2014-09-05 Thread shaneknapp
Github user shaneknapp commented on the pull request: https://github.com/apache/spark/pull/2289#issuecomment-54686921 Jenkins, test this please.

[GitHub] spark pull request: SPARK-2978. Transformation with MR shuffle sem...

2014-09-05 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/2274#discussion_r17200864 --- Diff: python/pyspark/rdd.py --- @@ -515,6 +515,30 @@ def __add__(self, other): raise TypeError return self.union(other)

[GitHub] spark pull request: SPARK-2978. Transformation with MR shuffle sem...

2014-09-05 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/2274#discussion_r17200907 --- Diff: python/pyspark/rdd.py --- @@ -515,6 +515,30 @@ def __add__(self, other): raise TypeError return self.union(other)

[GitHub] spark pull request: SPARK-3211 .take() is OOM-prone with empty par...

2014-09-05 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2117#issuecomment-54687453 @ash211 since this is a bug fix it seems fine to put it into 1.1.1.

[GitHub] spark pull request: SPARK-3211 .take() is OOM-prone with empty par...

2014-09-05 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2117#issuecomment-54687432 Jenkins, test this please

[GitHub] spark pull request: SPARK-2978. Transformation with MR shuffle sem...

2014-09-05 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/2274#discussion_r17201280 --- Diff: python/pyspark/rdd.py --- @@ -515,6 +515,30 @@ def __add__(self, other): raise TypeError return self.union(other)

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-09-05 Thread zhzhan
Github user zhzhan commented on the pull request: https://github.com/apache/spark/pull/2241#issuecomment-54688393 retest this please.

[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

2014-09-05 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1723#discussion_r17201437 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -856,13 +859,27 @@ private[spark] object Utils extends Logging { * finding

[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

2014-09-05 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/1723#discussion_r17201452 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -883,8 +900,8 @@ private[spark] object Utils extends Logging { for

[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

2014-09-05 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/1723#issuecomment-54688918 This sets a flag in `SparkContext` to reflect whether we want streaming call sites vs normal call sites. Doesn't this mean if we use this same `SparkContext` for

[GitHub] spark pull request: [SPARK-1853] Show Streaming application code c...

2014-09-05 Thread tdas
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/1723#issuecomment-54689188 No no, we want to preserve the behavior for Spark, and only change it for streaming.

[GitHub] spark pull request: [SPARK-3414][SQL] Stores analyzed logical plan...

2014-09-05 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2293#issuecomment-54689687 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19809/consoleFull) for PR 2293 at commit

[GitHub] spark pull request: [SPARK-3430] [PySpark] [Doc] generate PySpark ...

2014-09-05 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2292#issuecomment-54690239 @JoshRosen The Makefile has been fixed.

[GitHub] spark pull request: TEST ONLY DO NOT MERGE

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2289#issuecomment-54690332 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-2788] [STREAMING] Add location filterin...

2014-09-05 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/1717#discussion_r17202369 --- Diff: external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala --- @@ -42,9 +44,19 @@ class TwitterInputDStream(

[GitHub] spark pull request: TEST ONLY DO NOT MERGE

2014-09-05 Thread nchammas
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/2289#issuecomment-54690575 Seems like Jenkins is [stuck on this](https://amplab.cs.berkeley.edu/jenkins/view/Pull%20Request%20Builders/job/SparkPullRequestBuilder/19812/console): ERROR:

[GitHub] spark pull request: TEST ONLY DO NOT MERGE

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2289#issuecomment-54690652 Oh, that's expected. That's a build that I manually triggered; normally the relevant environment variables would have been set by the pull request builder plugin.

[GitHub] spark pull request: [SPARK-3408] Fixed Limit operator so it works ...

2014-09-05 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/2281#issuecomment-54691067 @pwendell As I mentioned this morning I was planning to wholesale backport all of `sql/` (including this patch) after the 1.1 release is made. Then I'll probably just

[GitHub] spark pull request: TEST ONLY DO NOT MERGE

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2289#issuecomment-54691019 Jenkins, retest this please.

[GitHub] spark pull request: TEST ONLY DO NOT MERGE

2014-09-05 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2289#issuecomment-54691294 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-09-05 Thread liancheng
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2241#issuecomment-54691310 test this please

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-09-05 Thread liancheng
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2241#issuecomment-54691357 Jenkins is not working very well because of an accidental power outage yesterday, and people are working on recovering it.

[GitHub] spark pull request: SPARK-3423: [SQL] Implement BETWEEN for SQLPar...

2014-09-05 Thread willb
GitHub user willb opened a pull request: https://github.com/apache/spark/pull/2295 SPARK-3423: [SQL] Implement BETWEEN for SQLParser You can merge this pull request into a Git repository by running: $ git pull https://github.com/willb/spark sql-between Alternatively you can

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-09-05 Thread zhzhan
Github user zhzhan commented on the pull request: https://github.com/apache/spark/pull/2241#issuecomment-54692062 Thanks for the follow up.

[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-09-05 Thread chutium
Github user chutium commented on a diff in the pull request: https://github.com/apache/spark/pull/1612#discussion_r17203025 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcResultSetRDD.scala --- @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software
