[GitHub] spark pull request: [SPARK-3533][Core] Add saveAsTextFileByKey() m...

2016-03-28 Thread saurfang
Github user saurfang closed the pull request at: https://github.com/apache/spark/pull/8375 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature

[GitHub] spark pull request: [SPARK-3533][Core] Add saveAsTextFileByKey() m...

2016-03-28 Thread saurfang
Github user saurfang commented on the pull request: https://github.com/apache/spark/pull/8375#issuecomment-202711183 Thanks @davies. This looks to be a very reasonable workaround.

[GitHub] spark pull request: [SPARK-12526][SPARKR]`ifelse`, `when`, `otherw...

2015-12-28 Thread saurfang
Github user saurfang commented on the pull request: https://github.com/apache/spark/pull/10481#issuecomment-167717447 Thanks for the review @sun-rui. Hope that's better. Looks like `lintr`, as awesome as it is, let that slip through, which I have filed a separate issue here: https

[GitHub] spark pull request: [SPARK-12526][SPARKR]`ifelse`, `when`, `otherw...

2015-12-26 Thread saurfang
Github user saurfang commented on the pull request: https://github.com/apache/spark/pull/10481#issuecomment-167376718 Indeed. Like I said, I don't see a compelling reason to use these three functions in a vectorized way. Let me know if you have any other comments on the fix

[GitHub] spark pull request: [SPARK-12526][SPARKR]`ifelse`, `when`, `otherw...

2015-12-25 Thread saurfang
GitHub user saurfang opened a pull request: https://github.com/apache/spark/pull/10481 [SPARK-12526][SPARKR]`ifelse`, `when`, `otherwise` unable to take Column as value `ifelse`, `when`, `otherwise` are unable to take `Column` typed S4 objects as values. For example

[GitHub] spark pull request: [SPARK-3533][Core] Add saveAsTextFileByKey() m...

2015-11-23 Thread saurfang
Github user saurfang commented on the pull request: https://github.com/apache/spark/pull/8375#issuecomment-159143064 I myself definitely want to get this merged. That being said, this will require a spark committer to drive the process. If you feel strongly about this feature, voice

[GitHub] spark pull request: [SPARK-11906][Web UI] Speculation Tasks Cause ...

2015-11-22 Thread saurfang
GitHub user saurfang opened a pull request: https://github.com/apache/spark/pull/9896 [SPARK-11906][Web UI] Speculation Tasks Cause ProgressBar UI Overflow When there are speculative tasks in the stage, the running progress bar can overflow and be hidden on a new line: ![image

[GitHub] spark pull request: Fix typo in AggregationQuerySuite.scala

2015-10-29 Thread saurfang
GitHub user saurfang opened a pull request: https://github.com/apache/spark/pull/9357 Fix typo in AggregationQuerySuite.scala You can merge this pull request into a Git repository by running: $ git pull https://github.com/saurfang/spark patch-1 Alternatively you can review

[GitHub] spark pull request: [SPARK-11244][SPARKR] sparkR.stop() should rem...

2015-10-22 Thread saurfang
Github user saurfang commented on the pull request: https://github.com/apache/spark/pull/9205#issuecomment-150329264 Thanks @shivaram! Opened https://issues.apache.org/jira/browse/SPARK-11263 for further discussion.

[GitHub] spark pull request: [SPARK-11244][SPARKR] sparkR.stop() should rem...

2015-10-22 Thread saurfang
Github user saurfang commented on the pull request: https://github.com/apache/spark/pull/9205#issuecomment-150202122 I still advocate making developer API documentation public. However I think one workaround is to make them roxygen doc again with `#'` but add `#' @rdname .ignore

[GitHub] spark pull request: [SPARK-8277][SPARKR] Faster createDataFrame us...

2015-10-22 Thread saurfang
GitHub user saurfang opened a pull request: https://github.com/apache/spark/pull/9234 [SPARK-8277][SPARKR] Faster createDataFrame using mapply With a single loop using `mapply`, I'm able to create a DataFrame much faster from an R data.frame. Please see benchmark results

[GitHub] spark pull request: [SPARK-8277][SPARKR] Faster createDataFrame us...

2015-10-22 Thread saurfang
Github user saurfang commented on the pull request: https://github.com/apache/spark/pull/9234#issuecomment-150399405 Ah. Thanks for the pointer; I didn't realize this issue had already been worked on. Looks like that PR already had all my brilliant ideas ;) I'm closing

[GitHub] spark pull request: [SPARK-8277][SPARKR] Faster createDataFrame us...

2015-10-22 Thread saurfang
Github user saurfang closed the pull request at: https://github.com/apache/spark/pull/9234

[GitHub] spark pull request: [SPARK-11244][SPARKR] sparkR.stop() should rem...

2015-10-21 Thread saurfang
GitHub user saurfang opened a pull request: https://github.com/apache/spark/pull/9205 [SPARK-11244][SPARKR] sparkR.stop() should remove SQLContext SparkR should remove `.sparkRSQLsc` and `.sparkRHivesc` when `sparkR.stop()` is called. Otherwise even when SparkContext

[GitHub] spark pull request: [SPARK-11244][SPARKR] sparkR.stop() should rem...

2015-10-21 Thread saurfang
Github user saurfang commented on the pull request: https://github.com/apache/spark/pull/9205#issuecomment-150021998 I see. Thanks for the information. I thought the convention in R packages is to expose as much documentation as possible but not export functions that are unstable

[GitHub] spark pull request: [SPARK-11244][SPARKR] sparkR.stop() should rem...

2015-10-21 Thread saurfang
Github user saurfang commented on the pull request: https://github.com/apache/spark/pull/9205#issuecomment-150017327 Most of them are "style: Commented code should be removed." For example: ``` R/RDD.R:260:3: style: Commented code should be removed. # unp

[GitHub] spark pull request: [SPARK-10543] [CORE] Peak Execution Memory Qua...

2015-09-11 Thread saurfang
GitHub user saurfang opened a pull request: https://github.com/apache/spark/pull/8726 [SPARK-10543] [CORE] Peak Execution Memory Quantile should be Per-task Basis Read `PEAK_EXECUTION_MEMORY` using `update` to get the per-task partial value instead of the cumulative value. I

[GitHub] spark pull request: [SPARK-10543] [CORE] Peak Execution Memory Qua...

2015-09-11 Thread saurfang
Github user saurfang commented on the pull request: https://github.com/apache/spark/pull/8726#issuecomment-139701846 I added a naive unit test. Let me know if you think it's sufficient or clear.

[GitHub] spark pull request: [SPARK-3533][Core] Add saveAsTextFileByKey() m...

2015-08-22 Thread saurfang
GitHub user saurfang opened a pull request: https://github.com/apache/spark/pull/8375 [SPARK-3533][Core] Add saveAsTextFileByKey() method to RDDs This adds the functionality of saving an `RDD[(K, V)]` to multiple text files split by key. It covers Scala/Java/Python API

[GitHub] spark pull request: [SPARK-8443][SQL] Split GenerateMutableProject...

2015-07-18 Thread saurfang
Github user saurfang commented on the pull request: https://github.com/apache/spark/pull/7076#issuecomment-122602881 Sounds good. Is this more like what you were looking for?

[GitHub] spark pull request: [SPARK-8443][SQL] Split GenerateMutableProject...

2015-07-18 Thread saurfang
Github user saurfang commented on a diff in the pull request: https://github.com/apache/spark/pull/7076#discussion_r34953519 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateMutableProjection.scala --- @@ -45,10 +47,41 @@ object

[GitHub] spark pull request: [SPARK-8443][SQL] Split GenerateMutableProject...

2015-07-18 Thread saurfang
Github user saurfang commented on a diff in the pull request: https://github.com/apache/spark/pull/7076#discussion_r34953523 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateMutableProjection.scala --- @@ -45,10 +47,41 @@ object

[GitHub] spark pull request: [SPARK-8443][SQL] Split GenerateMutableProject...

2015-07-18 Thread saurfang
Github user saurfang commented on the pull request: https://github.com/apache/spark/pull/7076#issuecomment-122615568 Done. Let me know if this is sufficient.

[GitHub] spark pull request: [SPARK-8443][SQL] Split GenerateMutableProject...

2015-07-17 Thread saurfang
Github user saurfang commented on the pull request: https://github.com/apache/spark/pull/7076#issuecomment-122485318 I have pushed a new commit so that if only one block is generated, projections are inlined as before. Can you please review? Or do you prefer I do the other

[GitHub] spark pull request: [SPARK-8443][SQL] Split GenerateMutableProject...

2015-07-17 Thread saurfang
Github user saurfang commented on a diff in the pull request: https://github.com/apache/spark/pull/7076#discussion_r34944623 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala --- @@ -42,4 +42,8 @@ class CodeGenerationSuite

[GitHub] spark pull request: [SPARK-8443][SQL] Split GenerateMutableProject...

2015-07-10 Thread saurfang
Github user saurfang commented on the pull request: https://github.com/apache/spark/pull/7076#issuecomment-120438630 I agree with you that this fix is kind of a hack. However, I would argue that if a human were to write this code, they would also split the projection calls into blocks

[GitHub] spark pull request: [SPARK-8702][WebUI]Avoid massive concating str...

2015-06-30 Thread saurfang
Github user saurfang commented on a diff in the pull request: https://github.com/apache/spark/pull/7082#discussion_r33608396 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala --- @@ -572,55 +572,55 @@ private[ui] class StagePage(parent: StagesTab) extends

[GitHub] spark pull request: [SPARK-8702][WebUI]Avoid massive concating str...

2015-06-30 Thread saurfang
Github user saurfang commented on a diff in the pull request: https://github.com/apache/spark/pull/7082#discussion_r33610564 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala --- @@ -572,55 +572,55 @@ private[ui] class StagePage(parent: StagesTab) extends

[GitHub] spark pull request: [SPARK-8443][SQL] Split GenerateMutableProject...

2015-06-29 Thread saurfang
Github user saurfang commented on the pull request: https://github.com/apache/spark/pull/7076#issuecomment-116906241 @cloud-fan Possibly. A similar naive test on `GenerateProjection` breaks on `equals` functions. That being said, I wonder what would be a concrete case

[GitHub] spark pull request: [SPARK-8443][SQL] Split GenerateMutableProject...

2015-06-28 Thread saurfang
GitHub user saurfang opened a pull request: https://github.com/apache/spark/pull/7076 [SPARK-8443][SQL] Split GenerateMutableProjection Codegen due to JVM Code Size Limits By grouping projection calls into multiple apply functions, we are able to push the number of projections