[GitHub] spark pull request: [SPARK-10861] Add range support

2016-01-20 Thread JihongMA
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/9172#issuecomment-173428238 @yhuai sure, I am closing it now. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark pull request: [SPARK-10861] Add range support

2016-01-20 Thread JihongMA
Github user JihongMA closed the pull request at: https://github.com/apache/spark/pull/9172

[GitHub] spark pull request: [SPARK-11720] [SQL][ML] Return Double.NaN inst...

2015-11-16 Thread JihongMA
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/9705#issuecomment-157192370 @yhuai @mengxr, agreed, it will make the behavior of the stats functions consistent across Spark SQL.

[GitHub] spark pull request: [SPARK-11720] [SQL][ML] Return Double.NaN inst...

2015-11-16 Thread JihongMA
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/9705#issuecomment-157164943 @yhuai, so when count = 1, stddev/stddev_samp return NaN and stddev_pop returns 0; when count = 0, stddev/stddev_samp/stddev_pop all return null. Same for variance
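The edge cases described in the comment above can be sketched in plain Python. This is only an illustration of the stated semantics (null for an empty input, NaN for the sample statistic over a single value, 0 for the population statistic over a single value), not Spark's implementation. The function names mirror the SQL function names; the helper `_sum_sq_dev` is hypothetical, and Python's `None` stands in for SQL null.

```python
import math
from typing import Optional, Sequence


def _sum_sq_dev(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean (hypothetical helper)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def stddev_pop(values: Sequence[float]) -> Optional[float]:
    """Population standard deviation; None (null) when there are no rows."""
    n = len(values)
    if n == 0:
        return None
    # For n == 1 the single deviation is 0, so the result is 0.0.
    return math.sqrt(_sum_sq_dev(values) / n)


def stddev_samp(values: Sequence[float]) -> Optional[float]:
    """Sample standard deviation; None for no rows, NaN for one row."""
    n = len(values)
    if n == 0:
        return None
    if n == 1:
        # The divisor n - 1 is zero, so the statistic is undefined.
        return float("nan")
    return math.sqrt(_sum_sq_dev(values) / (n - 1))
```

Under these assumptions, an empty input yields None from both functions, while a one-element input yields NaN from stddev_samp and 0.0 from stddev_pop.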

[GitHub] spark pull request: [SPARK-11720] [SQL][ML] Return Double.NaN inst...

2015-11-13 Thread JihongMA
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/9705#issuecomment-156623775 Jenkins, test this please.

[GitHub] spark pull request: Spark 11720

2015-11-13 Thread JihongMA
GitHub user JihongMA opened a pull request: https://github.com/apache/spark/pull/9705 Spark 11720 Return Double.NaN for mean/average when count == 0 for all numeric types that are converted to Double; Decimal type continues to return null. You can merge this pull request into a Git

[GitHub] spark pull request: SPARK-11420 Updating Stddev support via Impera...

2015-11-12 Thread JihongMA
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/9380#issuecomment-156252110 @mengxr do we want to change the behavior for min, max as well?

[GitHub] spark pull request: SPARK-11420 Updating Stddev support via Impera...

2015-11-12 Thread JihongMA
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/9380#issuecomment-156249080 @mengxr sure, will take care of mean via a separate PR.

[GitHub] spark pull request: SPARK-11420 Updating Stddev support via Impera...

2015-11-12 Thread JihongMA
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/9380#issuecomment-156171034 @AmplabJenkins please retest the change.

[GitHub] spark pull request: SPARK-11420 Updating Stddev support via Impera...

2015-11-11 Thread JihongMA
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/9380#issuecomment-155988170 @felixcheung Thank you! This is the change I have made to make it pass for R. I am not familiar with R. df3 <- agg(gd, age = "stddev")

[GitHub] spark pull request: SPARK-11420 Updating Stddev support via Impera...

2015-11-04 Thread JihongMA
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/9380#issuecomment-153847977 @mengxr rebased with the changes @rxin [SPARK-11490], stddev / variance mapped to the corresponding sample stddev / variance. I checked Hive doesn't support

[GitHub] spark pull request: SPARK-11420 Updating Stddev support via Impera...

2015-11-04 Thread JihongMA
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/9380#issuecomment-153775399 @mengxr Please take another look.

[GitHub] spark pull request: [SPARK-10429] [SQL] make mutableProjection ato...

2015-11-03 Thread JihongMA
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/9422#issuecomment-153492968 @yhuai and @marmbrus, thank you for checking with me. I have a pending PR (SPARK-11420) to change stddev support through ImperativeAggregate, so please go ahead and I

[GitHub] spark pull request: SPARK-11420 Updating Stddev support via Impera...

2015-11-03 Thread JihongMA
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/9380#issuecomment-153475180 getStatistics() will continue to return Double value for normal cases, changing it to return null only for edge cases. is there a strong reason to return Double.NaN

[GitHub] spark pull request: SPARK-11420 Updating Stddev support via Impera...

2015-11-03 Thread JihongMA
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/9380#issuecomment-153453307 I propose returning null for all cases in which Double.NaN is currently returned, and changing getStatistics() to return Any instead of Double.

[GitHub] spark pull request: SPARK-11420 Updating Stddev support via Impera...

2015-11-03 Thread JihongMA
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/9380#issuecomment-153451606 so for skewness and kurtosis in case of count =1, we want to return null instead of 0. I can address it, but instead of returning Double.NaN, should we return null

[GitHub] spark pull request: [Spark 11420] Updating Stddev support via Impe...

2015-10-30 Thread JihongMA
GitHub user JihongMA opened a pull request: https://github.com/apache/spark/pull/9380 [Spark 11420] Updating Stddev support via Imperative Aggregate switched stddev support from DeclarativeAggregate to ImperativeAggregate. You can merge this pull request into a Git repository by

[GitHub] spark pull request: SPARK-9296 (add variance support)

2015-10-29 Thread JihongMA
Github user JihongMA closed the pull request at: https://github.com/apache/spark/pull/8778

[GitHub] spark pull request: SPARK-9296 (add variance support)

2015-10-29 Thread JihongMA
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/8778#issuecomment-152441928 Closing this pull request. Based on the performance comparison of Declarative vs. Imperative aggregate (SPARK-10953), the implementation of central moment computation

[GitHub] spark pull request: [SPARK-10641][SQL] Add Skewness and Kurtosis S...

2015-10-29 Thread JihongMA
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/9003#issuecomment-152301262 Thanks @mengxr, I will send a PR for Stddev.

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-21 Thread JihongMA
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/9003#issuecomment-149997989 Seth, no need to implement the old interface, simply put a placeholder for resolving it is sufficient, which will go away when this code path is removed. here

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-21 Thread JihongMA
Github user JihongMA commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r42668167 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -857,3 +857,329 @@ object

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-21 Thread JihongMA
Github user JihongMA commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r42665351 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -857,3 +857,329 @@ object

[GitHub] spark pull request: [SPARK-10861] Add range support

2015-10-19 Thread JihongMA
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/9172#issuecomment-149378691 It is named "range" as part of the dispersion measures within Univariate Stats. This is a sub-task under the Univariate Stats umbrella JIRA (SPARK-10384).

[GitHub] spark pull request: [SPARK-10861] Add range support

2015-10-19 Thread JihongMA
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/9172#issuecomment-149375416 Range is generally included in Univariate Stats, but Hive doesn't support it as a built-in UDF; just checked.

[GitHub] spark pull request: [SPARK-1086] Add range support

2015-10-19 Thread JihongMA
GitHub user JihongMA opened a pull request: https://github.com/apache/spark/pull/9172 [SPARK-1086] Add range support Adding range support through DeclarativeAggregate API, also prototyped an alternative ImperativeAggregate implementation as well for perf comparison

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-19 Thread JihongMA
Github user JihongMA commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r42393271 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -842,3 +699,304 @@ object

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-19 Thread JihongMA
Github user JihongMA commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r42393149 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -842,3 +699,304 @@ object

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-07 Thread JihongMA
Github user JihongMA commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r41429601 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala --- @@ -88,6 +88,276 @@ case class Average(child

[GitHub] spark pull request: [SPARK-10641][WIP][SQL] Add Skewness and Kurto...

2015-10-07 Thread JihongMA
Github user JihongMA commented on a diff in the pull request: https://github.com/apache/spark/pull/9003#discussion_r41428901 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala --- @@ -221,4 +221,40 @@ class DataFrameAggregateSuite extends

[GitHub] spark pull request: SPARK-9296 (add variance support)

2015-09-16 Thread JihongMA
GitHub user JihongMA opened a pull request: https://github.com/apache/spark/pull/8778 SPARK-9296 (add variance support) extending variance support which leverages stddev implementation. You can merge this pull request into a Git repository by running: $ git pull https

[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-09-04 Thread JihongMA
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-137802508 R style check failure is caused by commit of SPARK-8951

[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-24 Thread JihongMA
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-124720386 Please don't test it yet; I need to make a change to accommodate an API change introduced by another JIRA.

[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-15 Thread JihongMA
Github user JihongMA commented on a diff in the pull request: https://github.com/apache/spark/pull/6297#discussion_r34717145 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala --- @@ -761,3 +761,216 @@ case class LastFunction(expr

[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-15 Thread JihongMA
Github user JihongMA commented on a diff in the pull request: https://github.com/apache/spark/pull/6297#discussion_r34706204 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala --- @@ -761,3 +761,216 @@ case class LastFunction(expr

[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-14 Thread JihongMA
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-121326693 Thanks for testing out the code changes. the test failure is caused by SPARK-8800 and waiting for the fix to be merged.

[GitHub] spark pull request: [SPARK-8677][SQL] Fix non-terminating decimal ...

2015-07-02 Thread JihongMA
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/7056#issuecomment-118171474 @viirya I have opened a JIRA https://issues.apache.org/jira/browse/SPARK-8800, and have put more detailed specification I found how to handle decimal division in the

[GitHub] spark pull request: [SPARK-8677][SQL] Fix non-terminating decimal ...

2015-07-01 Thread JihongMA
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/7056#issuecomment-117923839 Thanks for fixing this division problem. after rebasing with the fix, I noticed one more issue w.r.t the accuracy of Decimal computation. scala> val

[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-07-01 Thread JihongMA
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-117799636 The issue introduced by SPARK-8359 was fixed via SPARK-8677, but the fix causes an accuracy issue over Decimal data; that issue needs to be fixed first.

[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-06-30 Thread JihongMA
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-117268484 sorry, the code is not ready to be merged as I noticed one more issue with Decimal type, fixing it and will let you know once I am ready plus code style fix.

[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-06-27 Thread JihongMA
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-115977926 while preparing the code change to address review comments. I noticed the fix for SPARK-8359 is causing issue with decimal type, I put a comment there on that JIRA and

[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-06-25 Thread JihongMA
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/6297#issuecomment-115422767 I will incorporate the comments shortly. Thank you Michael for reviewing the code.

[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-06-25 Thread JihongMA
Github user JihongMA commented on a diff in the pull request: https://github.com/apache/spark/pull/6297#discussion_r33313504 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -372,7 +372,7 @@ class DataFrameSuite extends QueryTest { val

[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-06-25 Thread JihongMA
Github user JihongMA commented on a diff in the pull request: https://github.com/apache/spark/pull/6297#discussion_r33313463 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -372,7 +372,7 @@ class DataFrameSuite extends QueryTest { val

[GitHub] spark pull request: SPARK-6548 Adding stddev to DataFrame function...

2015-05-20 Thread JihongMA
GitHub user JihongMA opened a pull request: https://github.com/apache/spark/pull/6297 SPARK-6548 Adding stddev to DataFrame functions Adding STDDEV support for DataFrame using a 1-pass online/parallel algorithm to compute variance. Please review the code change. You can merge this
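For readers unfamiliar with the phrase "1-pass online/parallel algorithm", the following plain-Python sketch shows one common way to do it: a Welford-style running update per value, plus a pairwise merge of partial results so that partitions can be processed independently and then combined. The snippet above does not say which formulation the pull request uses, so treat this as an assumption. The class `Moments` and its fields are hypothetical names, not Spark APIs.

```python
from dataclasses import dataclass


@dataclass
class Moments:
    """Running count, mean, and sum of squared deviations (hypothetical)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x: float) -> None:
        """Fold one value into the running state (online, single pass)."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        # Uses the old delta and the updated mean, as in Welford's method.
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "Moments") -> "Moments":
        """Combine two partial states, e.g. from two partitions."""
        n = self.count + other.count
        if n == 0:
            return Moments()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / n
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / n
        return Moments(n, mean, m2)

    def variance_pop(self) -> float:
        """Population variance; the caller must ensure count > 0."""
        return self.m2 / self.count


def summarize(values) -> Moments:
    """Build the state for one partition in a single pass."""
    state = Moments()
    for v in values:
        state.update(v)
    return state
```

A usage sketch: `summarize([1, 2, 3, 4]).merge(summarize([5, 6, 7, 8]))` should give the same count, mean, and variance (up to floating-point rounding) as `summarize([1, 2, 3, 4, 5, 6, 7, 8])`, which is what makes the approach suitable for distributed aggregation.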

[GitHub] spark pull request: Spark-7063 when lz4 compression is used, it ca...

2015-05-18 Thread JihongMA
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/6226#issuecomment-103219142 sorry, it is done now.

[GitHub] spark pull request: Spark-7063

2015-05-18 Thread JihongMA
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/6226#issuecomment-103213033 I have just updated the title. Thanks!

[GitHub] spark pull request: SPARK-7063 when lz4 compression is used, it ca...

2015-05-17 Thread JihongMA
Github user JihongMA commented on the pull request: https://github.com/apache/spark/pull/5641#issuecomment-102902698 sorry, I was busy and missed this message. I created a new pull request, which rebased the change with latest master.

[GitHub] spark pull request: SPARK-7063 when lz4 compression is used, it ca...

2015-05-17 Thread JihongMA
Github user JihongMA closed the pull request at: https://github.com/apache/spark/pull/5641

[GitHub] spark pull request: Spark-7063

2015-05-17 Thread JihongMA
GitHub user JihongMA opened a pull request: https://github.com/apache/spark/pull/6226 Spark-7063 I rebased to latest. please review and merge. You can merge this pull request into a Git repository by running: $ git pull https://github.com/JihongMA/spark-1 SPARK-7063-1

[GitHub] spark pull request: SPARK-7265 Improving documentation for Spark S...

2015-05-12 Thread JihongMA
Github user JihongMA commented on a diff in the pull request: https://github.com/apache/spark/pull/5933#discussion_r30182168 --- Diff: docs/sql-programming-guide.md --- @@ -1253,7 +1253,12 @@ This command builds a new assembly jar that includes Hive. Note that this Hive a on

[GitHub] spark pull request: SPARK-7265 Improving documentation for Spark S...

2015-05-06 Thread JihongMA
Github user JihongMA commented on a diff in the pull request: https://github.com/apache/spark/pull/5933#discussion_r29767006 --- Diff: docs/running-on-yarn.md --- @@ -305,3 +305,4 @@ If you need a reference to the proper location to put log files in the YARN so t - In `yarn

[GitHub] spark pull request: SPARK-7265 Improving documentation for Spark S...

2015-05-06 Thread JihongMA
Github user JihongMA commented on a diff in the pull request: https://github.com/apache/spark/pull/5933#discussion_r29766692 --- Diff: docs/sql-programming-guide.md --- @@ -1253,7 +1253,12 @@ This command builds a new assembly jar that includes Hive. Note that this Hive a on

[GitHub] spark pull request: SPARK-7265 Improving documentation for Spark S...

2015-05-05 Thread JihongMA
GitHub user JihongMA opened a pull request: https://github.com/apache/spark/pull/5933 SPARK-7265 Improving documentation for Spark SQL Hive support Please review this pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com

[GitHub] spark pull request: SPARK-7357 Improving HBaseTest example

2015-05-04 Thread JihongMA
GitHub user JihongMA opened a pull request: https://github.com/apache/spark/pull/5904 SPARK-7357 Improving HBaseTest example You can merge this pull request into a Git repository by running: $ git pull https://github.com/JihongMA/spark-1 SPARK-7357 Alternatively you can