Github user JihongMA commented on the pull request:
https://github.com/apache/spark/pull/9172#issuecomment-173428238
@yhuai sure, I am closing it now.
---
Github user JihongMA closed the pull request at:
https://github.com/apache/spark/pull/9172
---
Github user JihongMA commented on the pull request:
https://github.com/apache/spark/pull/9705#issuecomment-157192370
@yhuai @mengxr, agreed, it will make the behavior of the stats functions consistent across Spark SQL.
---
Github user JihongMA commented on the pull request:
https://github.com/apache/spark/pull/9705#issuecomment-157164943
@yhuai, so only when count = 1 do stddev/stddev_samp return NaN and stddev_pop return 0; when count = 0, stddev/stddev_samp/stddev_pop all return null. Same for variance.
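A minimal spark-shell sketch of those semantics, assuming a Spark 1.6-era sqlContext; the column name and data are illustrative only:

    import org.apache.spark.sql.functions.{stddev, stddev_samp, stddev_pop}
    import sqlContext.implicits._

    val one = Seq(1.0).toDF("x")             // count = 1
    one.agg(stddev($"x"), stddev_samp($"x"), stddev_pop($"x")).show()
    // expected: NaN, NaN, 0.0

    val none = one.filter($"x" > 100)        // count = 0
    none.agg(stddev($"x"), stddev_samp($"x"), stddev_pop($"x")).show()
    // expected: null, null, null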
Github user JihongMA commented on the pull request:
https://github.com/apache/spark/pull/9705#issuecomment-156623775
Jenkins, test this please.
---
GitHub user JihongMA opened a pull request:
https://github.com/apache/spark/pull/9705
[SPARK-11720]
Return Double.NaN for mean/average when count == 0 for all numeric types that are converted to Double; the Decimal type continues to return null.
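A hedged spark-shell sketch of the proposed mean behavior (the column name and data are illustrative, not from the PR itself):

    import org.apache.spark.sql.functions.avg
    import sqlContext.implicits._

    val df = Seq(1.0, 2.0).toDF("x")
    df.filter($"x" > 100).agg(avg($"x")).show()
    // count == 0 over a Double column: NaN after this change;
    // an equivalent DecimalType column would continue to return null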
Github user JihongMA commented on the pull request:
https://github.com/apache/spark/pull/9380#issuecomment-156252110
@mengxr do we want to change the behavior for min, max as well?
---
Github user JihongMA commented on the pull request:
https://github.com/apache/spark/pull/9380#issuecomment-156249080
@mengxr sure, will take care of mean via a separate PR.
---
Github user JihongMA commented on the pull request:
https://github.com/apache/spark/pull/9380#issuecomment-156171034
@AmplabJenkins please retest the change.
---
Github user JihongMA commented on the pull request:
https://github.com/apache/spark/pull/9380#issuecomment-155988170
@felixcheung Thank you! This is the change I made to make it pass for R; I am not familiar with R.
df3 <- agg(gd, age = "stddev")
Github user JihongMA commented on the pull request:
https://github.com/apache/spark/pull/9380#issuecomment-153847977
@mengxr rebased with @rxin's changes [SPARK-11490]; stddev/variance are now mapped to the corresponding sample stddev/variance. I checked that Hive doesn't support
Github user JihongMA commented on the pull request:
https://github.com/apache/spark/pull/9380#issuecomment-153775399
@mengxr Please take another look.
---
Github user JihongMA commented on the pull request:
https://github.com/apache/spark/pull/9422#issuecomment-153492968
@yhuai and @marmbrus, thank you for checking with me. I have a pending PR (SPARK-11420) to change stddev support to go through ImperativeAggregate, so please go ahead and I
Github user JihongMA commented on the pull request:
https://github.com/apache/spark/pull/9380#issuecomment-153475180
getStatistics() will continue to return a Double value for normal cases; the change makes it return null only for edge cases. Is there a strong reason to return Double.NaN
Github user JihongMA commented on the pull request:
https://github.com/apache/spark/pull/9380#issuecomment-153453307
I propose to return null for all cases where Double.NaN is currently returned, and to change getStatistics() to return Any instead of Double.
---
Github user JihongMA commented on the pull request:
https://github.com/apache/spark/pull/9380#issuecomment-153451606
So for skewness and kurtosis, in the case of count = 1, we want to return null instead of 0. I can address it, but instead of returning Double.NaN, should we return null
GitHub user JihongMA opened a pull request:
https://github.com/apache/spark/pull/9380
[SPARK-11420] Updating stddev support via ImperativeAggregate
Switched stddev support from DeclarativeAggregate to ImperativeAggregate.
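For flavor, a self-contained sketch of the imperative style: a mutable buffer with update/merge/eval steps that mirror the shape of the ImperativeAggregate contract. This is plain Scala, not the internal Catalyst API; the buffer layout and the edge-case policy are illustrative only.

    // One-pass (Welford-style) stddev buffer: update consumes an input row,
    // merge combines partial buffers computed on different partitions.
    final class StddevBuffer extends Serializable {
      var n: Long = 0L
      var mean: Double = 0.0
      var m2: Double = 0.0                       // sum of squared deviations

      def update(x: Double): Unit = {
        n += 1
        val delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
      }

      def merge(o: StddevBuffer): Unit = if (o.n > 0) {
        val total = n + o.n
        val delta = o.mean - mean
        m2 += o.m2 + delta * delta * n.toDouble * o.n / total
        mean += delta * o.n / total
        n = total
      }

      def eval: Double =                         // sample stddev; the
        if (n < 2) Double.NaN                    // null-vs-NaN policy for edge
        else math.sqrt(m2 / (n - 1))             // cases is debated elsewhere
    }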
Github user JihongMA closed the pull request at:
https://github.com/apache/spark/pull/8778
---
Github user JihongMA commented on the pull request:
https://github.com/apache/spark/pull/8778#issuecomment-152441928
Closing this pull request. Based on the performance comparison of Declarative vs. Imperative aggregates (SPARK-10953), the implementation of central moment computation
Github user JihongMA commented on the pull request:
https://github.com/apache/spark/pull/9003#issuecomment-152301262
Thanks @mengxr , I will send a PR for Stddev.
---
Github user JihongMA commented on the pull request:
https://github.com/apache/spark/pull/9003#issuecomment-149997989
Seth, no need to implement the old interface; simply putting a placeholder for resolving it is sufficient, and it will go away when this code path is removed. here
Github user JihongMA commented on a diff in the pull request:
https://github.com/apache/spark/pull/9003#discussion_r42668167
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala
---
@@ -857,3 +857,329 @@ object
Github user JihongMA commented on a diff in the pull request:
https://github.com/apache/spark/pull/9003#discussion_r42665351
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala
---
@@ -857,3 +857,329 @@ object
Github user JihongMA commented on the pull request:
https://github.com/apache/spark/pull/9172#issuecomment-149378691
It is named "range" as part of the dispersion measures within Univariate Stats; this is a sub-task under the Univariate Stats umbrella JIRA (SPARK-10384).
---
Github user JihongMA commented on the pull request:
https://github.com/apache/spark/pull/9172#issuecomment-149375416
Range is generally included in univariate stats, but Hive doesn't support it as a built-in UDF; I just checked.
---
GitHub user JihongMA opened a pull request:
https://github.com/apache/spark/pull/9172
[SPARK-1086] Add range support
Adding range support through the DeclarativeAggregate API; an alternative ImperativeAggregate implementation was also prototyped for perf comparison.
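In declarative form the whole aggregate reduces to existing primitives, since range = max - min; a hedged DataFrame-level equivalent (df and the column x are illustrative, not the PR's actual Catalyst code):

    import org.apache.spark.sql.functions.{col, max, min}
    // range expressed purely from existing declarative aggregates
    df.agg((max(col("x")) - min(col("x"))).as("range")).show()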
Github user JihongMA commented on a diff in the pull request:
https://github.com/apache/spark/pull/9003#discussion_r42393271
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala
---
@@ -842,3 +699,304 @@ object
Github user JihongMA commented on a diff in the pull request:
https://github.com/apache/spark/pull/9003#discussion_r42393149
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala
---
@@ -842,3 +699,304 @@ object
Github user JihongMA commented on a diff in the pull request:
https://github.com/apache/spark/pull/9003#discussion_r41429601
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala
---
@@ -88,6 +88,276 @@ case class Average(child
Github user JihongMA commented on a diff in the pull request:
https://github.com/apache/spark/pull/9003#discussion_r41428901
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala ---
@@ -221,4 +221,40 @@ class DataFrameAggregateSuite extends
GitHub user JihongMA opened a pull request:
https://github.com/apache/spark/pull/8778
SPARK-9296 (add variance support)
Extending variance support by leveraging the stddev implementation.
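Both statistics fall out of the same accumulated state, so the extension is small; a minimal self-contained sketch of the relationship, written against the (n, m2) fields of the Welford-style buffer sketched earlier in this digest:

    // sample variance and stddev from the same one-pass state (n, m2)
    def variance(n: Long, m2: Double): Double =
      if (n < 2) Double.NaN else m2 / (n - 1)
    def stddev(n: Long, m2: Double): Double =
      math.sqrt(variance(n, m2))   // stddev is just the square root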
Github user JihongMA commented on the pull request:
https://github.com/apache/spark/pull/6297#issuecomment-137802508
The R style check failure is caused by the commit for SPARK-8951.
---
Github user JihongMA commented on the pull request:
https://github.com/apache/spark/pull/6297#issuecomment-124720386
Please don't test it yet; I need to make changes to accommodate an API change introduced by another JIRA.
---
Github user JihongMA commented on a diff in the pull request:
https://github.com/apache/spark/pull/6297#discussion_r34717145
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala
---
@@ -761,3 +761,216 @@ case class LastFunction(expr
Github user JihongMA commented on a diff in the pull request:
https://github.com/apache/spark/pull/6297#discussion_r34706204
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala
---
@@ -761,3 +761,216 @@ case class LastFunction(expr
Github user JihongMA commented on the pull request:
https://github.com/apache/spark/pull/6297#issuecomment-121326693
Thanks for testing out the code changes. The test failure is caused by SPARK-8800; I am waiting for the fix to be merged.
---
Github user JihongMA commented on the pull request:
https://github.com/apache/spark/pull/7056#issuecomment-118171474
@viirya I have opened a JIRA, https://issues.apache.org/jira/browse/SPARK-8800, and have put in a more detailed specification of what I found about how to handle decimal division in the
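For context, a hedged illustration of why decimal division needs an explicit rule (the values are illustrative, not the exact case from this thread): a decimal quotient may have no finite representation, so a result precision and scale must be chosen, which is the kind of behavior a specification like SPARK-8800's pins down.

    // java.math.BigDecimal throws on a non-terminating quotient unless a
    // scale or MathContext is supplied; a SQL engine faces the same choice
    // when typing the result of DecimalType division.
    import java.math.{BigDecimal => JBigDecimal, MathContext}
    val q = new JBigDecimal("1.00").divide(new JBigDecimal("3.00"), MathContext.DECIMAL128)
    println(q)  // 0.3333... (34 significant digits)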
Github user JihongMA commented on the pull request:
https://github.com/apache/spark/pull/7056#issuecomment-117923839
Thanks for fixing this division problem. After rebasing with the fix, I noticed one more issue w.r.t. the accuracy of Decimal computation.
scala> val
Github user JihongMA commented on the pull request:
https://github.com/apache/spark/pull/6297#issuecomment-117799636
The issue introduced by SPARK-8359 was fixed via SPARK-8677, but it is causing an accuracy issue over Decimal data; that issue needs to be fixed first.
---
Github user JihongMA commented on the pull request:
https://github.com/apache/spark/pull/6297#issuecomment-117268484
Sorry, the code is not ready to be merged, as I noticed one more issue with the Decimal type; I am fixing it and will let you know once I am ready, plus a code style fix.
---
Github user JihongMA commented on the pull request:
https://github.com/apache/spark/pull/6297#issuecomment-115977926
While preparing the code change to address review comments, I noticed the fix for SPARK-8359 is causing an issue with the decimal type; I put a comment on that JIRA and
Github user JihongMA commented on the pull request:
https://github.com/apache/spark/pull/6297#issuecomment-115422767
I will incorporate the comments shortly. Thank you Michael for reviewing
the code.
---
Github user JihongMA commented on a diff in the pull request:
https://github.com/apache/spark/pull/6297#discussion_r33313504
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
---
@@ -372,7 +372,7 @@ class DataFrameSuite extends QueryTest {
val
Github user JihongMA commented on a diff in the pull request:
https://github.com/apache/spark/pull/6297#discussion_r33313463
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
---
@@ -372,7 +372,7 @@ class DataFrameSuite extends QueryTest {
val
GitHub user JihongMA opened a pull request:
https://github.com/apache/spark/pull/6297
SPARK-6548 Adding stddev to DataFrame functions
Adding STDDEV support for DataFrames using a 1-pass online/parallel algorithm to compute variance. Please review the code change.
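A hedged sketch of what "1-pass online/parallel" means operationally, reusing the StddevBuffer sketch from earlier in this digest and plain RDD.aggregate in place of the Catalyst wiring the PR actually uses (sc and the data are illustrative):

    val data = sc.parallelize(Seq(1.0, 2.0, 3.0, 4.0), numSlices = 4)
    val buf = data.aggregate(new StddevBuffer)(
      (b, x)   => { b.update(x); b },    // online update within a partition
      (b1, b2) => { b1.merge(b2); b1 })  // merge partial buffers across partitions
    println(buf.eval)                    // sample stddev from a single pass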
Github user JihongMA commented on the pull request:
https://github.com/apache/spark/pull/6226#issuecomment-103219142
sorry, it is done now.
---
Github user JihongMA commented on the pull request:
https://github.com/apache/spark/pull/6226#issuecomment-103213033
I have just updated the title. Thanks!
---
Github user JihongMA commented on the pull request:
https://github.com/apache/spark/pull/5641#issuecomment-102902698
Sorry, I was busy and missed this message. I created a new pull request, which rebases the change onto the latest master.
---
Github user JihongMA closed the pull request at:
https://github.com/apache/spark/pull/5641
---
GitHub user JihongMA opened a pull request:
https://github.com/apache/spark/pull/6226
SPARK-7063
I rebased onto the latest master. Please review and merge.
Github user JihongMA commented on a diff in the pull request:
https://github.com/apache/spark/pull/5933#discussion_r30182168
--- Diff: docs/sql-programming-guide.md ---
@@ -1253,7 +1253,12 @@ This command builds a new assembly jar that includes
Hive. Note that this Hive a
on
Github user JihongMA commented on a diff in the pull request:
https://github.com/apache/spark/pull/5933#discussion_r29767006
--- Diff: docs/running-on-yarn.md ---
@@ -305,3 +305,4 @@ If you need a reference to the proper location to put
log files in the YARN so t
- In `yarn
Github user JihongMA commented on a diff in the pull request:
https://github.com/apache/spark/pull/5933#discussion_r29766692
--- Diff: docs/sql-programming-guide.md ---
@@ -1253,7 +1253,12 @@ This command builds a new assembly jar that includes
Hive. Note that this Hive a
on
GitHub user JihongMA opened a pull request:
https://github.com/apache/spark/pull/5933
SPARK-7265 Improving documentation for Spark SQL Hive support
Please review this pull request.
GitHub user JihongMA opened a pull request:
https://github.com/apache/spark/pull/5904
SPARK-7357 Improving HBaseTest example