zhengruifeng created SPARK-39598:
Summary: Make *cache*, *catalog* in the python side support
3-layer-namespace
Key: SPARK-39598
URL: https://issues.apache.org/jira/browse/SPARK-39598
Project: Spark
zhengruifeng created SPARK-39597:
Summary: Make GetTable, TableExists and DatabaseExists in the
python side support 3-layer-namespace
Key: SPARK-39597
URL: https://issues.apache.org/jira/browse/SPARK-39597
zhengruifeng created SPARK-39579:
Summary: Make ListFunctions API compatible
Key: SPARK-39579
URL: https://issues.apache.org/jira/browse/SPARK-39579
Project: Spark
Issue Type: Sub-task
[
https://issues.apache.org/jira/browse/SPARK-39555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng updated SPARK-39555:
-
Description: Corresponding changes in the python side of SPARK-39236 (Make
CreateTable API and L
zhengruifeng created SPARK-39555:
Summary: Make createTable and listTables in the python side
support 3-layer-namespace
Key: SPARK-39555
URL: https://issues.apache.org/jira/browse/SPARK-39555
Project:
[
https://issues.apache.org/jira/browse/SPARK-39555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng updated SPARK-39555:
-
Description: Corresponding changes in the python side to make
> Make createTable and listTables
[
https://issues.apache.org/jira/browse/SPARK-39533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng updated SPARK-39533:
-
Summary: Deprecate scoreLabelsWeight in BinaryClassificationMetrics (was:
Remove scoreLabelsWei
[
https://issues.apache.org/jira/browse/SPARK-39533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng updated SPARK-39533:
-
Description:
scoreLabelsWeight in BinaryClassificationMetrics is a public variable,
but it shou
[
https://issues.apache.org/jira/browse/SPARK-39534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng updated SPARK-39534:
-
Summary: Series.argmax only needs single pass (was: Series.argmax only
need one pass)
> Series
zhengruifeng created SPARK-39534:
Summary: Series.argmax only need one pass
Key: SPARK-39534
URL: https://issues.apache.org/jira/browse/SPARK-39534
Project: Spark
Issue Type: Improvement
zhengruifeng created SPARK-39533:
Summary: Remove scoreLabelsWeight in BinaryClassificationMetrics
Key: SPARK-39533
URL: https://issues.apache.org/jira/browse/SPARK-39533
Project: Spark
Issue
[
https://issues.apache.org/jira/browse/SPARK-39510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng updated SPARK-39510:
-
Summary: Leverage the natural partitioning and ordering of
MonotonicallyIncreasingID (was: leve
[
https://issues.apache.org/jira/browse/SPARK-39510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng updated SPARK-39510:
-
Description:
In Pandas-API-on-Spark:
1, *MonotonicallyIncreasingID* and *AttachDistributedSeque
zhengruifeng created SPARK-39510:
Summary: leverage the natural partitioning and ordering of
MonotonicallyIncreasingID
Key: SPARK-39510
URL: https://issues.apache.org/jira/browse/SPARK-39510
Project:
[
https://issues.apache.org/jira/browse/SPARK-39284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng reassigned SPARK-39284:
Assignee: zhengruifeng
> Implement Groupby.mad
> -
>
>
[
https://issues.apache.org/jira/browse/SPARK-39284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng resolved SPARK-39284.
--
Resolution: Resolved
Resolved by https://github.com/apache/spark/pull/36660
> Implement Group
[
https://issues.apache.org/jira/browse/SPARK-39228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng reassigned SPARK-39228:
Assignee: Xinrong Meng
> Implement `skipna` of `Series.argmax`
>
[
https://issues.apache.org/jira/browse/SPARK-39228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng resolved SPARK-39228.
--
Resolution: Resolved
resolved by https://github.com/apache/spark/pull/36599
> Implement `skip
[
https://issues.apache.org/jira/browse/SPARK-39300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng resolved SPARK-39300.
--
Resolution: Resolved
> Move pandasSkewness and pandasKurtosis into pandas.spark.functions
> --
[
https://issues.apache.org/jira/browse/SPARK-39268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng resolved SPARK-39268.
--
Resolution: Resolved
> AttachDistributedSequenceExec do not checkpoint childRDD with single pa
zhengruifeng created SPARK-39300:
Summary: Move pandasSkewness and pandasKurtosis into
pandas.spark.functions
Key: SPARK-39300
URL: https://issues.apache.org/jira/browse/SPARK-39300
Project: Spark
zhengruifeng created SPARK-39299:
Summary: Series.autocorr use SQL.corr to avoid conversion to vector
Key: SPARK-39299
URL: https://issues.apache.org/jira/browse/SPARK-39299
Project: Spark
Is
zhengruifeng created SPARK-39284:
Summary: Implement Groupby.mad
Key: SPARK-39284
URL: https://issues.apache.org/jira/browse/SPARK-39284
Project: Spark
Issue Type: Sub-task
Componen
zhengruifeng created SPARK-39268:
Summary: AttachDistributedSequenceExec do not checkpoint childRDD
with single partition
Key: SPARK-39268
URL: https://issues.apache.org/jira/browse/SPARK-39268
Projec
[
https://issues.apache.org/jira/browse/SPARK-39129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng resolved SPARK-39129.
--
Resolution: Resolved
Resolved by https://github.com/apache/spark/pull/36486
> impl Groupby.ew
[
https://issues.apache.org/jira/browse/SPARK-39246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540881#comment-17540881
]
zhengruifeng commented on SPARK-39246:
--
Thanks [~Qin Yao] !
> Implement Groupby.sk
[
https://issues.apache.org/jira/browse/SPARK-39092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng updated SPARK-39092:
-
Attachment: PropagateEmptyPartitions.pdf
> Propagate Empty Partitions
>
[
https://issues.apache.org/jira/browse/SPARK-39129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng reassigned SPARK-39129:
Assignee: zhengruifeng
> impl Groupby.ewm
>
>
> Key: SPA
zhengruifeng created SPARK-39246:
Summary: Implement Groupby.skew
Key: SPARK-39246
URL: https://issues.apache.org/jira/browse/SPARK-39246
Project: Spark
Issue Type: Sub-task
Compone
zhengruifeng created SPARK-39223:
Summary: implement skew and kurt in
Rolling/RollingGroupby/Expanding/ExpandingGroupby
Key: SPARK-39223
URL: https://issues.apache.org/jira/browse/SPARK-39223
Project:
zhengruifeng created SPARK-39192:
Summary: make pandas-on-spark's kurt consistent with pandas
Key: SPARK-39192
URL: https://issues.apache.org/jira/browse/SPARK-39192
Project: Spark
Issue Type
zhengruifeng created SPARK-39189:
Summary: interpolate supports limit_area
Key: SPARK-39189
URL: https://issues.apache.org/jira/browse/SPARK-39189
Project: Spark
Issue Type: Improvement
zhengruifeng created SPARK-39186:
Summary: make skew consistent with pandas
Key: SPARK-39186
URL: https://issues.apache.org/jira/browse/SPARK-39186
Project: Spark
Issue Type: Improvement
zhengruifeng created SPARK-39129:
Summary: impl Groupby.ewm
Key: SPARK-39129
URL: https://issues.apache.org/jira/browse/SPARK-39129
Project: Spark
Issue Type: Sub-task
Components: P
[
https://issues.apache.org/jira/browse/SPARK-39114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng resolved SPARK-39114.
--
Fix Version/s: 3.4.0
Resolution: Resolved
> ml.optim.aggregator avoid re-allocating buf
zhengruifeng created SPARK-39114:
Summary: ml.optim.aggregator avoid re-allocating buffers
Key: SPARK-39114
URL: https://issues.apache.org/jira/browse/SPARK-39114
Project: Spark
Issue Type: S
[
https://issues.apache.org/jira/browse/SPARK-39058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531100#comment-17531100
]
zhengruifeng commented on SPARK-39058:
--
[~weichenxu123] I can help review. BTW, is
zhengruifeng created SPARK-39092:
Summary: Propagate Empty Partitions
Key: SPARK-39092
URL: https://issues.apache.org/jira/browse/SPARK-39092
Project: Spark
Issue Type: New Feature
[
https://issues.apache.org/jira/browse/SPARK-30661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng updated SPARK-30661:
-
Affects Version/s: 3.4.0
(was: 3.0.0)
> KMeans blockify input vectors
[
https://issues.apache.org/jira/browse/SPARK-30661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng updated SPARK-30661:
-
Priority: Major (was: Minor)
> KMeans blockify input vectors
> -
>
zhengruifeng created SPARK-39081:
Summary: Impl DataFrame.resample and Series.resample
Key: SPARK-39081
URL: https://issues.apache.org/jira/browse/SPARK-39081
Project: Spark
Issue Type: Sub-t
zhengruifeng created SPARK-38993:
Summary: Impl DataFrame.boxplot and DataFrame.plot.box
Key: SPARK-38993
URL: https://issues.apache.org/jira/browse/SPARK-38993
Project: Spark
Issue Type: Sub
zhengruifeng created SPARK-38943:
Summary: EWM support ignore_na
Key: SPARK-38943
URL: https://issues.apache.org/jira/browse/SPARK-38943
Project: Spark
Issue Type: Improvement
Compo
zhengruifeng created SPARK-38937:
Summary: interpolate support param `limit_direction`
Key: SPARK-38937
URL: https://issues.apache.org/jira/browse/SPARK-38937
Project: Spark
Issue Type: Impro
zhengruifeng created SPARK-38907:
Summary: Impl DataFrame.corrwith
Key: SPARK-38907
URL: https://issues.apache.org/jira/browse/SPARK-38907
Project: Spark
Issue Type: Sub-task
Compon
zhengruifeng created SPARK-38844:
Summary: impl Series.interpolate and DataFrame.interpolate
Key: SPARK-38844
URL: https://issues.apache.org/jira/browse/SPARK-38844
Project: Spark
Issue Type:
[
https://issues.apache.org/jira/browse/SPARK-38785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516846#comment-17516846
]
zhengruifeng commented on SPARK-38785:
--
h1. Pandas API on Spark:
[EWM|https://pand
zhengruifeng created SPARK-38785:
Summary: impl Series.ewm and DataFrame.ewm
Key: SPARK-38785
URL: https://issues.apache.org/jira/browse/SPARK-38785
Project: Spark
Issue Type: Sub-task
[
https://issues.apache.org/jira/browse/SPARK-38775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng reassigned SPARK-38775:
Assignee: zhengruifeng
> cleanup validation functions
>
>
>
zhengruifeng created SPARK-38775:
Summary: cleanup validation functions
Key: SPARK-38775
URL: https://issues.apache.org/jira/browse/SPARK-38775
Project: Spark
Issue Type: Sub-task
C
zhengruifeng created SPARK-38774:
Summary: impl Series.autocorr
Key: SPARK-38774
URL: https://issues.apache.org/jira/browse/SPARK-38774
Project: Spark
Issue Type: Sub-task
Component
[
https://issues.apache.org/jira/browse/SPARK-37099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng updated SPARK-37099:
-
Affects Version/s: 3.4.0
(was: 3.3.0)
> Introduce a rank-based filter
[
https://issues.apache.org/jira/browse/SPARK-36638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng updated SPARK-36638:
-
Affects Version/s: 3.4.0
(was: 3.3.0)
> Generalize OptimizeSkewedJoin
[
https://issues.apache.org/jira/browse/SPARK-37099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng updated SPARK-37099:
-
Summary: Introduce a rank-based filter to optimize top-k computation (was:
Impl a rank-based fi
[
https://issues.apache.org/jira/browse/SPARK-37099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng updated SPARK-37099:
-
Description:
in JD, we found that more than 90% usage of window function follows this
pattern:
zhengruifeng created SPARK-38669:
Summary: Validate input dataset of ml.clustering
Key: SPARK-38669
URL: https://issues.apache.org/jira/browse/SPARK-38669
Project: Spark
Issue Type: Sub-task
zhengruifeng created SPARK-38643:
Summary: Validate input dataset of ml.regression
Key: SPARK-38643
URL: https://issues.apache.org/jira/browse/SPARK-38643
Project: Spark
Issue Type: Sub-task
[
https://issues.apache.org/jira/browse/SPARK-38588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng updated SPARK-38588:
-
Fix Version/s: 3.4.0
> Validate input dataset of ml.classification
> ---
[
https://issues.apache.org/jira/browse/SPARK-38588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng resolved SPARK-38588.
--
Resolution: Resolved
> Validate input dataset of ml.classification
> -
[
https://issues.apache.org/jira/browse/SPARK-38588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng updated SPARK-38588:
-
Summary: Validate input dataset of ml.classification (was: Validate input
dataset of LinearSVC)
[
https://issues.apache.org/jira/browse/SPARK-38584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng updated SPARK-38584:
-
Description:
1, input vector validation is missing in most algorithms, when the input
dataset c
zhengruifeng created SPARK-38588:
Summary: Validate input dataset of LinearSVC
Key: SPARK-38588
URL: https://issues.apache.org/jira/browse/SPARK-38588
Project: Spark
Issue Type: Sub-task
[
https://issues.apache.org/jira/browse/SPARK-38584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng updated SPARK-38584:
-
Description:
1, input vector validation is missing in most algorithms, when the input
dataset c
[
https://issues.apache.org/jira/browse/SPARK-38584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng reassigned SPARK-38584:
Assignee: zhengruifeng
> Unify the data validation
> -
>
>
[
https://issues.apache.org/jira/browse/SPARK-38584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng updated SPARK-38584:
-
Description:
1, input vector validation is missing in most algorithms, when the input
dataset c
zhengruifeng created SPARK-38584:
Summary: Unify the data validation
Key: SPARK-38584
URL: https://issues.apache.org/jira/browse/SPARK-38584
Project: Spark
Issue Type: Improvement
C
[
https://issues.apache.org/jira/browse/SPARK-38286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng updated SPARK-38286:
-
Summary: Union's maxRows and maxRowsPerPartition may overflow (was: check
Union's maxRows and m
[
https://issues.apache.org/jira/browse/SPARK-38286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng updated SPARK-38286:
-
Description:
{code:java}
scala> val df1 = spark.range(0, Long.MaxValue, 1, 1)
df1: org.apache.sp
zhengruifeng created SPARK-38286:
Summary: check Union's maxRows and maxRowsPerPartition
Key: SPARK-38286
URL: https://issues.apache.org/jira/browse/SPARK-38286
Project: Spark
Issue Type: Bug
[
https://issues.apache.org/jira/browse/SPARK-38271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng updated SPARK-38271:
-
Summary: PoissonSampler may output more rows than MaxRows (was:
PoissonSampler may generate mor
[
https://issues.apache.org/jira/browse/SPARK-38271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng updated SPARK-38271:
-
Affects Version/s: 3.2.1
3.1.2
3.0.3
> PoissonSamp
[
https://issues.apache.org/jira/browse/SPARK-38271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng updated SPARK-38271:
-
Description:
{code:java}
scala> val df = spark.range(0, 1000)
df: org.apache.spark.sql.Dataset[L
zhengruifeng created SPARK-38271:
Summary: PoissonSampler may generate more rows than MaxRows
Key: SPARK-38271
URL: https://issues.apache.org/jira/browse/SPARK-38271
Project: Spark
Issue Type
[
https://issues.apache.org/jira/browse/SPARK-37913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492517#comment-17492517
]
zhengruifeng commented on SPARK-37913:
--
does the `MyTransformer` in the example wor
[
https://issues.apache.org/jira/browse/SPARK-38037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490654#comment-17490654
]
zhengruifeng commented on SPARK-38037:
--
I can reproduce it by:
{code:java}
import o
[
https://issues.apache.org/jira/browse/SPARK-38139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489521#comment-17489521
]
zhengruifeng commented on SPARK-38139:
--
I think it is ok to adjust the tol in this
[
https://issues.apache.org/jira/browse/SPARK-34160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489518#comment-17489518
]
zhengruifeng commented on SPARK-34160:
--
you can get a sparse vector by calling vect
[
https://issues.apache.org/jira/browse/SPARK-34160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng resolved SPARK-34160.
--
Resolution: Not A Problem
> pyspark.ml.stat.Summarizer should allow sparse vector results
> --
[
https://issues.apache.org/jira/browse/SPARK-34452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489517#comment-17489517
]
zhengruifeng commented on SPARK-34452:
--
I can not reproduce this issue in 3.1.2, co
[
https://issues.apache.org/jira/browse/SPARK-37285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489301#comment-17489301
]
zhengruifeng commented on SPARK-37285:
--
were these metrics or algorithms implemente
[
https://issues.apache.org/jira/browse/SPARK-36553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489278#comment-17489278
]
zhengruifeng commented on SPARK-36553:
--
it is an overflow:
{code:java}
scala> val
[
https://issues.apache.org/jira/browse/SPARK-30661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489271#comment-17489271
]
zhengruifeng commented on SPARK-30661:
--
ok, I will skip .mllib calling .ml here. We
[
https://issues.apache.org/jira/browse/SPARK-31007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489270#comment-17489270
]
zhengruifeng edited comment on SPARK-31007 at 2/9/22, 6:05 AM:
---
[
https://issues.apache.org/jira/browse/SPARK-31007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489270#comment-17489270
]
zhengruifeng commented on SPARK-31007:
--
this case is not an OOM, but an overflow:
[
https://issues.apache.org/jira/browse/SPARK-36714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489258#comment-17489258
]
zhengruifeng commented on SPARK-36714:
--
[~sheng_1992] Since you had investigated thi
[
https://issues.apache.org/jira/browse/SPARK-36714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489232#comment-17489232
]
zhengruifeng commented on SPARK-36714:
--
could you please provide a simple script to
[
https://issues.apache.org/jira/browse/SPARK-31007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489230#comment-17489230
]
zhengruifeng commented on SPARK-31007:
--
[~srowen] This optimization needs an array
[
https://issues.apache.org/jira/browse/SPARK-38037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489225#comment-17489225
]
zhengruifeng commented on SPARK-38037:
--
could you please provide a simple script to
[
https://issues.apache.org/jira/browse/SPARK-33882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng reassigned SPARK-33882:
Assignee: Ludovic Henry (was: zhengruifeng)
> Add a vectorized BLAS implementation
> --
[
https://issues.apache.org/jira/browse/SPARK-33882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng reassigned SPARK-33882:
Assignee: zhengruifeng (was: Ludovic Henry)
> Add a vectorized BLAS implementation
> --
[
https://issues.apache.org/jira/browse/SPARK-30661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488611#comment-17488611
]
zhengruifeng commented on SPARK-30661:
--
since the input datasets of kmeans are like
[
https://issues.apache.org/jira/browse/SPARK-30661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479793#comment-17479793
]
zhengruifeng edited comment on SPARK-30661 at 1/21/22, 2:54 AM:
--
[
https://issues.apache.org/jira/browse/SPARK-30661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng updated SPARK-30661:
-
Attachment: blockify_kmeans.png
> KMeans blockify input vectors
> -
[
https://issues.apache.org/jira/browse/SPARK-30661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479793#comment-17479793
]
zhengruifeng commented on SPARK-30661:
--
according to https://issues.apache.org/jira
zhengruifeng created SPARK-37961:
Summary: override maxRows/maxRowsPerPartition for some logical
operators
Key: SPARK-37961
URL: https://issues.apache.org/jira/browse/SPARK-37961
Project: Spark
[
https://issues.apache.org/jira/browse/SPARK-30661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17478437#comment-17478437
]
zhengruifeng commented on SPARK-30661:
--
recently, I spent some time testing bloc
zhengruifeng created SPARK-37959:
Summary: Fix the UT of checking norm in KMeans & BiKMeans
Key: SPARK-37959
URL: https://issues.apache.org/jira/browse/SPARK-37959
Project: Spark
Issue Type:
[
https://issues.apache.org/jira/browse/SPARK-37099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhengruifeng updated SPARK-37099:
-
Attachment: q67.png
q67_optimized.png
> Impl a rank-based filter to optimize top