[
https://issues.apache.org/jira/browse/FLINK-32889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17755780#comment-17755780
]
Fan Hong commented on FLINK-32889:
--
BTW, the area under PRC is also found to be incorrect. PySpark and
Fan Hong created FLINK-32889:
Summary: BinaryClassificationEvaluator gives wrong weighted AUC
value
Key: FLINK-32889
URL: https://issues.apache.org/jira/browse/FLINK-32889
Project: Flink
Issue
[
https://issues.apache.org/jira/browse/FLINK-32810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-32810:
-
Description:
Right now, by default, an instance of `ListStateWithCache` uses up all the
managed memory
Fan Hong created FLINK-32810:
Summary: Improve managed memory usage in ListStateWithCache
Key: FLINK-32810
URL: https://issues.apache.org/jira/browse/FLINK-32810
Project: Flink
Issue Type:
[
https://issues.apache.org/jira/browse/FLINK-31846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17715345#comment-17715345
]
Fan Hong commented on FLINK-31846:
--
[~pnowojski] Okay, I will explore these options further.
>
[
https://issues.apache.org/jira/browse/FLINK-31846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714845#comment-17714845
]
Fan Hong commented on FLINK-31846:
--
Hi, [~pnowojski] . Thank you for explaining. I am actually a novice
[
https://issues.apache.org/jira/browse/FLINK-31846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714369#comment-17714369
]
Fan Hong commented on FLINK-31846:
--
In essence, I am using Flink for processing bounded data streams.
Fan Hong created FLINK-31846:
Summary: Support cancel final checkpoint when all tasks are
finished
Key: FLINK-31846
URL: https://issues.apache.org/jira/browse/FLINK-31846
Project: Flink
Issue
[
https://issues.apache.org/jira/browse/FLINK-31809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-31809:
-
Summary: Improve the efficiency of ListStateWithCache#snapshotState (was:
Improve efficiency of
[
https://issues.apache.org/jira/browse/FLINK-31809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-31809:
-
Description:
In the current implementation of {{{}ListStateWithCache{}}}, the
{{snapshotState}}
Fan Hong created FLINK-31809:
Summary: Improve efficiency of ListStateWithCache#snapshotState
Key: FLINK-31809
URL: https://issues.apache.org/jira/browse/FLINK-31809
Project: Flink
Issue Type:
[
https://issues.apache.org/jira/browse/FLINK-31625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-31625:
-
Description:
In KBinsDiscretizer, the main computation `findBinEdgesWithXXXStrategy` is
accomplished
[
https://issues.apache.org/jira/browse/FLINK-31625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-31625:
-
Description:
In KBinsDiscretizer, the main computation `findBinEdgesWithXXXStrategy` is
accomplished
[
https://issues.apache.org/jira/browse/FLINK-31625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-31625:
-
Summary: Memory and computation inefficiency in KBinsDiscretizer (was:
Possbile OOM in
[
https://issues.apache.org/jira/browse/FLINK-31623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-31623:
-
Description:
The current implementation employs a two-level sampling method.
However, when data instances
[
https://issues.apache.org/jira/browse/FLINK-31625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-31625:
-
Description:
In KBinsDiscretizer, the main computation `findBinEdgesWithXXXStrategy` is
accomplished
Fan Hong created FLINK-31625:
Summary: Possbile OOM in KBinsDiscretizer
Key: FLINK-31625
URL: https://issues.apache.org/jira/browse/FLINK-31625
Project: Flink
Issue Type: Bug
[
https://issues.apache.org/jira/browse/FLINK-31623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-31623:
-
Description:
The current implementation employs a two-level sampling method.
However, when data instances
[
https://issues.apache.org/jira/browse/FLINK-31623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-31623:
-
Description:
The current implementation employs a two-level sampling method.
However, when data instances
[
https://issues.apache.org/jira/browse/FLINK-31623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-31623:
-
Summary: Fix non-uniform sampling to uniform sampling on
DataStreamUtils#sample (was: Improvements on
[
https://issues.apache.org/jira/browse/FLINK-31623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-31623:
-
Summary: Change to uniform sampling in DataStreamUtils#sample method (was:
Fix non-uniform sampling to
[
https://issues.apache.org/jira/browse/FLINK-31623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-31623:
-
Issue Type: Bug (was: Improvement)
> Improvements on DataStreamUtils#sample
>
Fan Hong created FLINK-31623:
Summary: Improvements on DataStreamUtils#sample
Key: FLINK-31623
URL: https://issues.apache.org/jira/browse/FLINK-31623
Project: Flink
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/FLINK-31189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-31189:
-
Description:
Real-world datasets often contain categorical features with millions of
distinct values,
[
https://issues.apache.org/jira/browse/FLINK-31189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-31189:
-
Summary: Allow special handle of less frequent values in StringIndexer
(was: Allow ignore less
[
https://issues.apache.org/jira/browse/FLINK-31189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-31189:
-
Description:
In real-world datasets, categorical features may have millions of distinct
values, while
Fan Hong created FLINK-31189:
Summary: Allow ignore less frequent values in StringIndexer
Key: FLINK-31189
URL: https://issues.apache.org/jira/browse/FLINK-31189
Project: Flink
Issue Type:
Fan Hong created FLINK-31030:
Summary: Support more binary classification evaluation metrics.
Key: FLINK-31030
URL: https://issues.apache.org/jira/browse/FLINK-31030
Project: Flink
Issue Type:
[
https://issues.apache.org/jira/browse/FLINK-31030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-31030:
-
Description: Current `BinaryClassificationEvaluator` only supports
'areaUnderROC', 'areaUnderPR', 'ks'
[
https://issues.apache.org/jira/browse/FLINK-31029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-31029:
-
Description:
When one input column contains only 2 distinct values and their counts are
the same,
[
https://issues.apache.org/jira/browse/FLINK-31029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-31029:
-
Description: When an input column contains only 2 distinct values, and their
counts are the same,
>
[
https://issues.apache.org/jira/browse/FLINK-31029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-31029:
-
Summary: KBinsDiscretizer gives wrong bin edges in 'quantile' strategy when
input data contains only 2
Fan Hong created FLINK-31029:
Summary: KBinsDiscretizer gives wrong bin edges when input data
contains only 2 distinct values
Key: FLINK-31029
URL: https://issues.apache.org/jira/browse/FLINK-31029
[
https://issues.apache.org/jira/browse/FLINK-31026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-31026:
-
Summary: KBinsDiscretizer gives wrong bin edges when all values are same.
(was: KBinsDiscretizer
[
https://issues.apache.org/jira/browse/FLINK-31026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-31026:
-
Description:
The current implementation gives bin edges of {Double.MIN_VALUE, Double.MAX_VALUE}
when all
[
https://issues.apache.org/jira/browse/FLINK-31026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-31026:
-
Description:
The current implementation gives bin edges of {Double.MIN_VALUE, Double.MAX_VALUE}
when all
[
https://issues.apache.org/jira/browse/FLINK-31026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-31026:
-
Summary: KBinsDiscretizer should gives wrong bin edges when all values are
same. (was:
Fan Hong created FLINK-31026:
Summary: KBinsDiscretizer should gives binEdges wrong bin edges
when all values are same.
Key: FLINK-31026
URL: https://issues.apache.org/jira/browse/FLINK-31026
Project:
[
https://issues.apache.org/jira/browse/FLINK-30937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong closed FLINK-30937.
Resolution: Abandoned
> Add Transformer and Estimator for GBTClassifier and GBTRegressor
>
Fan Hong created FLINK-31010:
Summary: Add Transformer and Estimator for GBTClassifier and
GBTRegressor
Key: FLINK-31010
URL: https://issues.apache.org/jira/browse/FLINK-31010
Project: Flink
[
https://issues.apache.org/jira/browse/FLINK-30937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-30937:
-
Description:
Add Transformer and Estimator for GBTClassifier and GBTRegressor.
They are supposed to
[
https://issues.apache.org/jira/browse/FLINK-30957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong closed FLINK-30957.
Resolution: Abandoned
> Support other missing features (see description)
>
[
https://issues.apache.org/jira/browse/FLINK-30937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-30937:
-
Description:
Add
# Support weights.
# Support leaf ID.
# Support feature importance.
# Support
[
https://issues.apache.org/jira/browse/FLINK-30982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong closed FLINK-30982.
Resolution: Abandoned
> Support checkpoint mechanism in GBT
> ---
>
>
[
https://issues.apache.org/jira/browse/FLINK-30955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong closed FLINK-30955.
Resolution: Abandoned
> Support early stopping with validation set.
>
[
https://issues.apache.org/jira/browse/FLINK-30954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong closed FLINK-30954.
Resolution: Abandoned
> Add estimator and transformer for GBTRegressor
>
[
https://issues.apache.org/jira/browse/FLINK-30956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong closed FLINK-30956.
Resolution: Abandoned
> Add Python implementation and documents of GBTClassifier and GBTRegressor.
>
[
https://issues.apache.org/jira/browse/FLINK-30953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong closed FLINK-30953.
Resolution: Abandoned
> Add estimator and transformer for GBTClassifier
>
[
https://issues.apache.org/jira/browse/FLINK-30952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong closed FLINK-30952.
Resolution: Abandoned
> Add main training and transforming part
> ---
[
https://issues.apache.org/jira/browse/FLINK-30939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong closed FLINK-30939.
Resolution: Abandoned
> Add preprocessor for GBT algorithms.
>
>
>
Fan Hong created FLINK-30982:
Summary: Support checkpoint mechanism in GBT
Key: FLINK-30982
URL: https://issues.apache.org/jira/browse/FLINK-30982
Project: Flink
Issue Type: Sub-task
[
https://issues.apache.org/jira/browse/FLINK-30953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-30953:
-
Summary: Add estimator and transformer for GBTClassifier (was: Support
checkpoint machanism and model
[
https://issues.apache.org/jira/browse/FLINK-30939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-30939:
-
Description: Add preprocessor for GBT algorithms to transform data to the
format booster can handle.
[
https://issues.apache.org/jira/browse/FLINK-30957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-30957:
-
Summary: Support other missing features (see description) (was: Support
other missing features)
>
Fan Hong created FLINK-30957:
Summary: Support other missing features
Key: FLINK-30957
URL: https://issues.apache.org/jira/browse/FLINK-30957
Project: Flink
Issue Type: Sub-task
[
https://issues.apache.org/jira/browse/FLINK-30956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-30956:
-
Summary: Add Python implementation and documents of GBTClassifier and
GBTRegressor. (was: Add Python
Fan Hong created FLINK-30956:
Summary: Add Python implementation of GBTClassifer and
GBTRegressor.
Key: FLINK-30956
URL: https://issues.apache.org/jira/browse/FLINK-30956
Project: Flink
Issue
Fan Hong created FLINK-30955:
Summary: Support early stopping with validation set.
Key: FLINK-30955
URL: https://issues.apache.org/jira/browse/FLINK-30955
Project: Flink
Issue Type: Sub-task
[
https://issues.apache.org/jira/browse/FLINK-30952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-30952:
-
Summary: Add main training and transforming part (was: Add main training
and transforming part.)
>
[
https://issues.apache.org/jira/browse/FLINK-30954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-30954:
-
Summary: Add estimator and transformer for GBTRegressor (was: Add
estimator and transformer for
[
https://issues.apache.org/jira/browse/FLINK-30953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-30953:
-
Summary: Support checkpoint machanism and model save/load (was: Support
intermediate state management
[
https://issues.apache.org/jira/browse/FLINK-30954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-30954:
-
Component/s: Library / Machine Learning
> Add estimator and transformer for GBTRegressor.
>
[
https://issues.apache.org/jira/browse/FLINK-30953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-30953:
-
Component/s: Library / Machine Learning
> Support intermediate state management and model save/load.
>
[
https://issues.apache.org/jira/browse/FLINK-30952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-30952:
-
Component/s: Library / Machine Learning
> Add main training and transforming part.
>
Fan Hong created FLINK-30954:
Summary: Add estimator and transformer for GBTRegressor.
Key: FLINK-30954
URL: https://issues.apache.org/jira/browse/FLINK-30954
Project: Flink
Issue Type: Sub-task
Fan Hong created FLINK-30953:
Summary: Support intermediate state management and model save/load.
Key: FLINK-30953
URL: https://issues.apache.org/jira/browse/FLINK-30953
Project: Flink
Issue
Fan Hong created FLINK-30952:
Summary: Add main training and transforming part.
Key: FLINK-30952
URL: https://issues.apache.org/jira/browse/FLINK-30952
Project: Flink
Issue Type: Sub-task
[
https://issues.apache.org/jira/browse/FLINK-30939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-30939:
-
Summary: Add public APIs for GBTClassifer (was: Add public APIs and
topmost framework for
Fan Hong created FLINK-30939:
Summary: Add public APIs and topmost framework for GBTClassifer
Key: FLINK-30939
URL: https://issues.apache.org/jira/browse/FLINK-30939
Project: Flink
Issue Type:
Fan Hong created FLINK-30937:
Summary: Add Transformer and Estimator for GBTClassifier and
GBTRegressor
Key: FLINK-30937
URL: https://issues.apache.org/jira/browse/FLINK-30937
Project: Flink
[
https://issues.apache.org/jira/browse/FLINK-30734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685034#comment-17685034
]
Fan Hong commented on FLINK-30734:
--
Sklearn has a discussion about this feature: [1]
SparkML already
[
https://issues.apache.org/jira/browse/FLINK-30734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-30734:
-
Affects Version/s: ml-2.1.0
> KBinsDiscretizer handles Double.NaN incorrectly
>
Fan Hong created FLINK-30734:
Summary: KBinsDiscretizer handles Double.NaN incorrectly
Key: FLINK-30734
URL: https://issues.apache.org/jira/browse/FLINK-30734
Project: Flink
Issue Type: Bug
[
https://issues.apache.org/jira/browse/FLINK-30730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-30730:
-
Description:
When training data contains null values, StringIndexer throws an exception. The
reason is
[
https://issues.apache.org/jira/browse/FLINK-30730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-30730:
-
Summary: StringIndexer cannot handle null values correctly (was:
StringIndexer cannot handle null
[
https://issues.apache.org/jira/browse/FLINK-30730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fan Hong updated FLINK-30730:
-
Description:
When training data contains null values, StringIndexer throws an exception. The
reason is
Fan Hong created FLINK-30730:
Summary: StringIndexer cannot handle null values correctly when
training
Key: FLINK-30730
URL: https://issues.apache.org/jira/browse/FLINK-30730
Project: Flink
Fan Hong created FLINK-30401:
Summary: Add Estimator and Transformer for MinHashLSH
Key: FLINK-30401
URL: https://issues.apache.org/jira/browse/FLINK-30401
Project: Flink
Issue Type: New Feature
[
https://issues.apache.org/jira/browse/FLINK-16485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054710#comment-17054710
]
Fan Hong edited comment on FLINK-16485 at 3/9/20, 7:22 AM:
---
Hi, as a developer
[
https://issues.apache.org/jira/browse/FLINK-16485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054710#comment-17054710
]
Fan Hong commented on FLINK-16485:
--
Hi, as a developer who is using Flink for machine learning, I think