[jira] [Commented] (FLINK-32889) BinaryClassificationEvaluator gives wrong weighted AUC value

2023-08-17 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-32889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17755780#comment-17755780 ] Fan Hong commented on FLINK-32889: -- BTW, the area under PRC is also found incorrect. PySpark and

[jira] [Created] (FLINK-32889) BinaryClassificationEvaluator gives wrong weighted AUC value

2023-08-17 Thread Fan Hong (Jira)
Fan Hong created FLINK-32889: Summary: BinaryClassificationEvaluator gives wrong weighted AUC value Key: FLINK-32889 URL: https://issues.apache.org/jira/browse/FLINK-32889 Project: Flink Issue

[jira] [Updated] (FLINK-32810) Improve managed memory usage in ListStateWithCache

2023-08-08 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-32810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-32810: - Description: Right now, by default, an instance of `ListStateWithCache` uses up all the managed memory

[jira] [Created] (FLINK-32810) Improve managed memory usage in ListStateWithCache

2023-08-08 Thread Fan Hong (Jira)
Fan Hong created FLINK-32810: Summary: Improve managed memory usage in ListStateWithCache Key: FLINK-32810 URL: https://issues.apache.org/jira/browse/FLINK-32810 Project: Flink Issue Type:

[jira] [Commented] (FLINK-31846) Support cancel final checkpoint when all tasks are finished

2023-04-22 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-31846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17715345#comment-17715345 ] Fan Hong commented on FLINK-31846: -- [~pnowojski]  Okay, I will explore these options further. >

[jira] [Commented] (FLINK-31846) Support cancel final checkpoint when all tasks are finished

2023-04-21 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-31846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714845#comment-17714845 ] Fan Hong commented on FLINK-31846: -- Hi, [~pnowojski] . Thank you for explaining. I am actually a novice

[jira] [Commented] (FLINK-31846) Support cancel final checkpoint when all tasks are finished

2023-04-19 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-31846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714369#comment-17714369 ] Fan Hong commented on FLINK-31846: -- In essence, I am using Flink for processing bounded data streams.

[jira] [Created] (FLINK-31846) Support cancel final checkpoint when all tasks are finished

2023-04-18 Thread Fan Hong (Jira)
Fan Hong created FLINK-31846: Summary: Support cancel final checkpoint when all tasks are finished Key: FLINK-31846 URL: https://issues.apache.org/jira/browse/FLINK-31846 Project: Flink Issue

[jira] [Updated] (FLINK-31809) Improve the efficiency of ListStateWithCache#snapshotState

2023-04-14 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-31809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-31809: - Summary: Improve the efficiency of ListStateWithCache#snapshotState (was: Improve efficiency of

[jira] [Updated] (FLINK-31809) Improve efficiency of ListStateWithCache#snapshotState

2023-04-14 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-31809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-31809: - Description: In the current implementation of `ListStateWithCache`, the `snapshotState`

[jira] [Created] (FLINK-31809) Improve efficiency of ListStateWithCache#snapshotState

2023-04-14 Thread Fan Hong (Jira)
Fan Hong created FLINK-31809: Summary: Improve efficiency of ListStateWithCache#snapshotState Key: FLINK-31809 URL: https://issues.apache.org/jira/browse/FLINK-31809 Project: Flink Issue Type:

[jira] [Updated] (FLINK-31625) Memory and computation inefficiency in KBinsDiscretizer

2023-03-27 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-31625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-31625: - Description: In KBinsDiscretizer, the main computation `findBinEdgesWithXXXStrategy` is accomplished

[jira] [Updated] (FLINK-31625) Memory and computation inefficiency in KBinsDiscretizer

2023-03-27 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-31625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-31625: - Description: In KBinsDiscretizer, the main computation `findBinEdgesWithXXXStrategy` is accomplished

[jira] [Updated] (FLINK-31625) Memory and computation inefficiency in KBinsDiscretizer

2023-03-27 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-31625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-31625: - Summary: Memory and computation inefficiency in KBinsDiscretizer (was: Possbile OOM in

[jira] [Updated] (FLINK-31623) Change to uniform sampling in DataStreamUtils#sample method

2023-03-27 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-31623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-31623: - Description: The current implementation employs a two-level sampling method. However, when data instances

[jira] [Updated] (FLINK-31625) Possbile OOM in KBinsDiscretizer

2023-03-27 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-31625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-31625: - Description: In KBinsDiscretizer, the main computation `findBinEdgesWithXXXStrategy` is accomplished

[jira] [Created] (FLINK-31625) Possbile OOM in KBinsDiscretizer

2023-03-27 Thread Fan Hong (Jira)
Fan Hong created FLINK-31625: Summary: Possbile OOM in KBinsDiscretizer Key: FLINK-31625 URL: https://issues.apache.org/jira/browse/FLINK-31625 Project: Flink Issue Type: Bug

[jira] [Updated] (FLINK-31623) Change to uniform sampling in DataStreamUtils#sample method

2023-03-27 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-31623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-31623: - Description: The current implementation employs a two-level sampling method. However, when data instances

[jira] [Updated] (FLINK-31623) Change to uniform sampling in DataStreamUtils#sample method

2023-03-27 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-31623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-31623: - Description: The current implementation employs a two-level sampling method. However, when data instances

[jira] [Updated] (FLINK-31623) Fix non-uniform sampling to uniform sampling on DataStreamUtils#sample

2023-03-27 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-31623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-31623: - Summary: Fix non-uniform sampling to uniform sampling on DataStreamUtils#sample (was: Improvements on

[jira] [Updated] (FLINK-31623) Change to uniform sampling in DataStreamUtils#sample method

2023-03-27 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-31623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-31623: - Summary: Change to uniform sampling in DataStreamUtils#sample method (was: Fix non-uniform sampling to

[jira] [Updated] (FLINK-31623) Improvements on DataStreamUtils#sample

2023-03-27 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-31623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-31623: - Issue Type: Bug (was: Improvement) > Improvements on DataStreamUtils#sample >

[jira] [Created] (FLINK-31623) Improvements on DataStreamUtils#sample

2023-03-27 Thread Fan Hong (Jira)
Fan Hong created FLINK-31623: Summary: Improvements on DataStreamUtils#sample Key: FLINK-31623 URL: https://issues.apache.org/jira/browse/FLINK-31623 Project: Flink Issue Type: Improvement

[jira] [Updated] (FLINK-31189) Allow special handle of less frequent values in StringIndexer

2023-02-22 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-31189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-31189: - Description: Real-world datasets often contain categorical features with millions of distinct values,

[jira] [Updated] (FLINK-31189) Allow special handle of less frequent values in StringIndexer

2023-02-22 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-31189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-31189: - Summary: Allow special handle of less frequent values in StringIndexer (was: Allow ignore less

[jira] [Updated] (FLINK-31189) Allow ignore less frequent values in StringIndexer

2023-02-22 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-31189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-31189: - Description: In real-world datasets, categorical features may have millions of distinct values, while

[jira] [Created] (FLINK-31189) Allow ignore less frequent values in StringIndexer

2023-02-22 Thread Fan Hong (Jira)
Fan Hong created FLINK-31189: Summary: Allow ignore less frequent values in StringIndexer Key: FLINK-31189 URL: https://issues.apache.org/jira/browse/FLINK-31189 Project: Flink Issue Type:

[jira] [Created] (FLINK-31030) Support more binary classification evaluation metrics.

2023-02-12 Thread Fan Hong (Jira)
Fan Hong created FLINK-31030: Summary: Support more binary classification evaluation metrics. Key: FLINK-31030 URL: https://issues.apache.org/jira/browse/FLINK-31030 Project: Flink Issue Type:

[jira] [Updated] (FLINK-31030) Support more binary classification evaluation metrics.

2023-02-12 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-31030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-31030: - Description: Current `BinaryClassificationEvaluator` only supports 'areaUnderROC', 'areaUnderPR', 'ks'

[jira] [Updated] (FLINK-31029) KBinsDiscretizer gives wrong bin edges in 'quantile' strategy when input data contains only 2 distinct values

2023-02-12 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-31029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-31029: - Description: When one input column contains only 2 distinct values and their counts are the same,

[jira] [Updated] (FLINK-31029) KBinsDiscretizer gives wrong bin edges in 'quantile' strategy when input data contains only 2 distinct values

2023-02-12 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-31029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-31029: - Description: When an input column contains only 2 distinct values, and their counts are the same,

[jira] [Updated] (FLINK-31029) KBinsDiscretizer gives wrong bin edges in 'quantile' strategy when input data contains only 2 distinct values

2023-02-12 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-31029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-31029: - Summary: KBinsDiscretizer gives wrong bin edges in 'quantile' strategy when input data contains only 2

[jira] [Created] (FLINK-31029) KBinsDiscretizer gives wrong bin edges when input data contains only 2 distinct values

2023-02-12 Thread Fan Hong (Jira)
Fan Hong created FLINK-31029: Summary: KBinsDiscretizer gives wrong bin edges when input data contains only 2 distinct values Key: FLINK-31029 URL: https://issues.apache.org/jira/browse/FLINK-31029

[jira] [Updated] (FLINK-31026) KBinsDiscretizer gives wrong bin edges when all values are same.

2023-02-12 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-31026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-31026: - Summary: KBinsDiscretizer gives wrong bin edges when all values are same. (was: KBinsDiscretizer

[jira] [Updated] (FLINK-31026) KBinsDiscretizer should gives wrong bin edges when all values are same.

2023-02-12 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-31026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-31026: - Description: The current implementation gives bin edges of {Double.MIN_VALUE, Double.MAX_VALUE} when all

[jira] [Updated] (FLINK-31026) KBinsDiscretizer should gives wrong bin edges when all values are same.

2023-02-12 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-31026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-31026: - Description: The current implementation gives bin edges of {Double.MIN_VALUE, Double.MAX_VALUE} when all

[jira] [Updated] (FLINK-31026) KBinsDiscretizer should gives wrong bin edges when all values are same.

2023-02-12 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-31026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-31026: - Summary: KBinsDiscretizer should gives wrong bin edges when all values are same. (was:

[jira] [Created] (FLINK-31026) KBinsDiscretizer should gives binEdges wrong bin edges when all values are same.

2023-02-12 Thread Fan Hong (Jira)
Fan Hong created FLINK-31026: Summary: KBinsDiscretizer should gives binEdges wrong bin edges when all values are same. Key: FLINK-31026 URL: https://issues.apache.org/jira/browse/FLINK-31026 Project:

[jira] [Closed] (FLINK-30937) Add Transformer and Estimator for GBTClassifier and GBTRegressor

2023-02-10 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-30937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong closed FLINK-30937. Resolution: Abandoned > Add Transformer and Estimator for GBTClassifier and GBTRegressor >

[jira] [Created] (FLINK-31010) Add Transformer and Estimator for GBTClassifier and GBTRegressor

2023-02-10 Thread Fan Hong (Jira)
Fan Hong created FLINK-31010: Summary: Add Transformer and Estimator for GBTClassifier and GBTRegressor Key: FLINK-31010 URL: https://issues.apache.org/jira/browse/FLINK-31010 Project: Flink

[jira] [Updated] (FLINK-30937) Add Transformer and Estimator for GBTClassifier and GBTRegressor

2023-02-10 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-30937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-30937: - Description: Add Transformer and Estimator for GBTClassifier and GBTRegressor. They are supposed to

[jira] [Closed] (FLINK-30957) Support other missing features (see description)

2023-02-10 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-30957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong closed FLINK-30957. Resolution: Abandoned > Support other missing features (see description) >

[jira] [Updated] (FLINK-30937) Add Transformer and Estimator for GBTClassifier and GBTRegressor

2023-02-10 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-30937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-30937: - Description: Add    # Support weights. # Support leaf ID. # Support feature importance. # Support

[jira] [Closed] (FLINK-30982) Support checkpoint mechanism in GBT

2023-02-10 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-30982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong closed FLINK-30982. Resolution: Abandoned > Support checkpoint mechanism in GBT > --- > >

[jira] [Closed] (FLINK-30955) Support early stopping with validation set.

2023-02-10 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-30955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong closed FLINK-30955. Resolution: Abandoned > Support early stopping with validation set. >

[jira] [Closed] (FLINK-30954) Add estimator and transformer for GBTRegressor

2023-02-10 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-30954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong closed FLINK-30954. Resolution: Abandoned > Add estimator and transformer for GBTRegressor >

[jira] [Closed] (FLINK-30956) Add Python implementation and documents of GBTClassifier and GBTRegressor.

2023-02-10 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-30956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong closed FLINK-30956. Resolution: Abandoned > Add Python implementation and documents of GBTClassifier and GBTRegressor. >

[jira] [Closed] (FLINK-30953) Add estimator and transformer for GBTClassifier

2023-02-10 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-30953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong closed FLINK-30953. Resolution: Abandoned > Add estimator and transformer for GBTClassifier >

[jira] [Closed] (FLINK-30952) Add main training and transforming part

2023-02-10 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-30952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong closed FLINK-30952. Resolution: Abandoned > Add main training and transforming part > ---

[jira] [Closed] (FLINK-30939) Add preprocessor for GBT algorithms.

2023-02-10 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-30939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong closed FLINK-30939. Resolution: Abandoned > Add preprocessor for GBT algorithms. > > >

[jira] [Created] (FLINK-30982) Support checkpoint mechanism in GBT

2023-02-08 Thread Fan Hong (Jira)
Fan Hong created FLINK-30982: Summary: Support checkpoint mechanism in GBT Key: FLINK-30982 URL: https://issues.apache.org/jira/browse/FLINK-30982 Project: Flink Issue Type: Sub-task

[jira] [Updated] (FLINK-30953) Add estimator and transformer for GBTClassifier

2023-02-08 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-30953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-30953: - Summary: Add estimator and transformer for GBTClassifier (was: Support checkpoint machanism and model

[jira] [Updated] (FLINK-30939) Add preprocessor for GBT algorithms.

2023-02-08 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-30939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-30939: - Description: Add preprocessor for GBT algorithms to transform data to the format booster can handle. 

[jira] [Updated] (FLINK-30957) Support other missing features (see description)

2023-02-07 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-30957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-30957: - Summary: Support other missing features (see description) (was: Support other missing features) >

[jira] [Created] (FLINK-30957) Support other missing features

2023-02-07 Thread Fan Hong (Jira)
Fan Hong created FLINK-30957: Summary: Support other missing features Key: FLINK-30957 URL: https://issues.apache.org/jira/browse/FLINK-30957 Project: Flink Issue Type: Sub-task

[jira] [Updated] (FLINK-30956) Add Python implementation and documents of GBTClassifier and GBTRegressor.

2023-02-07 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-30956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-30956: - Summary: Add Python implementation and documents of GBTClassifier and GBTRegressor. (was: Add Python

[jira] [Created] (FLINK-30956) Add Python implementation of GBTClassifer and GBTRegressor.

2023-02-07 Thread Fan Hong (Jira)
Fan Hong created FLINK-30956: Summary: Add Python implementation of GBTClassifer and GBTRegressor. Key: FLINK-30956 URL: https://issues.apache.org/jira/browse/FLINK-30956 Project: Flink Issue

[jira] [Created] (FLINK-30955) Support early stopping with validation set.

2023-02-07 Thread Fan Hong (Jira)
Fan Hong created FLINK-30955: Summary: Support early stopping with validation set. Key: FLINK-30955 URL: https://issues.apache.org/jira/browse/FLINK-30955 Project: Flink Issue Type: Sub-task

[jira] [Updated] (FLINK-30952) Add main training and transforming part

2023-02-07 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-30952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-30952: - Summary: Add main training and transforming part (was: Add main training and transforming part.) >

[jira] [Updated] (FLINK-30954) Add estimator and transformer for GBTRegressor

2023-02-07 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-30954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-30954: - Summary: Add estimator and transformer for GBTRegressor (was: Add estimator and transformer for

[jira] [Updated] (FLINK-30953) Support checkpoint machanism and model save/load

2023-02-07 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-30953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-30953: - Summary: Support checkpoint machanism and model save/load (was: Support intermediate state management

[jira] [Updated] (FLINK-30954) Add estimator and transformer for GBTRegressor.

2023-02-07 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-30954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-30954: - Component/s: Library / Machine Learning > Add estimator and transformer for GBTRegressor. >

[jira] [Updated] (FLINK-30953) Support intermediate state management and model save/load.

2023-02-07 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-30953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-30953: - Component/s: Library / Machine Learning > Support intermediate state management and model save/load. >

[jira] [Updated] (FLINK-30952) Add main training and transforming part.

2023-02-07 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-30952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-30952: - Component/s: Library / Machine Learning > Add main training and transforming part. >

[jira] [Created] (FLINK-30954) Add estimator and transformer for GBTRegressor.

2023-02-07 Thread Fan Hong (Jira)
Fan Hong created FLINK-30954: Summary: Add estimator and transformer for GBTRegressor. Key: FLINK-30954 URL: https://issues.apache.org/jira/browse/FLINK-30954 Project: Flink Issue Type: Sub-task

[jira] [Created] (FLINK-30953) Support intermediate state management and model save/load.

2023-02-07 Thread Fan Hong (Jira)
Fan Hong created FLINK-30953: Summary: Support intermediate state management and model save/load. Key: FLINK-30953 URL: https://issues.apache.org/jira/browse/FLINK-30953 Project: Flink Issue

[jira] [Created] (FLINK-30952) Add main training and transforming part.

2023-02-07 Thread Fan Hong (Jira)
Fan Hong created FLINK-30952: Summary: Add main training and transforming part. Key: FLINK-30952 URL: https://issues.apache.org/jira/browse/FLINK-30952 Project: Flink Issue Type: Sub-task

[jira] [Updated] (FLINK-30939) Add public APIs for GBTClassifer

2023-02-07 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-30939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-30939: - Summary: Add public APIs for GBTClassifer (was: Add public APIs and topmost framework for

[jira] [Created] (FLINK-30939) Add public APIs and topmost framework for GBTClassifer

2023-02-07 Thread Fan Hong (Jira)
Fan Hong created FLINK-30939: Summary: Add public APIs and topmost framework for GBTClassifer Key: FLINK-30939 URL: https://issues.apache.org/jira/browse/FLINK-30939 Project: Flink Issue Type:

[jira] [Created] (FLINK-30937) Add Transformer and Estimator for GBTClassifier and GBTRegressor

2023-02-07 Thread Fan Hong (Jira)
Fan Hong created FLINK-30937: Summary: Add Transformer and Estimator for GBTClassifier and GBTRegressor Key: FLINK-30937 URL: https://issues.apache.org/jira/browse/FLINK-30937 Project: Flink

[jira] [Commented] (FLINK-30734) KBinsDiscretizer handles Double.NaN incorrectly

2023-02-06 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-30734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685034#comment-17685034 ] Fan Hong commented on FLINK-30734: -- Sklearn has a discussion about this feature: [1]  SparkML already

[jira] [Updated] (FLINK-30734) KBinsDiscretizer handles Double.NaN incorrectly

2023-01-18 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-30734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-30734: - Affects Version/s: ml-2.1.0 > KBinsDiscretizer handles Double.NaN incorrectly >

[jira] [Created] (FLINK-30734) KBinsDiscretizer handles Double.NaN incorrectly

2023-01-18 Thread Fan Hong (Jira)
Fan Hong created FLINK-30734: Summary: KBinsDiscretizer handles Double.NaN incorrectly Key: FLINK-30734 URL: https://issues.apache.org/jira/browse/FLINK-30734 Project: Flink Issue Type: Bug

[jira] [Updated] (FLINK-30730) StringIndexer cannot handle null values correctly

2023-01-18 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-30730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-30730: - Description: When training data contains null values, StringIndexer throws an exception. The reason is

[jira] [Updated] (FLINK-30730) StringIndexer cannot handle null values correctly

2023-01-18 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-30730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-30730: - Summary: StringIndexer cannot handle null values correctly (was: StringIndexer cannot handle null

[jira] [Updated] (FLINK-30730) StringIndexer cannot handle null values correctly when training

2023-01-18 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-30730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Hong updated FLINK-30730: - Description: When training data contains null values, StringIndexer throws an exception. The reason is

[jira] [Created] (FLINK-30730) StringIndexer cannot handle null values correctly when training

2023-01-17 Thread Fan Hong (Jira)
Fan Hong created FLINK-30730: Summary: StringIndexer cannot handle null values correctly when training Key: FLINK-30730 URL: https://issues.apache.org/jira/browse/FLINK-30730 Project: Flink

[jira] [Created] (FLINK-30401) Add Estimator and Transformer for MinHashLSH

2022-12-13 Thread Fan Hong (Jira)
Fan Hong created FLINK-30401: Summary: Add Estimator and Transformer for MinHashLSH Key: FLINK-30401 URL: https://issues.apache.org/jira/browse/FLINK-30401 Project: Flink Issue Type: New Feature

[jira] [Comment Edited] (FLINK-16485) Support vectorized Python UDF in the batch mode of old planner

2020-03-09 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-16485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054710#comment-17054710 ] Fan Hong edited comment on FLINK-16485 at 3/9/20, 7:22 AM: --- Hi, as a developer

[jira] [Commented] (FLINK-16485) Support vectorized Python UDF in the batch mode of old planner

2020-03-09 Thread Fan Hong (Jira)
[ https://issues.apache.org/jira/browse/FLINK-16485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054710#comment-17054710 ] Fan Hong commented on FLINK-16485: -- Hi, as a developer who is using Flink for machine learning, I think