[jira] [Created] (SPARK-14971) PySpark ML Params setter code clean up

2016-04-27 Thread Yanbo Liang (JIRA)
Yanbo Liang created SPARK-14971: --- Summary: PySpark ML Params setter code clean up Key: SPARK-14971 URL: https://issues.apache.org/jira/browse/SPARK-14971 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-14906) Move VectorUDT and MatrixUDT in PySpark to new ML package

2016-04-27 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260216#comment-15260216 ] Yanbo Liang commented on SPARK-14906: - I will give it a try if no one is working on it. Thanks. > Move

[jira] [Comment Edited] (SPARK-14831) Make ML APIs in SparkR consistent

2016-04-22 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15254032#comment-15254032 ] Yanbo Liang edited comment on SPARK-14831 at 4/22/16 2:37 PM: -- This change

[jira] [Commented] (SPARK-14831) Make ML APIs in SparkR consistent

2016-04-22 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15254032#comment-15254032 ] Yanbo Liang commented on SPARK-14831: - This change looks good to me. Thanks! BTW, I think we should

[jira] [Commented] (SPARK-14847) ML/MLlib breaking changes between 1.6 & 2.0

2016-04-22 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253774#comment-15253774 ] Yanbo Liang commented on SPARK-14847: - [~sowen] Sorry, I did not find SPARK-13448. I will close

[jira] [Updated] (SPARK-14847) ML/MLlib breaking changes between 1.6 & 2.0

2016-04-22 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-14847: Description: This PR records the breaking changes of ML/MLlib between 1.6 and 2.0, so we can note

[jira] [Created] (SPARK-14847) ML/MLlib breaking changes between 1.6 & 2.0

2016-04-22 Thread Yanbo Liang (JIRA)
Yanbo Liang created SPARK-14847: --- Summary: ML/MLlib breaking changes between 1.6 & 2.0 Key: SPARK-14847 URL: https://issues.apache.org/jira/browse/SPARK-14847 Project: Spark Issue Type:

[jira] [Commented] (SPARK-11559) Make `runs` no effect in k-means

2016-04-22 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253755#comment-15253755 ] Yanbo Liang commented on SPARK-11559: - Sure, sent https://github.com/apache/spark/pull/12608 to

[jira] [Commented] (SPARK-14730) Expose ColumnPruner as feature transformer

2016-04-20 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251115#comment-15251115 ] Yanbo Liang commented on SPARK-14730: - [~BenFradet] I'm not working on it. Please feel free to take

[jira] [Commented] (SPARK-14659) OneHotEncoder support drop first category alphabetically in the encoded vector

2016-04-15 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242762#comment-15242762 ] Yanbo Liang commented on SPARK-14659: - [~mengxr] [~josephkb] > OneHotEncoder support drop first

[jira] [Updated] (SPARK-14657) RFormula output wrong features when formula w/o intercept

2016-04-15 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-14657: Description: SparkR::glm outputs different features compared with R glm when fit w/o intercept and

[jira] [Updated] (SPARK-14657) RFormula output wrong features when formula w/o intercept

2016-04-15 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-14657: Description: SparkR::glm outputs different features compared with R glm when fit w/o intercept and

[jira] [Updated] (SPARK-14657) RFormula output wrong features when formula w/o intercept

2016-04-15 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-14657: Description: SparkR::glm outputs different features compared with R glm when fit w/o intercept.

[jira] [Updated] (SPARK-14657) RFormula output wrong features when formula w/o intercept

2016-04-15 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-14657: Description: SparkR::glm outputs different features compared with R glm. Take the following

[jira] [Updated] (SPARK-14657) RFormula output wrong features when formula w/o intercept

2016-04-15 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-14657: Description: SparkR::glm outputs different features compared with R glm. SparkR::glm {quote}

[jira] [Updated] (SPARK-14657) RFormula output wrong features when formula w/o intercept

2016-04-15 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-14657: Description: SparkR::glm outputs different features compared with R glm. SparkR::glm {quote}

[jira] [Comment Edited] (SPARK-14659) OneHotEncoder support drop first category alphabetically in the encoded vector

2016-04-15 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242739#comment-15242739 ] Yanbo Liang edited comment on SPARK-14659 at 4/15/16 10:08 AM: --- Take the

[jira] [Commented] (SPARK-14659) OneHotEncoder support drop first category alphabetically in the encoded vector

2016-04-15 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242739#comment-15242739 ] Yanbo Liang commented on SPARK-14659: - For example: {quote} df=data.frame(id = c(1, 2, 3, 4), a =

[jira] [Created] (SPARK-14659) OneHotEncoder support drop first category alphabetically in the encoded vector

2016-04-15 Thread Yanbo Liang (JIRA)
Yanbo Liang created SPARK-14659: --- Summary: OneHotEncoder support drop first category alphabetically in the encoded vector Key: SPARK-14659 URL: https://issues.apache.org/jira/browse/SPARK-14659

[jira] [Commented] (SPARK-14657) RFormula output wrong features when formula w/o intercept

2016-04-15 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242709#comment-15242709 ] Yanbo Liang commented on SPARK-14657: - From the above cases, we can learn that only the first

[jira] [Comment Edited] (SPARK-14657) RFormula output wrong features when formula w/o intercept

2016-04-15 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242701#comment-15242701 ] Yanbo Liang edited comment on SPARK-14657 at 4/15/16 9:44 AM: -- More cases in

[jira] [Comment Edited] (SPARK-14657) RFormula output wrong features when formula w/o intercept

2016-04-15 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242701#comment-15242701 ] Yanbo Liang edited comment on SPARK-14657 at 4/15/16 9:44 AM: -- More cases in

[jira] [Comment Edited] (SPARK-14657) RFormula output wrong features when formula w/o intercept

2016-04-15 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242701#comment-15242701 ] Yanbo Liang edited comment on SPARK-14657 at 4/15/16 9:41 AM: -- More cases in

[jira] [Comment Edited] (SPARK-14657) RFormula output wrong features when formula w/o intercept

2016-04-15 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242701#comment-15242701 ] Yanbo Liang edited comment on SPARK-14657 at 4/15/16 9:41 AM: -- More cases in

[jira] [Comment Edited] (SPARK-14657) RFormula output wrong features when formula w/o intercept

2016-04-15 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242701#comment-15242701 ] Yanbo Liang edited comment on SPARK-14657 at 4/15/16 9:39 AM: -- More cases in

[jira] [Comment Edited] (SPARK-14657) RFormula output wrong features when formula w/o intercept

2016-04-15 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242701#comment-15242701 ] Yanbo Liang edited comment on SPARK-14657 at 4/15/16 9:40 AM: -- More cases in

[jira] [Commented] (SPARK-14657) RFormula output wrong features when formula w/o intercept

2016-04-15 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242701#comment-15242701 ] Yanbo Liang commented on SPARK-14657: - More cases in R: {quote} df=data.frame(income=c(5,5,3,3,6,5),

[jira] [Updated] (SPARK-14657) RFormula output wrong features when formula w/o intercept

2016-04-15 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-14657: Description: SparkR::glm outputs different features compared with R glm. SparkR::glm {quote}

[jira] [Updated] (SPARK-14657) RFormula output wrong features when formula w/o intercept

2016-04-15 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-14657: Description: SparkR::glm outputs different features compared with R glm. SparkR::glm {quote}

[jira] [Created] (SPARK-14657) RFormula output wrong features when formula w/o intercept

2016-04-15 Thread Yanbo Liang (JIRA)
Yanbo Liang created SPARK-14657: --- Summary: RFormula output wrong features when formula w/o intercept Key: SPARK-14657 URL: https://issues.apache.org/jira/browse/SPARK-14657 Project: Spark

[jira] [Commented] (SPARK-10574) HashingTF should use MurmurHash3

2016-04-13 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240601#comment-15240601 ] Yanbo Liang commented on SPARK-10574: - Sure, I will send a PR in a few days. Thanks! > HashingTF

[jira] [Commented] (SPARK-13925) Expose R-like summary statistics in SparkR::glm for more family and link functions

2016-04-13 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239345#comment-15239345 ] Yanbo Liang commented on SPARK-13925: - I'm working on it. > Expose R-like summary statistics in

[jira] [Commented] (SPARK-14311) Model persistence in SparkR

2016-04-12 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238500#comment-15238500 ] Yanbo Liang commented on SPARK-14311: - Sure, I can have a try. Another issue is R `Object` has

[jira] [Commented] (SPARK-13590) Document the behavior of spark.ml logistic regression when there are constant features

2016-04-12 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237022#comment-15237022 ] Yanbo Liang commented on SPARK-13590: - If there are constant columns and fitIntercept is false, Spark

[jira] [Comment Edited] (SPARK-14479) GLM supports output link prediction

2016-04-10 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15234124#comment-15234124 ] Yanbo Liang edited comment on SPARK-14479 at 4/10/16 1:59 PM: -- Had offline

[jira] [Updated] (SPARK-14479) GLM supports output link prediction

2016-04-10 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-14479: Summary: GLM supports output link prediction (was: GLM predict type should be link or response?)

[jira] [Updated] (SPARK-14479) GLM predict type should be link or response?

2016-04-10 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-14479: Issue Type: Improvement (was: Question) > GLM predict type should be link or response? >

[jira] [Commented] (SPARK-14479) GLM predict type should be link or response?

2016-04-10 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15234124#comment-15234124 ] Yanbo Liang commented on SPARK-14479: - Had offline discussion with [~mengxr], we can output 2

[jira] [Comment Edited] (SPARK-14479) GLM predict type should be link or response?

2016-04-10 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15234124#comment-15234124 ] Yanbo Liang edited comment on SPARK-14479 at 4/10/16 1:52 PM: -- Had offline

[jira] [Closed] (SPARK-14517) GLM should support predict link

2016-04-10 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang closed SPARK-14517. --- Resolution: Duplicate > GLM should support predict link > --- > >

[jira] [Updated] (SPARK-14517) GLM should support predict link

2016-04-10 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-14517: Issue Type: Improvement (was: Question) > GLM should support predict link >

[jira] [Updated] (SPARK-14517) GLM should support predict link

2016-04-10 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-14517: Summary: GLM should support predict link (was: CLONE - GLM predict type should be link or

[jira] [Created] (SPARK-14517) CLONE - GLM predict type should be link or response?

2016-04-10 Thread Yanbo Liang (JIRA)
Yanbo Liang created SPARK-14517: --- Summary: CLONE - GLM predict type should be link or response? Key: SPARK-14517 URL: https://issues.apache.org/jira/browse/SPARK-14517 Project: Spark Issue

[jira] [Commented] (SPARK-14478) Should StandardScaler use biased variance to scale?

2016-04-08 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232468#comment-15232468 ] Yanbo Liang commented on SPARK-14478: - Should we add a param that controls whether to use biased or

[jira] [Updated] (SPARK-14479) GLM predict type should be link or response?

2016-04-07 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-14479: Component/s: SparkR > GLM predict type should be link or response? >

[jira] [Updated] (SPARK-14479) GLM predict type should be link or response?

2016-04-07 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-14479: Description: In R glm and glmnet, the default type of predict is "link" which is the linear

[jira] [Commented] (SPARK-14479) GLM predict type should be link or response?

2016-04-07 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231604#comment-15231604 ] Yanbo Liang commented on SPARK-14479: - This will introduce a breaking change, so it's better to make a decision

[jira] [Updated] (SPARK-14479) GLM predict type should be link or response?

2016-04-07 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-14479: Description: In R glm and glmnet, the default type of predict is "link" which is the linear

[jira] [Created] (SPARK-14479) GLM predict type should be link or response?

2016-04-07 Thread Yanbo Liang (JIRA)
Yanbo Liang created SPARK-14479: --- Summary: GLM predict type should be link or response? Key: SPARK-14479 URL: https://issues.apache.org/jira/browse/SPARK-14479 Project: Spark Issue Type:

[jira] [Commented] (SPARK-14378) Review spark.ml parity for regression, except trees

2016-04-07 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231579#comment-15231579 ] Yanbo Liang commented on SPARK-14378: - I can work on it. > Review spark.ml parity for regression,

[jira] [Comment Edited] (SPARK-14311) Model persistence in SparkR

2016-04-06 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227811#comment-15227811 ] Yanbo Liang edited comment on SPARK-14311 at 4/6/16 6:34 AM: - When I worked

[jira] [Commented] (SPARK-14311) Model persistence in SparkR

2016-04-06 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227811#comment-15227811 ] Yanbo Liang commented on SPARK-14311: - When I worked on SPARK-14313, I found that we can easily

[jira] [Commented] (SPARK-13783) Model export/import for spark.ml: GBTs

2016-04-04 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225525#comment-15225525 ] Yanbo Liang commented on SPARK-13783: - [~josephkb] I will work on this. > Model export/import for

[jira] [Comment Edited] (SPARK-14303) Refactor SparkRWrappers

2016-03-31 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1522#comment-1522 ] Yanbo Liang edited comment on SPARK-14303 at 4/1/16 4:00 AM: - [~mengxr] I

[jira] [Commented] (SPARK-14303) Refactor SparkRWrappers

2016-03-31 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1522#comment-1522 ] Yanbo Liang commented on SPARK-14303: - [~mengxr] I have done the refactoring for k-means; I will link

[jira] [Commented] (SPARK-14313) AFTSurvivalRegression model persistence in SparkR

2016-03-31 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15221103#comment-15221103 ] Yanbo Liang commented on SPARK-14313: - Sure, please assign it to me. > AFTSurvivalRegression model

[jira] [Created] (SPARK-14298) LDA should support disable checkpoint

2016-03-31 Thread Yanbo Liang (JIRA)
Yanbo Liang created SPARK-14298: --- Summary: LDA should support disable checkpoint Key: SPARK-14298 URL: https://issues.apache.org/jira/browse/SPARK-14298 Project: Spark Issue Type: Improvement

[jira] [Comment Edited] (SPARK-7424) spark.ml classification, regression abstractions should add metadata to output column

2016-03-30 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15218253#comment-15218253 ] Yanbo Liang edited comment on SPARK-7424 at 3/30/16 4:32 PM: - [~josephkb] We

[jira] [Comment Edited] (SPARK-7424) spark.ml classification, regression abstractions should add metadata to output column

2016-03-30 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15218253#comment-15218253 ] Yanbo Liang edited comment on SPARK-7424 at 3/30/16 4:29 PM: - [~josephkb] We

[jira] [Issue Comment Deleted] (SPARK-7424) spark.ml classification, regression abstractions should add metadata to output column

2016-03-30 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-7424: --- Comment: was deleted (was: [~josephkb] We should copy metadata from the labelCol to the

[jira] [Commented] (SPARK-7424) spark.ml classification, regression abstractions should add metadata to output column

2016-03-30 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15218253#comment-15218253 ] Yanbo Liang commented on SPARK-7424: [~josephkb] We should copy metadata from the labelCol to the

[jira] [Commented] (SPARK-7424) spark.ml classification, regression abstractions should add metadata to output column

2016-03-30 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15218254#comment-15218254 ] Yanbo Liang commented on SPARK-7424: [~josephkb] We should copy metadata from the labelCol to the

[jira] [Commented] (SPARK-13783) Model export/import for spark.ml: GBTs

2016-03-27 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213359#comment-15213359 ] Yanbo Liang commented on SPARK-13783: - [~GayathriMurali] Please go first, I will help to review your

[jira] [Comment Edited] (SPARK-14147) SparkR - ML predictors return features with vector datatype, however SparkR doesn't support it

2016-03-25 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211636#comment-15211636 ] Yanbo Liang edited comment on SPARK-14147 at 3/25/16 9:19 AM: -- I vote that

[jira] [Commented] (SPARK-14147) SparkR - ML predictors return features with vector datatype, however SparkR doesn't support it

2016-03-25 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211636#comment-15211636 ] Yanbo Liang commented on SPARK-14147: - I vote not to output the feature column in predict, because R

[jira] [Created] (SPARK-14152) MultilayerPerceptronClassifier supports save/load for Python API

2016-03-25 Thread Yanbo Liang (JIRA)
Yanbo Liang created SPARK-14152: --- Summary: MultilayerPerceptronClassifier supports save/load for Python API Key: SPARK-14152 URL: https://issues.apache.org/jira/browse/SPARK-14152 Project: Spark

[jira] [Comment Edited] (SPARK-13783) Model export/import for spark.ml: GBTs

2016-03-24 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211392#comment-15211392 ] Yanbo Liang edited comment on SPARK-13783 at 3/25/16 4:15 AM: --

[jira] [Commented] (SPARK-13783) Model export/import for spark.ml: GBTs

2016-03-24 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211392#comment-15211392 ] Yanbo Liang commented on SPARK-13783: - GBTClassificationModel contains array of

[jira] [Comment Edited] (SPARK-14076) Naive Bayes should output attributes in predictions

2016-03-24 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210226#comment-15210226 ] Yanbo Liang edited comment on SPARK-14076 at 3/24/16 1:45 PM: -- [~mengxr]

[jira] [Comment Edited] (SPARK-14076) Naive Bayes should output attributes in predictions

2016-03-24 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210226#comment-15210226 ] Yanbo Liang edited comment on SPARK-14076 at 3/24/16 1:41 PM: -- [~mengxr]

[jira] [Comment Edited] (SPARK-14076) Naive Bayes should output attributes in predictions

2016-03-24 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210226#comment-15210226 ] Yanbo Liang edited comment on SPARK-14076 at 3/24/16 1:39 PM: -- [~mengxr]

[jira] [Commented] (SPARK-14076) Naive Bayes should output attributes in predictions

2016-03-24 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210226#comment-15210226 ] Yanbo Liang commented on SPARK-14076: - [~mengxr] Because NaiveBayesModel extends from

[jira] [Commented] (SPARK-4607) Add random seed to GBTClassifier, GBTRegressor

2016-03-24 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210197#comment-15210197 ] Yanbo Liang commented on SPARK-4607: Actually this has been fixed by SPARK-13952 > Add random seed to

[jira] [Commented] (SPARK-13998) HashingTF should extend UnaryTransformer

2016-03-20 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201400#comment-15201400 ] Yanbo Liang commented on SPARK-13998: - I think we should first refactor UnaryTransformer to

[jira] [Commented] (SPARK-13783) Model export/import for spark.ml: GBTs

2016-03-19 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201303#comment-15201303 ] Yanbo Liang commented on SPARK-13783: - Hi [~yuhaoyan], are you working on this issue? If not, I can

[jira] [Commented] (SPARK-13968) Use MurmurHash3 for hashing String features

2016-03-19 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199639#comment-15199639 ] Yanbo Liang commented on SPARK-13968: - [~mlnick] Can I work on this? > Use MurmurHash3 for hashing

[jira] [Commented] (SPARK-13968) Use MurmurHash3 for hashing String features

2016-03-18 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201401#comment-15201401 ] Yanbo Liang commented on SPARK-13968: - Sure, I will do a performance comparison first. > Use

[jira] [Commented] (SPARK-13785) Deprecate model field in ML model summary classes

2016-03-16 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15196971#comment-15196971 ] Yanbo Liang commented on SPARK-13785: - [~josephkb] Can I work on this? I vote to make model field

[jira] [Commented] (SPARK-12664) Expose raw prediction scores in MultilayerPerceptronClassificationModel

2016-03-09 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186945#comment-15186945 ] Yanbo Liang commented on SPARK-12664: - [~GayathriMurali] I'm not working on this currently, please

[jira] [Created] (SPARK-13615) GeneralizedLinearRegression support save/load

2016-03-02 Thread Yanbo Liang (JIRA)
Yanbo Liang created SPARK-13615: --- Summary: GeneralizedLinearRegression support save/load Key: SPARK-13615 URL: https://issues.apache.org/jira/browse/SPARK-13615 Project: Spark Issue Type:

[jira] [Created] (SPARK-13613) Provide ignored tests to export test dataset into CSV format

2016-03-01 Thread Yanbo Liang (JIRA)
Yanbo Liang created SPARK-13613: --- Summary: Provide ignored tests to export test dataset into CSV format Key: SPARK-13613 URL: https://issues.apache.org/jira/browse/SPARK-13613 Project: Spark

[jira] [Updated] (SPARK-13448) Document MLlib behavior changes in Spark 2.0

2016-02-28 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-13448: Description: This JIRA keeps a list of MLlib behavior changes in Spark 2.0. So we can remember to

[jira] [Updated] (SPARK-13448) Document MLlib behavior changes in Spark 2.0

2016-02-28 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-13448: Description: This JIRA keeps a list of MLlib behavior changes in Spark 2.0. So we can remember to

[jira] [Updated] (SPARK-13448) Document MLlib behavior changes in Spark 2.0

2016-02-28 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-13448: Description: This JIRA keeps a list of MLlib behavior changes in Spark 2.0. So we can remember to

[jira] [Updated] (SPARK-13448) Document MLlib behavior changes in Spark 2.0

2016-02-28 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-13448: Description: This JIRA keeps a list of MLlib behavior changes in Spark 2.0. So we can remember to

[jira] [Updated] (SPARK-13448) Document MLlib behavior changes in Spark 2.0

2016-02-28 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-13448: Description: This JIRA keeps a list of MLlib behavior changes in Spark 2.0. So we can remember to

[jira] [Updated] (SPARK-13448) Document MLlib behavior changes in Spark 2.0

2016-02-28 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-13448: Description: This JIRA keeps a list of MLlib behavior changes in Spark 2.0. So we can remember to

[jira] [Updated] (SPARK-13448) Document MLlib behavior changes in Spark 2.0

2016-02-28 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-13448: Description: This JIRA keeps a list of MLlib behavior changes in Spark 2.0. So we can remember to

[jira] [Updated] (SPARK-13448) Document MLlib behavior changes in Spark 2.0

2016-02-28 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-13448: Description: This JIRA keeps a list of MLlib behavior changes in Spark 2.0. So we can remember to

[jira] [Updated] (SPARK-13448) Document MLlib behavior changes in Spark 2.0

2016-02-28 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-13448: Description: This JIRA keeps a list of MLlib behavior changes in Spark 2.0. So we can remember to

[jira] [Updated] (SPARK-13545) Make MLlib LogisticRegressionWithLBFGS's default parameters consistent in Scala and Python

2016-02-28 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-13545: Description: * The default value of regParam of PySpark MLlib LogisticRegressionWithLBFGS should

[jira] [Updated] (SPARK-13545) Make MLlib LogisticRegressionWithLBFGS's default parameters consistent in Scala and Python

2016-02-28 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-13545: Summary: Make MLlib LogisticRegressionWithLBFGS's default parameters consistent in Scala and

[jira] [Created] (SPARK-13545) Make MLlib LR's default parameters consistent in Scala and Python

2016-02-28 Thread Yanbo Liang (JIRA)
Yanbo Liang created SPARK-13545: --- Summary: Make MLlib LR's default parameters consistent in Scala and Python Key: SPARK-13545 URL: https://issues.apache.org/jira/browse/SPARK-13545 Project: Spark

[jira] [Created] (SPARK-13504) Add approxQuantile for SparkR

2016-02-25 Thread Yanbo Liang (JIRA)
Yanbo Liang created SPARK-13504: --- Summary: Add approxQuantile for SparkR Key: SPARK-13504 URL: https://issues.apache.org/jira/browse/SPARK-13504 Project: Spark Issue Type: Improvement

[jira] [Updated] (SPARK-13490) ML LinearRegression should cache standardization param value

2016-02-25 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-13490: Description: Like SPARK-13132 for LogisticRegression, LinearRegression with L1 regularization

[jira] [Created] (SPARK-13490) ML LinearRegression should cache standardization param value

2016-02-25 Thread Yanbo Liang (JIRA)
Yanbo Liang created SPARK-13490: --- Summary: ML LinearRegression should cache standardization param value Key: SPARK-13490 URL: https://issues.apache.org/jira/browse/SPARK-13490 Project: Spark

[jira] [Closed] (SPARK-13372) ML LogisticRegression behaves incorrectly when standardization = false && regParam = 0.0

2016-02-25 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang closed SPARK-13372. --- Resolution: Not A Bug > ML LogisticRegression behaves incorrectly when standardization = false && >

[jira] [Commented] (SPARK-13010) Survival analysis in SparkR

2016-02-24 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15162677#comment-15162677 ] Yanbo Liang commented on SPARK-13010: - OK, we will first support `Surv` via a hack to catch up with

[jira] [Updated] (SPARK-13429) Unify Logistic Regression convergence tolerance of ML & MLlib

2016-02-21 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-13429: Description: To provide better and more consistent results, let's change the default value of

[jira] [Updated] (SPARK-13429) Unify Logistic Regression convergence tolerance of ML & MLlib

2016-02-21 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-13429: Description: To provide better and more consistent results, let's change the default value of
