[jira] [Commented] (SPARK-19141) VectorAssembler metadata causing memory issues

2017-09-21 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16174685#comment-16174685 ] Weichen Xu commented on SPARK-19141: Maybe we need to design a sparse format of Attribut

[jira] [Commented] (SPARK-22061) Add pipeline model of SVM

2017-09-20 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172983#comment-16172983 ] Weichen Xu commented on SPARK-22061: We already have `LinearSVC` and implemented by L

[jira] [Commented] (SPARK-21972) Allow users to control input data persistence in ML Estimators via a handlePersistence ml.Param

2017-09-19 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171773#comment-16171773 ] Weichen Xu commented on SPARK-21972: [~podongfeng] [~Siddharth Murching] I have a sim

[jira] [Created] (SPARK-22060) CrossValidator/TrainValidationSplit parallelism param persist/load bug

2017-09-19 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-22060: -- Summary: CrossValidator/TrainValidationSplit parallelism param persist/load bug Key: SPARK-22060 URL: https://issues.apache.org/jira/browse/SPARK-22060 Project: Spark

[jira] [Comment Edited] (SPARK-21802) Make sparkR MLP summary() expose probability column

2017-09-17 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16169535#comment-16169535 ] Weichen Xu edited comment on SPARK-21802 at 9/18/17 3:42 AM: -

[jira] [Commented] (SPARK-21802) Make sparkR MLP summary() expose probability column

2017-09-17 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16169535#comment-16169535 ] Weichen Xu commented on SPARK-21802: [~felixcheung] The probability cannot be added t

[jira] [Updated] (SPARK-22005) CrossValidator, TrainValidationSplit dump sub models to disk when fitting: Python API

2017-09-14 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-22005: --- Description: In pyspark: We add a parameter indicating whether to persist models to disk during train

[jira] [Commented] (SPARK-22005) CrossValidator, TrainValidationSplit dump sub models to disk when fitting: Python API

2017-09-14 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165882#comment-16165882 ] Weichen Xu commented on SPARK-22005: I will create PR once SPARK-21088 merged. >

[jira] [Updated] (SPARK-22005) CrossValidator, TrainValidationSplit dump sub models to disk when fitting: Python API

2017-09-14 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-22005: --- Issue Type: Sub-task (was: New Feature) Parent: SPARK-21086 > CrossValidator, TrainVal

[jira] [Created] (SPARK-22005) CrossValidator, TrainValidationSplit dump sub models to disk when fitting: Python API

2017-09-14 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-22005: -- Summary: CrossValidator, TrainValidationSplit dump sub models to disk when fitting: Python API Key: SPARK-22005 URL: https://issues.apache.org/jira/browse/SPARK-22005 P

[jira] [Commented] (SPARK-21088) CrossValidator, TrainValidationSplit should collect all models when fitting: Python API

2017-09-14 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165867#comment-16165867 ] Weichen Xu commented on SPARK-21088: [~dimberman] Because [~ajaysaini] is busy, I tak

[jira] [Updated] (SPARK-21088) CrossValidator, TrainValidationSplit should collect all models when fitting: Python API

2017-09-14 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-21088: --- Summary: CrossValidator, TrainValidationSplit should collect all models when fitting: Python API (wa

[jira] [Updated] (SPARK-21088) CrossValidator, TrainValidationSplit should collect all models when fitting: Python API

2017-09-14 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-21088: --- Description: In pyspark: We add a parameter whether to collect the full model list when CrossValidat

[jira] [Updated] (SPARK-21087) CrossValidator, TrainValidationSplit should collect all models when fitting: Scala API

2017-09-14 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-21087: --- Summary: CrossValidator, TrainValidationSplit should collect all models when fitting: Scala API (was

[jira] [Updated] (SPARK-21087) CrossValidator, TrainValidationSplit should preserve all models after fitting: Scala API

2017-09-14 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-21087: --- Summary: CrossValidator, TrainValidationSplit should preserve all models after fitting: Scala API (w

[jira] [Updated] (SPARK-21087) CrossValidator, TrainValidationSplit should preserve all models after fitting: Scala

2017-09-14 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-21087: --- Description: We add a parameter whether to collect the full model list when CrossValidator/TrainVali

[jira] [Commented] (SPARK-22004) CrossValidator, TrainValidationSplit dump sub models to disk when fitting: Scala API

2017-09-14 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165853#comment-16165853 ] Weichen Xu commented on SPARK-22004: I will create PR once SPARK-21086 merged. > Cro

[jira] [Updated] (SPARK-22004) CrossValidator, TrainValidationSplit dump sub models to disk when fitting: Scala API

2017-09-14 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-22004: --- Description: We add a parameter indicating whether to persist models to disk during training (defaul

[jira] [Updated] (SPARK-22004) CrossValidator, TrainValidationSplit dump sub models to disk when fitting: Scala API

2017-09-14 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-22004: --- Issue Type: Sub-task (was: New Feature) Parent: SPARK-21086 > CrossValidator, TrainValidatio

[jira] [Created] (SPARK-22004) CrossValidator, TrainValidationSplit dump sub models to disk when fitting: Scala API

2017-09-14 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-22004: -- Summary: CrossValidator, TrainValidationSplit dump sub models to disk when fitting: Scala API Key: SPARK-22004 URL: https://issues.apache.org/jira/browse/SPARK-22004 Proj

[jira] [Closed] (SPARK-21802) Make sparkR MLP summary() expose probability column

2017-09-09 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu closed SPARK-21802. -- Resolution: Not A Problem > Make sparkR MLP summary() expose probability column > -

[jira] [Updated] (SPARK-21911) Parallel Model Evaluation for ML Tuning: PySpark

2017-09-04 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-21911: --- Summary: Parallel Model Evaluation for ML Tuning: PySpark (was: Parallel Model Evaluation for ML Tun

[jira] [Updated] (SPARK-19357) Parallel Model Evaluation for ML Tuning: Scala

2017-09-04 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-19357: --- Summary: Parallel Model Evaluation for ML Tuning: Scala (was: Parallel Model Evaluation for ML Tunin

[jira] [Created] (SPARK-21911) Parallel Model Evaluation for ML Tuning: Python

2017-09-04 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-21911: -- Summary: Parallel Model Evaluation for ML Tuning: Python Key: SPARK-21911 URL: https://issues.apache.org/jira/browse/SPARK-21911 Project: Spark Issue Type: New F

[jira] [Updated] (SPARK-21898) Feature parity for KolmogorovSmirnovTest in MLlib

2017-09-02 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-21898: --- Issue Type: Sub-task (was: Bug) Parent: SPARK-4591 > Feature parity for KolmogorovSmirnovTes

[jira] [Created] (SPARK-21898) Feature parity for KolmogorovSmirnovTest in MLlib

2017-09-02 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-21898: -- Summary: Feature parity for KolmogorovSmirnovTest in MLlib Key: SPARK-21898 URL: https://issues.apache.org/jira/browse/SPARK-21898 Project: Spark Issue Type: Bug

[jira] [Comment Edited] (SPARK-21802) Make sparkR MLP summary() expose probability column

2017-08-31 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16136234#comment-16136234 ] Weichen Xu edited comment on SPARK-21802 at 8/31/17 1:48 PM: -

[jira] [Created] (SPARK-21862) Add overflow check in PCA

2017-08-29 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-21862: -- Summary: Add overflow check in PCA Key: SPARK-21862 URL: https://issues.apache.org/jira/browse/SPARK-21862 Project: Spark Issue Type: Improvement Compo

[jira] [Created] (SPARK-21856) Update Python API for MultilayerPerceptronClassifierModel

2017-08-28 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-21856: -- Summary: Update Python API for MultilayerPerceptronClassifierModel Key: SPARK-21856 URL: https://issues.apache.org/jira/browse/SPARK-21856 Project: Spark Issue T

[jira] [Created] (SPARK-21854) Python interface for MLOR summary

2017-08-28 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-21854: -- Summary: Python interface for MLOR summary Key: SPARK-21854 URL: https://issues.apache.org/jira/browse/SPARK-21854 Project: Spark Issue Type: New Feature

[jira] [Commented] (SPARK-21799) KMeans performance regression (5-6x slowdown) in Spark 2.2

2017-08-25 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16141319#comment-16141319 ] Weichen Xu commented on SPARK-21799: [~zahili] Hmm... You're right. We are hard to get

[jira] [Issue Comment Deleted] (SPARK-21799) KMeans performance regression (5-6x slowdown) in Spark 2.2

2017-08-25 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-21799: --- Comment: was deleted (was: I suggest checking both `df.storageLevel` and `df.rdd.getStorageLevel` for i

[jira] [Commented] (SPARK-21799) KMeans performance regression (5-6x slowdown) in Spark 2.2

2017-08-24 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16140162#comment-16140162 ] Weichen Xu commented on SPARK-21799: I suggest checking both `df.storageLevel` and `df.r

[jira] [Commented] (SPARK-21799) KMeans performance regression (5-6x slowdown) in Spark 2.2

2017-08-24 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139802#comment-16139802 ] Weichen Xu commented on SPARK-21799: [~Siddharth Murching] Already have another jira

[jira] [Commented] (SPARK-21799) KMeans performance regression (5-6x slowdown) in Spark 2.2

2017-08-24 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139799#comment-16139799 ] Weichen Xu commented on SPARK-21799: [~Siddharth Murching] +1 This will cause double

[jira] [Commented] (SPARK-21770) ProbabilisticClassificationModel: Improve normalization of all-zero raw predictions

2017-08-23 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139364#comment-16139364 ] Weichen Xu commented on SPARK-21770: Hmm... `normalizeToProbabilitiesInPlace` is only

[jira] [Created] (SPARK-21818) MultivariateOnlineSummarizer.variance generate negative result

2017-08-23 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-21818: -- Summary: MultivariateOnlineSummarizer.variance generate negative result Key: SPARK-21818 URL: https://issues.apache.org/jira/browse/SPARK-21818 Project: Spark I

[jira] [Commented] (SPARK-21729) Generic test for ProbabilisticClassifier to ensure consistent output columns

2017-08-22 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137946#comment-16137946 ] Weichen Xu commented on SPARK-21729: I will work on this, thanks! > Generic test for

[jira] [Comment Edited] (SPARK-21802) Make sparkR MLP summary() expose probability column

2017-08-21 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16136234#comment-16136234 ] Weichen Xu edited comment on SPARK-21802 at 8/22/17 4:25 AM: -

[jira] [Commented] (SPARK-21802) Make sparkR MLP summary() expose probability column

2017-08-21 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16136234#comment-16136234 ] Weichen Xu commented on SPARK-21802: cc [~felixcheung] > Make sparkR MLP summary() e

[jira] [Created] (SPARK-21802) Make sparkR MLP summary() expose probability column

2017-08-21 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-21802: -- Summary: Make sparkR MLP summary() expose probability column Key: SPARK-21802 URL: https://issues.apache.org/jira/browse/SPARK-21802 Project: Spark Issue Type: N

[jira] [Created] (SPARK-21801) SparkR unit test randomly fail on trees

2017-08-21 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-21801: -- Summary: SparkR unit test randomly fail on trees Key: SPARK-21801 URL: https://issues.apache.org/jira/browse/SPARK-21801 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-21801) SparkR unit test randomly fail on trees

2017-08-21 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16136229#comment-16136229 ] Weichen Xu commented on SPARK-21801: cc [~felixcheung] Can you help fix this ? > Spa

[jira] [Commented] (SPARK-21741) Python API for DataFrame-based multivariate summarizer

2017-08-15 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128226#comment-16128226 ] Weichen Xu commented on SPARK-21741: OK I will work on this. I will post a design doc

[jira] [Updated] (SPARK-21681) MLOR do not work correctly when featureStd contains zero

2017-08-15 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-21681: --- Description: MLOR do not work correctly when featureStd contains zero. We can reproduce the bug throu

[jira] [Created] (SPARK-21681) MLOR do not work correctly when featureStd contains zero

2017-08-09 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-21681: -- Summary: MLOR do not work correctly when featureStd contains zero Key: SPARK-21681 URL: https://issues.apache.org/jira/browse/SPARK-21681 Project: Spark Issue Ty

[jira] [Commented] (SPARK-20418) multi-label classification support

2017-07-26 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102280#comment-16102280 ] Weichen Xu commented on SPARK-20418: I will work on this. > multi-label classificati

[jira] [Commented] (SPARK-11215) Add multiple columns support to StringIndexer

2017-07-26 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102274#comment-16102274 ] Weichen Xu commented on SPARK-11215: I will take over this feature and create a PR so

[jira] [Commented] (SPARK-21087) CrossValidator, TrainValidationSplit should preserve all models after fitting: Scala

2017-07-26 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102048#comment-16102048 ] Weichen Xu commented on SPARK-21087: I will work on it. > CrossValidator, TrainValid

[jira] [Commented] (SPARK-17025) Cannot persist PySpark ML Pipeline model that includes custom Transformer

2017-07-26 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101994#comment-16101994 ] Weichen Xu commented on SPARK-17025: Because currently, scala calling python will be

[jira] [Commented] (SPARK-21523) Fix bug of strong wolfe linesearch `init` parameter lose effectiveness

2017-07-24 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16099225#comment-16099225 ] Weichen Xu commented on SPARK-21523: I will work on this once the breeze cut a new ve

[jira] [Updated] (SPARK-21523) Fix bug of strong wolfe linesearch `init` parameter lose effectiveness

2017-07-24 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-21523: --- Priority: Minor (was: Major) > Fix bug of strong wolfe linesearch `init` parameter lose effectivenes

[jira] [Created] (SPARK-21523) Fix bug of strong wolfe linesearch `init` parameter lose effectiveness

2017-07-24 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-21523: -- Summary: Fix bug of strong wolfe linesearch `init` parameter lose effectiveness Key: SPARK-21523 URL: https://issues.apache.org/jira/browse/SPARK-21523 Project: Spark

[jira] [Updated] (SPARK-20504) ML 2.2 QA: API: Java compatibility, docs

2017-05-17 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-20504: --- Attachment: (updated)signature.diff (updated)process_script2.sh (updat

[jira] [Issue Comment Deleted] (SPARK-20504) ML 2.2 QA: API: Java compatibility, docs

2017-05-15 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-20504: --- Comment: was deleted (was: You’re right this is really a headache. Java tools cannot extract several

[jira] [Updated] (SPARK-20504) ML 2.2 QA: API: Java compatibility, docs

2017-05-15 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-20504: --- You’re right this is really a headache. Java tools cannot extract several information `scalac` generated

[jira] [Comment Edited] (SPARK-20504) ML 2.2 QA: API: Java compatibility, docs

2017-05-12 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008898#comment-16008898 ] Weichen Xu edited comment on SPARK-20504 at 5/12/17 11:30 PM: -

[jira] [Updated] (SPARK-20504) ML 2.2 QA: API: Java compatibility, docs

2017-05-12 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-20504: --- Attachment: 5_added_ml_class 4_common_ml_class 3_added_class_signature

[jira] [Commented] (SPARK-20504) ML 2.2 QA: API: Java compatibility, docs

2017-05-12 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008898#comment-16008898 ] Weichen Xu commented on SPARK-20504: I have already taken the following steps to chec

[jira] [Created] (SPARK-20423) fix MLOR coeffs centering when reg == 0

2017-04-20 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-20423: -- Summary: fix MLOR coeffs centering when reg == 0 Key: SPARK-20423 URL: https://issues.apache.org/jira/browse/SPARK-20423 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-19215) Add necessary check for `RDD.checkpoint` to avoid potential mistakes

2017-01-13 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-19215: -- Summary: Add necessary check for `RDD.checkpoint` to avoid potential mistakes Key: SPARK-19215 URL: https://issues.apache.org/jira/browse/SPARK-19215 Project: Spark

[jira] [Updated] (SPARK-19189) Optimize CartesianRDD to avoid parent RDD partition re-computation and re-serialization

2017-01-13 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-19189: --- Summary: Optimize CartesianRDD to avoid parent RDD partition re-computation and re-serialization (wa

[jira] [Updated] (SPARK-19189) Optimize CartesianRDD to avoid partition re-computation and re-serialization

2017-01-13 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-19189: --- Priority: Minor (was: Major) > Optimize CartesianRDD to avoid partition re-computation and re-serial

[jira] [Updated] (SPARK-19190) Optimize CartesianRDD to avoid partition re-computation and re-serialization

2017-01-13 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-19190: --- Issue Type: Improvement (was: Bug) > Optimize CartesianRDD to avoid partition re-computation and re-

[jira] [Updated] (SPARK-19190) Optimize CartesianRDD to avoid partition re-computation and re-serialization

2017-01-13 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-19190: --- Priority: Minor (was: Major) > Optimize CartesianRDD to avoid partition re-computation and re-serial

[jira] [Updated] (SPARK-19203) Optimize CartesianRDD to avoid partition re-computation and re-serialization

2017-01-13 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-19203: --- Priority: Minor (was: Major) > Optimize CartesianRDD to avoid partition re-computation and re-serial

[jira] [Updated] (SPARK-19189) Optimize CartesianRDD to avoid partition re-computation and re-serialization

2017-01-13 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-19189: --- Issue Type: Improvement (was: Bug) > Optimize CartesianRDD to avoid partition re-computation and re-

[jira] [Comment Edited] (SPARK-10078) Vector-free L-BFGS

2017-01-11 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15819964#comment-15819964 ] Weichen Xu edited comment on SPARK-10078 at 1/12/17 4:59 AM: -

[jira] [Comment Edited] (SPARK-10078) Vector-free L-BFGS

2017-01-11 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15819964#comment-15819964 ] Weichen Xu edited comment on SPARK-10078 at 1/12/17 4:58 AM: -

[jira] [Comment Edited] (SPARK-10078) Vector-free L-BFGS

2017-01-11 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15820180#comment-15820180 ] Weichen Xu edited comment on SPARK-10078 at 1/12/17 4:54 AM: -

[jira] [Comment Edited] (SPARK-10078) Vector-free L-BFGS

2017-01-11 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15820180#comment-15820180 ] Weichen Xu edited comment on SPARK-10078 at 1/12/17 4:54 AM: -

[jira] [Commented] (SPARK-10078) Vector-free L-BFGS

2017-01-11 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15820180#comment-15820180 ] Weichen Xu commented on SPARK-10078: As the detail problems I list above(I only list

[jira] [Comment Edited] (SPARK-10078) Vector-free L-BFGS

2017-01-11 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15819851#comment-15819851 ] Weichen Xu edited comment on SPARK-10078 at 1/12/17 4:43 AM: -

[jira] [Comment Edited] (SPARK-10078) Vector-free L-BFGS

2017-01-11 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15819964#comment-15819964 ] Weichen Xu edited comment on SPARK-10078 at 1/12/17 3:02 AM: -

[jira] [Comment Edited] (SPARK-10078) Vector-free L-BFGS

2017-01-11 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15819964#comment-15819964 ] Weichen Xu edited comment on SPARK-10078 at 1/12/17 2:55 AM: -

[jira] [Comment Edited] (SPARK-10078) Vector-free L-BFGS

2017-01-11 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15819964#comment-15819964 ] Weichen Xu edited comment on SPARK-10078 at 1/12/17 2:48 AM: -

[jira] [Comment Edited] (SPARK-10078) Vector-free L-BFGS

2017-01-11 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15819964#comment-15819964 ] Weichen Xu edited comment on SPARK-10078 at 1/12/17 2:45 AM: -

[jira] [Commented] (SPARK-10078) Vector-free L-BFGS

2017-01-11 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15819964#comment-15819964 ] Weichen Xu commented on SPARK-10078: [~debasish83] But when we implement VF-LBFGS/VF-

[jira] [Commented] (SPARK-10078) Vector-free L-BFGS

2017-01-11 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15819851#comment-15819851 ] Weichen Xu commented on SPARK-10078: [~debasish83] Can L-BFGS-B be distributed comput

[jira] [Commented] (SPARK-18036) Decision Trees do not handle edge cases

2016-12-20 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766243#comment-15766243 ] Weichen Xu commented on SPARK-18036: Oh, I'm too busy recently to work on it, it woul

[jira] [Issue Comment Deleted] (SPARK-18036) Decision Trees do not handle edge cases

2016-12-20 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-18036: --- Comment: was deleted (was: I am working on this...) > Decision Trees do not handle edge cases > ---

[jira] [Commented] (SPARK-18286) Add Scala/Java/Python examples for MinHash and RandomProjection

2016-11-04 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15638806#comment-15638806 ] Weichen Xu commented on SPARK-18286: I will work on it, thanks~ > Add Scala/Java/Pyt

[jira] [Updated] (SPARK-18218) Optimize BlockMatrix multiplication, which may cause OOM and low parallelism usage problem in several cases

2016-11-02 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-18218: --- Issue Type: Improvement (was: Bug) > Optimize BlockMatrix multiplication, which may cause OOM and lo

[jira] [Created] (SPARK-18218) Optimize BlockMatrix multiplication, which may cause OOM and low parallelism usage problem in several cases

2016-11-02 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-18218: -- Summary: Optimize BlockMatrix multiplication, which may cause OOM and low parallelism usage problem in several cases Key: SPARK-18218 URL: https://issues.apache.org/jira/browse/SPARK-

[jira] [Closed] (SPARK-18201) add toDense and toSparse into Matrix trait, like Vector design

2016-11-01 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu closed SPARK-18201. -- Resolution: Duplicate It will be fixed in this PR https://github.com/apache/spark/pull/15628 > add toDense

[jira] [Created] (SPARK-18201) add toDense and toSparse into Matrix trait, like Vector design

2016-11-01 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-18201: -- Summary: add toDense and toSparse into Matrix trait, like Vector design Key: SPARK-18201 URL: https://issues.apache.org/jira/browse/SPARK-18201 Project: Spark I

[jira] [Commented] (SPARK-18036) Decision Trees do not handle edge cases

2016-10-25 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15607357#comment-15607357 ] Weichen Xu commented on SPARK-18036: I am working on this... > Decision Trees do no

[jira] [Issue Comment Deleted] (SPARK-18095) There is a display problem in spark UI storage tab when rdd was persisted in multiple replicas

2016-10-25 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-18095: --- Comment: was deleted (was: I am working on it...) > There is a display problem in spark UI storage t

[jira] [Commented] (SPARK-18095) There is a display problem in spark UI storage tab when rdd was persisted in multiple replicas

2016-10-25 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15605712#comment-15605712 ] Weichen Xu commented on SPARK-18095: I am working on it... > There is a display prob

[jira] [Updated] (SPARK-18095) There is a display problem in spark UI storage tab when rdd was persisted in multiple replicas

2016-10-25 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-18095: --- Description: There is a display problem in spark UI storage tab when rdd was persisted in multiple r

[jira] [Created] (SPARK-18095) There is a display problem in spark UI storage tab when rdd was persisted in multiple replicas

2016-10-25 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-18095: -- Summary: There is a display problem in spark UI storage tab when rdd was persisted in multiple replicas Key: SPARK-18095 URL: https://issues.apache.org/jira/browse/SPARK-18095

[jira] [Updated] (SPARK-18078) Add option for customize zipPartition task preferred locations

2016-10-24 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-18078: --- Priority: Minor (was: Major) > Add option for customize zipPartition task preferred locations >

[jira] [Updated] (SPARK-18078) Add option for customize zipPartition task preferred locations

2016-10-24 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-18078: --- Description: `RDD.zipPartitions` task preferred locations strategy will use the intersection of corr

[jira] [Updated] (SPARK-18078) Add option for customize zipPartition task preferred locations

2016-10-24 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-18078: --- Description: `RDD.zipPartitions` task preferred locations strategy will use the intersection of corr

[jira] [Created] (SPARK-18078) Add option for customize zipPartition task preferred locations

2016-10-24 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-18078: -- Summary: Add option for customize zipPartition task preferred locations Key: SPARK-18078 URL: https://issues.apache.org/jira/browse/SPARK-18078 Project: Spark I

[jira] [Created] (SPARK-18051) Custom PartitionCoalescer cause serialization exception

2016-10-21 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-18051: -- Summary: Custom PartitionCoalescer cause serialization exception Key: SPARK-18051 URL: https://issues.apache.org/jira/browse/SPARK-18051 Project: Spark Issue Typ

[jira] [Created] (SPARK-18007) update SparkR MLP - add initalWeights parameter

2016-10-19 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-18007: -- Summary: update SparkR MLP - add initalWeights parameter Key: SPARK-18007 URL: https://issues.apache.org/jira/browse/SPARK-18007 Project: Spark Issue Type: Impro

[jira] [Updated] (SPARK-18003) RDD zipWithIndex generate wrong result when one partition contains more than 2147483647 records.

2016-10-18 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-18003: --- Description: RDD zipWithIndex generate wrong result when one partition contains more than Int.MaxVal

[jira] [Updated] (SPARK-18003) RDD zipWithIndex generate wrong result when one partition contains more than 2147483647 records.

2016-10-18 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-18003: --- Component/s: Spark Core > RDD zipWithIndex generate wrong result when one partition contains more tha

[jira] [Created] (SPARK-18003) RDD zipWithIndex generate wrong result when one partition contains more than 2147483647 records.

2016-10-18 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-18003: -- Summary: RDD zipWithIndex generate wrong result when one partition contains more than 2147483647 records. Key: SPARK-18003 URL: https://issues.apache.org/jira/browse/SPARK-18003
