[jira] [Updated] (SPARK-17961) Add storageLevel to Dataset for SparkR

2016-10-16 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-17961: --- Component/s: SQL SparkR > Add storageLevel to Dataset for SparkR >

[jira] [Updated] (SPARK-17961) Add storageLevel to Dataset for SparkR

2016-10-16 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-17961: --- Issue Type: Improvement (was: Bug) > Add storageLevel to Dataset for SparkR >

[jira] [Commented] (SPARK-17961) Add storageLevel to Dataset for SparkR

2016-10-16 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15580131#comment-15580131 ] Weichen Xu commented on SPARK-17961: I am working on it and will create PR soon. > Add storageLevel

[jira] [Created] (SPARK-17961) Add storageLevel to Dataset for SparkR

2016-10-16 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-17961: -- Summary: Add storageLevel to Dataset for SparkR Key: SPARK-17961 URL: https://issues.apache.org/jira/browse/SPARK-17961 Project: Spark Issue Type: Bug

[jira] [Comment Edited] (SPARK-17139) Add model summary for MultinomialLogisticRegression

2016-10-10 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15564097#comment-15564097 ] Weichen Xu edited comment on SPARK-17139 at 10/11/16 1:25 AM: -- I'm working

[jira] [Commented] (SPARK-17139) Add model summary for MultinomialLogisticRegression

2016-10-10 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15564097#comment-15564097 ] Weichen Xu commented on SPARK-17139: I'm working hard on it and will create a PR this week, thanks!

[jira] [Updated] (SPARK-17540) SparkR array serde cannot work correctly when array length == 0

2016-10-10 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-17540: --- Description: SparkR cannot handle array serde when array length == 0 when length = 0 R side set the

[jira] [Closed] (SPARK-17540) SparkR array serde cannot work correctly when array length == 0

2016-10-08 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu closed SPARK-17540. -- Resolution: Won't Fix > SparkR array serde cannot work correctly when array length == 0 >

[jira] [Updated] (SPARK-17745) Update Python API for NB to support weighted instances

2016-09-30 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-17745: --- Component/s: PySpark > Update Python API for NB to support weighted instances >

[jira] [Commented] (SPARK-17745) Update Python API for NB to support weighted instances

2016-09-30 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15535750#comment-15535750 ] Weichen Xu commented on SPARK-17745: I will work on it and create PR ASAP, thanks! > Update Python

[jira] [Commented] (SPARK-17281) Add treeAggregateDepth parameter for AFTSurvivalRegression

2016-09-15 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15492931#comment-15492931 ] Weichen Xu commented on SPARK-17281: Because currently, AFTSurvivalRegression uses `treeAggregate`

[jira] [Created] (SPARK-17540) SparkR array serde cannot work correctly when array length == 0

2016-09-14 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-17540: -- Summary: SparkR array serde cannot work correctly when array length == 0 Key: SPARK-17540 URL: https://issues.apache.org/jira/browse/SPARK-17540 Project: Spark

[jira] [Created] (SPARK-17507) check weight vector size in ANN

2016-09-12 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-17507: -- Summary: check weight vector size in ANN Key: SPARK-17507 URL: https://issues.apache.org/jira/browse/SPARK-17507 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-17499) make the default params in sparkR spark.mlp consistent with MultilayerPerceptronClassifier

2016-09-11 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-17499: -- Summary: make the default params in sparkR spark.mlp consistent with MultilayerPerceptronClassifier Key: SPARK-17499 URL: https://issues.apache.org/jira/browse/SPARK-17499

[jira] [Created] (SPARK-17390) optimize MultivariantOnlineSummerizer by making the summarized target configurable

2016-09-03 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-17390: -- Summary: optimize MultivariantOnlineSummerizer by making the summarized target configurable Key: SPARK-17390 URL: https://issues.apache.org/jira/browse/SPARK-17390

[jira] [Created] (SPARK-17362) fix MultivariantOnlineSummerizer.numNonZeros

2016-09-01 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-17362: -- Summary: fix MultivariantOnlineSummerizer.numNonZeros Key: SPARK-17362 URL: https://issues.apache.org/jira/browse/SPARK-17362 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-17363) fix MultivariantOnlineSummerizer.numNonZeros

2016-09-01 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-17363: -- Summary: fix MultivariantOnlineSummerizer.numNonZeros Key: SPARK-17363 URL: https://issues.apache.org/jira/browse/SPARK-17363 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-17175) Add a expert formula to aggregationDepth of SharedParam

2016-09-01 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15455963#comment-15455963 ] Weichen Xu commented on SPARK-17175: I will work on it, thanks! > Add a expert formula to

[jira] [Commented] (SPARK-17050) Improve initKMeansParallel with treeAggregate

2016-08-31 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15452401#comment-15452401 ] Weichen Xu commented on SPARK-17050: because KMeans algo is being optimized by another task I close

[jira] [Commented] (SPARK-17139) Add model summary for MultinomialLogisticRegression

2016-08-27 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15442542#comment-15442542 ] Weichen Xu commented on SPARK-17139: Because the LOR and MLOR interfaces need to be unified, I will create

[jira] [Created] (SPARK-17281) Add treeAggregateDepth parameter for AFTSurvivalRegression

2016-08-27 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-17281: -- Summary: Add treeAggregateDepth parameter for AFTSurvivalRegression Key: SPARK-17281 URL: https://issues.apache.org/jira/browse/SPARK-17281 Project: Spark Issue

[jira] [Commented] (SPARK-17169) To use scala macros to update code when SharedParamsCodeGen.scala changed

2016-08-25 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436402#comment-15436402 ] Weichen Xu commented on SPARK-17169: I will work on it and create pr soon! > To use scala macros to

[jira] [Commented] (SPARK-17201) Investigate numerical instability for MLOR without regularization

2016-08-23 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15434073#comment-15434073 ] Weichen Xu commented on SPARK-17201: yeah, you are right... I search some proof for this such as

[jira] [Comment Edited] (SPARK-17139) Add model summary for MultinomialLogisticRegression

2016-08-18 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427519#comment-15427519 ] Weichen Xu edited comment on SPARK-17139 at 8/19/16 3:05 AM: - I will work on

[jira] [Comment Edited] (SPARK-17138) Python API for multinomial logistic regression

2016-08-18 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427518#comment-15427518 ] Weichen Xu edited comment on SPARK-17138 at 8/19/16 3:06 AM: - I will work on

[jira] [Commented] (SPARK-17139) Add model summary for MultinomialLogisticRegression

2016-08-18 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427519#comment-15427519 ] Weichen Xu commented on SPARK-17139: I will work on it and create PR soon, thanks. > Add model

[jira] [Commented] (SPARK-17138) Python API for multinomial logistic regression

2016-08-18 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427518#comment-15427518 ] Weichen Xu commented on SPARK-17138: I will work on it and create PR soon, thanks. > Python API for

[jira] [Updated] (SPARK-16934) Update LogisticCostAggregator serialization code to make it consistent with LinearRegression

2016-08-15 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-16934: --- Description: Update LogisticCostAggregator serialization code to make it consistent with

[jira] [Updated] (SPARK-16934) Update LogisticCostAggregator serialization code to make it consistent with LinearRegression

2016-08-15 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-16934: --- Summary: Update LogisticCostAggregator serialization code to make it consistent with

[jira] [Created] (SPARK-17050) Improve initKMeansParallel with treeAggregate

2016-08-14 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-17050: -- Summary: Improve initKMeansParallel with treeAggregate Key: SPARK-17050 URL: https://issues.apache.org/jira/browse/SPARK-17050 Project: Spark Issue Type:

[jira] [Updated] (SPARK-17046) prevent user using dataframe.select with empty param list

2016-08-13 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-17046: --- Description: Currently, we can call dataframe.select(), which selects nothing. It is illegal and

[jira] [Created] (SPARK-17046) prevent user using dataframe.select with empty param list

2016-08-13 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-17046: -- Summary: prevent user using dataframe.select with empty param list Key: SPARK-17046 URL: https://issues.apache.org/jira/browse/SPARK-17046 Project: Spark Issue

[jira] [Created] (SPARK-16934) Improve LogisticCostFun to avoid redundant serielization

2016-08-06 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-16934: -- Summary: Improve LogisticCostFun to avoid redundant serielization Key: SPARK-16934 URL: https://issues.apache.org/jira/browse/SPARK-16934 Project: Spark Issue

[jira] [Commented] (SPARK-16915) broadcast var cause Task not serializable exception when broadcast var is a class member

2016-08-05 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15409409#comment-15409409 ] Weichen Xu commented on SPARK-16915: I know the reason; it is not a bug. It will serialize the C1 object, so

[jira] [Closed] (SPARK-16915) broadcast var cause Task not serializable exception when broadcast var is a class member

2016-08-05 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu closed SPARK-16915. -- Resolution: Not A Bug > broadcast var cause Task not serializable exception when broadcast var is a >

[jira] [Updated] (SPARK-16915) broadcast var cause Task not serializable exception when broadcast var is a class member

2016-08-05 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-16915: --- Description: import org.apache.spark.SparkContext import org.apache.spark.rdd.RDD class C1(val

[jira] [Updated] (SPARK-16915) broadcast var cause Task not serializable exception when broadcast var is a class member

2016-08-05 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-16915: --- Description: --- import

[jira] [Created] (SPARK-16915) broadcast var cause Task not serializable exception when broadcast var is a class member

2016-08-05 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-16915: -- Summary: broadcast var cause Task not serializable exception when broadcast var is a class member Key: SPARK-16915 URL: https://issues.apache.org/jira/browse/SPARK-16915

[jira] [Updated] (SPARK-16880) Improve ANN training, add training data persist if needed

2016-08-03 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-16880: --- Component/s: MLlib ML > Improve ANN training, add training data persist if needed >

[jira] [Created] (SPARK-16880) Improve ANN training, add training data persist if needed

2016-08-03 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-16880: -- Summary: Improve ANN training, add training data persist if needed Key: SPARK-16880 URL: https://issues.apache.org/jira/browse/SPARK-16880 Project: Spark Issue

[jira] [Updated] (SPARK-16835) LinearRegression LogisticRegression AFTSuvivalRegression should unpersist input training data when exception throws

2016-08-01 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-16835: --- Affects Version/s: 2.1.0 2.0.1 > LinearRegression LogisticRegression

[jira] [Created] (SPARK-16835) LinearRegression LogisticRegression AFTSuvivalRegression should unpersist input training data when exception throws

2016-08-01 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-16835: -- Summary: LinearRegression LogisticRegression AFTSuvivalRegression should unpersist input training data when exception throws Key: SPARK-16835 URL:

[jira] [Updated] (SPARK-16696) unused broadcast variables should call destroy instead of unpersist

2016-07-24 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-16696: --- Issue Type: Improvement (was: Bug) > unused broadcast variables should call destroy instead of

[jira] [Updated] (SPARK-16697) redundant RDD computation in LDAOptimizer

2016-07-24 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-16697: --- Description: In mllib.clustering.LDAOptimizer the submitMiniBatch method, the stats: RDD do not

[jira] [Created] (SPARK-16697) redundant RDD computation in LDAOptimizer

2016-07-24 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-16697: -- Summary: redundant RDD computation in LDAOptimizer Key: SPARK-16697 URL: https://issues.apache.org/jira/browse/SPARK-16697 Project: Spark Issue Type:

[jira] [Updated] (SPARK-16696) unused broadcast variables should call destroy instead of unpersist

2016-07-24 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-16696: --- Description: Unused broadcast variables should call destroy() instead of unpersist() so that the

[jira] [Created] (SPARK-16696) unused broadcast variables should call destroy instead of unpersist

2016-07-24 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-16696: -- Summary: unused broadcast variables should call destroy instead of unpersist Key: SPARK-16696 URL: https://issues.apache.org/jira/browse/SPARK-16696 Project: Spark

[jira] [Updated] (SPARK-16662) The HiveContext deprecate warning in python always shown even if do not use HiveContext

2016-07-21 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-16662: --- Issue Type: Bug (was: Improvement) > The HiveContext deprecate warning in python always shown even

[jira] [Created] (SPARK-16662) The HiveContext deprecate warning in python always shown even if do not use HiveContext

2016-07-21 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-16662: -- Summary: The HiveContext deprecate warning in python always shown even if do not use HiveContext Key: SPARK-16662 URL: https://issues.apache.org/jira/browse/SPARK-16662

[jira] [Updated] (SPARK-16653) Make convergence tolerance param in ANN default value consistent with other algorithm using LBFGS

2016-07-20 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-16653: --- Component/s: Optimizer ML > Make convergence tolerance param in ANN default value

[jira] [Created] (SPARK-16653) Make convergence tolerance param in ANN default value consistent with other algorithm using LBFGS

2016-07-20 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-16653: -- Summary: Make convergence tolerance param in ANN default value consistent with other algorithm using LBFGS Key: SPARK-16653 URL: https://issues.apache.org/jira/browse/SPARK-16653

[jira] [Closed] (SPARK-16638) The L2 regularization of LinearRegression seems wrong when standardization is false

2016-07-19 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu closed SPARK-16638. -- Resolution: Not A Problem > The L2 regularization of LinearRegression seems wrong when standardization

[jira] [Commented] (SPARK-16638) The L2 regularization of LinearRegression seems wrong when standardization is false

2016-07-19 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385306#comment-15385306 ] Weichen Xu commented on SPARK-16638: It seems I'm wrong; the author's intention may be to use w[i] /

[jira] [Updated] (SPARK-16638) The L2 regularization of LinearRegression seems wrong when standardization is false

2016-07-19 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-16638: --- Description: The original L2 is 0.5 * effectiveL2regParam * sigma( wi^2 ) (wi is the coefficients we

[jira] [Created] (SPARK-16638) The L2 regularization of LinearRegression seems wrong when standardization is false

2016-07-19 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-16638: -- Summary: The L2 regularization of LinearRegression seems wrong when standardization is false Key: SPARK-16638 URL: https://issues.apache.org/jira/browse/SPARK-16638

[jira] [Created] (SPARK-16600) fix latex formula syntax error in mllib

2016-07-18 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-16600: -- Summary: fix latex formula syntax error in mllib Key: SPARK-16600 URL: https://issues.apache.org/jira/browse/SPARK-16600 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-16568) update sql programing guide refreshTable API

2016-07-15 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-16568: -- Summary: update sql programing guide refreshTable API Key: SPARK-16568 URL: https://issues.apache.org/jira/browse/SPARK-16568 Project: Spark Issue Type:

[jira] [Updated] (SPARK-16561) Potential numerial problem in MultivariateOnlineSummarizer min/max

2016-07-15 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-16561: --- Description: In `MultivariateOnlineSummarizer` min/max method, use judgement "nnz(i) < weightSum",

[jira] [Updated] (SPARK-16561) Potential numerial problem in MultivariateOnlineSummarizer min/max

2016-07-14 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-16561: --- Description: In `MultivariateOnlineSummarizer` min/max method, use judgement "nnz(i) < weightSum",

[jira] [Updated] (SPARK-16561) Potential numerial problem in MultivariateOnlineSummarizer min/max

2016-07-14 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-16561: --- Description: In `MultivariateOnlineSummarizer` min/max method, use judgement nnz(i) < weightSum, it

[jira] [Created] (SPARK-16561) Potential numerial problem in MultivariateOnlineSummarizer min/max

2016-07-14 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-16561: -- Summary: Potential numerial problem in MultivariateOnlineSummarizer min/max Key: SPARK-16561 URL: https://issues.apache.org/jira/browse/SPARK-16561 Project: Spark

[jira] [Updated] (SPARK-16546) Dataframe.drop supported multi-columns in spark api and should make python api also support it.

2016-07-14 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-16546: --- Component/s: SQL PySpark > Dataframe.drop supported multi-columns in spark api and

[jira] [Created] (SPARK-16546) Dataframe.drop supported multi-columns in spark api and should make python api also support it.

2016-07-14 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-16546: -- Summary: Dataframe.drop supported multi-columns in spark api and should make python api also support it. Key: SPARK-16546 URL: https://issues.apache.org/jira/browse/SPARK-16546

[jira] [Updated] (SPARK-16500) Add LBFG training not convergence warning for all ML algorithm

2016-07-12 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-16500: --- Component/s: Optimizer > Add LBFG training not convergence warning for all ML algorithm >

[jira] [Commented] (SPARK-16500) Add LBFG training not convergence warning for all ML algorithm

2016-07-12 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373062#comment-15373062 ] Weichen Xu commented on SPARK-16500: OK. I'll keep it in mind in future task. Thanks! > Add LBFG

[jira] [Created] (SPARK-16500) Add LBFG training not convergence warning for all ML algorithm

2016-07-12 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-16500: -- Summary: Add LBFG training not convergence warning for all ML algorithm Key: SPARK-16500 URL: https://issues.apache.org/jira/browse/SPARK-16500 Project: Spark

[jira] [Created] (SPARK-16499) Improve applyInPlace function for matrix in ANN code

2016-07-12 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-16499: -- Summary: Improve applyInPlace function for matrix in ANN code Key: SPARK-16499 URL: https://issues.apache.org/jira/browse/SPARK-16499 Project: Spark Issue Type:

[jira] [Updated] (SPARK-16470) ml.regression.LinearRegression training data do not check whether the result actually reach convergence

2016-07-10 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-16470: --- Description: `ml.regression.LinearRegression` uses the Breeze `LBFGS` and `OWLQN` optimizers to do

[jira] [Created] (SPARK-16470) ml.regression.LinearRegression training data do not check whether the result actually reach convergence

2016-07-10 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-16470: -- Summary: ml.regression.LinearRegression training data do not check whether the result actually reach convergence Key: SPARK-16470 URL:

[jira] [Comment Edited] (SPARK-16377) Spark MLlib: MultilayerPerceptronClassifier - error while training

2016-07-05 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362429#comment-15362429 ] Weichen Xu edited comment on SPARK-16377 at 7/5/16 12:49 PM: - And I tested on

[jira] [Commented] (SPARK-16377) Spark MLlib: MultilayerPerceptronClassifier - error while training

2016-07-05 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362429#comment-15362429 ] Weichen Xu commented on SPARK-16377: And I tested on the master version and also encountered the following

[jira] [Commented] (SPARK-16377) Spark MLlib: MultilayerPerceptronClassifier - error while training

2016-07-05 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362426#comment-15362426 ] Weichen Xu commented on SPARK-16377: This exception still exists on master code. ERROR

[jira] [Commented] (SPARK-16377) Spark MLlib: MultilayerPerceptronClassifier - error while training

2016-07-05 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362419#comment-15362419 ] Weichen Xu commented on SPARK-16377: hi, the exception: java.lang.ArrayIndexOutOfBoundsException at

[jira] [Updated] (SPARK-16345) Extract graphx programming guide example snippets from source files instead of hard code them

2016-07-01 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-16345: --- Description: Currently, all example snippets in the graphx programming guide are hard-coded, which

[jira] [Created] (SPARK-16345) Extract graphx programming guide example snippets from source files instead of hard code them

2016-07-01 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-16345: -- Summary: Extract graphx programming guide example snippets from source files instead of hard code them Key: SPARK-16345 URL: https://issues.apache.org/jira/browse/SPARK-16345

[jira] [Commented] (SPARK-15874) HBase rowkey optimization support for Hbase-Storage-handler

2016-06-11 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15326127#comment-15326127 ] Weichen Xu commented on SPARK-15874: Hmm... I got it. But there is another problem: if I want to

[jira] [Commented] (SPARK-15874) HBase rowkey optimization support for Hbase-Storage-handler

2016-06-11 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15326114#comment-15326114 ] Weichen Xu commented on SPARK-15874: The HBase connector is implemented in Hive, and Spark SQL can use

[jira] [Updated] (SPARK-15874) HBase rowkey optimization support for Hbase-Storage-handler

2016-06-10 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-15874: --- Description: Currently, Spark-SQL use `org.apache.hadoop.hive.hbase.HBaseStorageHandler` for Hbase

[jira] [Updated] (SPARK-15874) HBase rowkey optimization support for Hbase-Storage-handler

2016-06-10 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-15874: --- Summary: HBase rowkey optimization support for Hbase-Storage-handler (was: HBase rowkey

[jira] [Commented] (SPARK-15874) HBase rowkey optimization support for Hbase-handler

2016-06-10 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324385#comment-15324385 ] Weichen Xu commented on SPARK-15874: [~rxin]What do you think about it ? > HBase rowkey optimization

[jira] [Created] (SPARK-15874) HBase rowkey optimization support for Hbase-handler

2016-06-10 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-15874: -- Summary: HBase rowkey optimization support for Hbase-handler Key: SPARK-15874 URL: https://issues.apache.org/jira/browse/SPARK-15874 Project: Spark Issue Type:

[jira] [Commented] (SPARK-15086) Update Java API once the Scala one is finalized

2016-06-09 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15322082#comment-15322082 ] Weichen Xu commented on SPARK-15086: OK. [~srowen] What do you think about it? > Update Java API

[jira] [Commented] (SPARK-15086) Update Java API once the Scala one is finalized

2016-06-09 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15322074#comment-15322074 ] Weichen Xu commented on SPARK-15086: If we do so, only rename the Java API in this type or rename Scala

[jira] [Commented] (SPARK-15837) PySpark ML Word2Vec should support maxSentenceLength

2016-06-08 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321899#comment-15321899 ] Weichen Xu commented on SPARK-15837: I'll work on it and create a PR soon! > PySpark ML Word2Vec

[jira] [Commented] (SPARK-15086) Update Java API once the Scala one is finalized

2016-06-08 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320428#comment-15320428 ] Weichen Xu commented on SPARK-15086: So, considering Java API compatibility with older versions, the

[jira] [Commented] (SPARK-15086) Update Java API once the Scala one is finalized

2016-06-08 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320351#comment-15320351 ] Weichen Xu commented on SPARK-15086: I think the Java API should be the same as the Scala API if

[jira] [Updated] (SPARK-15820) Add Catalog.refreshTable into python API

2016-06-08 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-15820: --- Summary: Add Catalog.refreshTable into python API (was: Add spark-SQL Catalog.refreshTable into

[jira] [Updated] (SPARK-15820) Add spark-SQL Catalog.refreshTable into python api

2016-06-08 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-15820: --- External issue ID: (was: SPARK-15367) > Add spark-SQL Catalog.refreshTable into python api >

[jira] [Updated] (SPARK-15820) Add spark-SQL Catalog.refreshTable into python api

2016-06-08 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-15820: --- Description: The Catalog.refreshTable API is missing from the Python interface for Spark-SQL; add it.

[jira] [Created] (SPARK-15820) Add spark-SQL Catalog.refreshTable into python api

2016-06-08 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-15820: -- Summary: Add spark-SQL Catalog.refreshTable into python api Key: SPARK-15820 URL: https://issues.apache.org/jira/browse/SPARK-15820 Project: Spark Issue Type:

[jira] [Created] (SPARK-15805) update the whole sql programming guide

2016-06-07 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-15805: -- Summary: update the whole sql programming guide Key: SPARK-15805 URL: https://issues.apache.org/jira/browse/SPARK-15805 Project: Spark Issue Type: Improvement

[jira] [Closed] (SPARK-15212) CSV file reader when read file with first line schema do not filter blank in schema column name

2016-06-01 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu closed SPARK-15212. -- Resolution: Won't Fix > CSV file reader when read file with first line schema do not filter blank in

[jira] [Created] (SPARK-15702) Update document programming-guide accumulator section

2016-06-01 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-15702: -- Summary: Update document programming-guide accumulator section Key: SPARK-15702 URL: https://issues.apache.org/jira/browse/SPARK-15702 Project: Spark Issue

[jira] [Commented] (SPARK-15670) Add deprecate annotation for acumulator V1 interface in JavaSparkContext class

2016-06-01 Thread Weichen Xu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15309379#comment-15309379 ] Weichen Xu commented on SPARK-15670: OK, I'll follow SPARK-15086 jira, thanks! > Add deprecate
