[GitHub] spark pull request #14629: [SPARK-17046][SQL] prevent_user_call_df_select_wi...

2016-08-13 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/14629 [SPARK-17046][SQL] prevent_user_call_df_select_will_empty_paramlist ## What changes were proposed in this pull request? We can see the DataFrame API: `def select(col: String, cols
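
In Scala, `select(col: String, cols: String*)` requires at least one argument at compile time, while a purely variadic overload such as `select(cols: Column*)` also accepts an empty call. A rough Python analog of that distinction, using two hypothetical functions (`select` and `select_any`, not the PySpark API), might look like:

```python
def select(col, *cols):
    """Hypothetical select: one required column plus optional extras.

    Calling select() with no arguments raises TypeError at call time.
    """
    return [col, *cols]


def select_any(*cols):
    """Hypothetical variadic-only select: select_any() silently returns []."""
    return list(cols)


if __name__ == "__main__":
    print(select("name", "age"))  # ['name', 'age']
    print(select_any())           # []
    try:
        select()
    except TypeError as exc:
        print("empty select rejected:", exc)
```

Python cannot overload by signature, so the Scala overload ambiguity itself does not carry over; the sketch only shows why a required leading parameter rules out the empty call.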

[GitHub] spark pull request #14628: [SPARK-17033][Follow-up][ML][MLLib] Improve kmean...

2016-08-13 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/14628 [SPARK-17033][Follow-up][ML][MLLib] Improve kmean aggregate to treeAggregate ## What changes were proposed in this pull request? KMeans uses `aggregate` to compute points cost
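
A single-process sketch of the difference between `aggregate` and `treeAggregate`: with `aggregate`, every partition's partial result is merged on the driver; with a tree reduction, partials are first merged in intermediate rounds. The fixed `fan_in` grouping is a simplification (Spark derives its grouping from a `depth` parameter), and the data and operators are illustrative only.

```python
from functools import reduce


def flat_aggregate(partitions, zero, seq_op, comb_op):
    """aggregate-style: every partition's partial result is merged on the driver.

    Returns (result, number of partials the driver merges).
    """
    partials = [reduce(seq_op, part, zero) for part in partitions]
    return reduce(comb_op, partials, zero), len(partials)


def tree_aggregate(partitions, zero, seq_op, comb_op, fan_in=4):
    """treeAggregate-style sketch: merge partials in groups of fan_in first.

    Returns (result, number of partials the driver merges).
    """
    if fan_in < 2:
        raise ValueError("fan_in must be >= 2, otherwise the loop never shrinks")
    partials = [reduce(seq_op, part, zero) for part in partitions]
    while len(partials) > fan_in:
        # One intermediate round; on a cluster this would be a shuffle stage.
        partials = [reduce(comb_op, partials[i:i + fan_in])
                    for i in range(0, len(partials), fan_in)]
    return reduce(comb_op, partials, zero), len(partials)


if __name__ == "__main__":
    # 64 "partitions" of 10 points each; cost = sum of squared values.
    parts = [list(range(i * 10, i * 10 + 10)) for i in range(64)]
    seq = lambda acc, x: acc + x * x
    comb = lambda a, b: a + b
    print(flat_aggregate(parts, 0, seq, comb))
    print(tree_aggregate(parts, 0, seq, comb))
```

Both functions return the same cost; only the number of partial results the driver has to merge at the end differs.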

[GitHub] spark issue #14629: [WIP][SPARK-17046][SQL] prevent user using dataframe.sel...

2016-08-13 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14629 @srowen What do you think about this problem? I found that adding two methods like `def select(cols: Column*)` and `def select(col: Column, cols: Column*)` causes ambiguity, I

[GitHub] spark issue #14628: [SPARK-17033][Follow-up][ML][MLLib] Improve kmean aggreg...

2016-08-13 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14628 @lins05 OK, give me some time to check whether the one in LDAModel is also a proper place to use treeAggregate. --- If your project is set up for it, you can reply to this email and have your

[GitHub] spark issue #14265: [PySpark] add picklable SparseMatrix in pyspark.ml.commo...

2016-07-21 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14265 cc @rxin Thanks!

[GitHub] spark pull request #14293: [GIT] add pydev & Rstudio project file to gitigno...

2016-07-20 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/14293 [GIT] add pydev & Rstudio project file to gitignore list ## What changes were proposed in this pull request? Add PyDev & RStudio project files to the gitignore list. I think the

[GitHub] spark pull request #13275: [SPARK-15499][PySpark][Tests] Add python testsuit...

2016-07-20 Thread WeichenXu123
Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/13275

[GitHub] spark issue #14293: [GIT] add pydev & Rstudio project file to gitignore list

2016-07-21 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14293 I use the PyDev IDE to edit Python code, which generates `.pydevproject`, and the RStudio IDE to edit R code, which generates `*.Rproj`. These are only project settings files used by the IDEs, like `.idea

[GitHub] spark issue #14265: [PySpark] add picklable SparseMatrix in pyspark.ml.commo...

2016-07-21 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14265 @srowen I checked ml.python.MLSerde and it supports a SparseMatrix pickler, and on the Python side the SparseMatrix constructor also matches the pickler. So I think the `_picklable_classes
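
Python's pickle protocol rebuilds an object by calling whatever `__reduce__` returns, which is why the pickled fields have to line up with the constructor arguments. The class below is a hypothetical, minimal CSC-style stand-in (not `pyspark.ml.linalg.SparseMatrix`, and unrelated to the JVM-side `MLSerde` pickler); its field names are assumed to mirror the PySpark constructor.

```python
import pickle


class SparseMatrix:
    """Minimal compressed-sparse-column matrix, for illustrating pickling only."""

    def __init__(self, numRows, numCols, colPtrs, rowIndices, values, isTransposed=False):
        self.numRows = numRows
        self.numCols = numCols
        self.colPtrs = list(colPtrs)
        self.rowIndices = list(rowIndices)
        self.values = list(values)
        self.isTransposed = isTransposed

    def __reduce__(self):
        # Pickle as "call the constructor with these arguments", so the pickled
        # state must match the constructor signature exactly.
        return (self.__class__, (self.numRows, self.numCols, self.colPtrs,
                                 self.rowIndices, self.values, self.isTransposed))

    def __eq__(self, other):
        return isinstance(other, SparseMatrix) and self.__reduce__()[1] == other.__reduce__()[1]

    def to_dense(self):
        """Row-major list of lists; assumes isTransposed is False."""
        dense = [[0.0] * self.numCols for _ in range(self.numRows)]
        for j in range(self.numCols):
            for k in range(self.colPtrs[j], self.colPtrs[j + 1]):
                dense[self.rowIndices[k]][j] = self.values[k]
        return dense


if __name__ == "__main__":
    m = SparseMatrix(2, 2, [0, 1, 2], [0, 1], [1.0, 2.0])
    restored = pickle.loads(pickle.dumps(m))
    print(restored == m, restored.to_dense())
```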

[GitHub] spark issue #14216: [SPARK-16561][MLLib] fix multivarOnlineSummary min/max b...

2016-07-21 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14216 @srowen Several minor modifications are done.

[GitHub] spark pull request #14301: [SPARK-16662][PySpark][SQL] update HiveContext wa...

2016-07-21 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/14301 [SPARK-16662][PySpark][SQL] update HiveContext warning ## What changes were proposed in this pull request? Move the `HiveContext` deprecation warning printing statement

[GitHub] spark issue #14265: [PySpark] add picklable SparseMatrix in pyspark.ml.commo...

2016-07-21 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14265 @srowen I guess the `_picklable_classes` list in `ml.linalg.common` was copied from `mllib.linalg.common`, so it is missing `SparseMatrix`, which was added later.

[GitHub] spark pull request #14238: [MINOR][TYPO] fix fininsh typo

2016-07-17 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/14238 [MINOR][TYPO] fix fininsh typo ## What changes were proposed in this pull request? fininsh => finish ## How was this patch tested? (Please explain how this pa

[GitHub] spark pull request #14122: [SPARK-16470][ML][Optimizer] Check linear regress...

2016-07-17 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/14122#discussion_r71083700 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -327,6 +327,11 @@ class LinearRegression @Since("

[GitHub] spark pull request #14220: [SPARK-16568][SQL][Documentation] update sql prog...

2016-07-15 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/14220 [SPARK-16568][SQL][Documentation] update sql programming guide refreshTable API in python code ## What changes were proposed in this pull request? Update the `refreshTable` API in Python

[GitHub] spark issue #14216: [SPARK-16561][MLLib] fix multivarOnlineSummary min/max b...

2016-07-15 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14216 @srowen OK, var names updated. And by 'fixing' numNonzero, do you mean the number of input vectors whose weight > 0?

[GitHub] spark issue #14216: [SPARK-16561][MLLib] fix multivarOnlineSummary min/max b...

2016-07-15 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14216 @srowen OK. I'll fix the var names first: nnz => weightSum, weightSum => totalWeightSum, cnnz => nnz. Is that right?

[GitHub] spark pull request #14265: [PySpark] add picklable SparseMatrix

2016-07-19 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/14265 [PySpark] add picklable SparseMatrix ## What changes were proposed in this pull request? Add the `SparseMatrix` class, which supports the pickler. ## How was this patch tested

[GitHub] spark issue #14276: [SPARK-16638][ML][Optimizer] fix L2 reg computation in l...

2016-07-19 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14276 cc @srowen Thanks!

[GitHub] spark pull request #14276: [WIP][SPARK-16638][ML][Optimizer] fix L2 reg comp...

2016-07-19 Thread WeichenXu123
Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/14276

[GitHub] spark issue #14220: [SPARK-16568][SQL][Documentation] update sql programming...

2016-07-19 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14220 cc @rxin Thanks!

[GitHub] spark pull request #14276: [SPARK-16638][ML][Optimizer] fix L2 reg computati...

2016-07-19 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/14276 [SPARK-16638][ML][Optimizer] fix L2 reg computation in linearRegression when standarlization is false ## What changes were proposed in this pull request? when `standardization

[GitHub] spark issue #14216: [SPARK-16561][MLLib] fix multivarOnlineSummary min/max b...

2016-07-20 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14216 @srowen I have now added test cases. I test 3 cases, the same as the example cases I wrote in [SPARK-16561]. Thanks!

[GitHub] spark issue #14276: [WIP][SPARK-16638][ML][Optimizer] fix L2 reg computation...

2016-07-20 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14276 @srowen I rethought the code, and maybe my previous idea is wrong. The author's intention may be to use w[i] / featuresStd[i] to reduce the penalty on large-scale dimensions (because
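
The penalty discussed in this comment, assuming it takes the form R(w) = (λ/2)·Σ(w_i/σ_i)² with σ_i = `featuresStd[i]`, has gradient ∂R/∂w_i = λ·w_i/σ_i². A sketch (the function name and the choice to skip zero-std dimensions are assumptions, not Spark's code):

```python
def l2_penalty_and_grad(w, features_std, reg_param):
    """Return (penalty, gradient) for (reg_param / 2) * sum_i (w_i / std_i)^2.

    Dimensions whose std is 0.0 contribute nothing (assumption for this sketch).
    """
    loss = 0.0
    grad = [0.0] * len(w)
    for i, (wi, std) in enumerate(zip(w, features_std)):
        if std != 0.0:
            scaled = wi / std
            loss += 0.5 * reg_param * scaled * scaled
            grad[i] = reg_param * wi / (std * std)
    return loss, grad


if __name__ == "__main__":
    # Same coefficient value, but the second dimension has twice the std.
    print(l2_penalty_and_grad([1.0, 1.0], [1.0, 2.0], reg_param=1.0))
```

With equal coefficients, the dimension with std 2.0 contributes a quarter of the penalty of the one with std 1.0, which is the "reduced penalty on large-scale dimensions" effect described above.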

[GitHub] spark pull request #14286: [SPARK-16653][ML][Optimizer] update ANN convergen...

2016-07-20 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/14286 [SPARK-16653][ML][Optimizer] update ANN convergence tolerance param default to 1e-6 ## What changes were proposed in this pull request? replace ANN convergence tolerance param

[GitHub] spark pull request #14156: [SPARK-16499][ML][MLLib] improve ApplyInPlace fun...

2016-07-12 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/14156 [SPARK-16499][ML][MLLib] improve ApplyInPlace function in ANN code ## What changes were proposed in this pull request? I recoded the following function using Breeze's matrix operating

[GitHub] spark pull request #14157: [SPARK-16500][ML][MLLib][Optimizer] add LBFGS con...

2016-07-12 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/14157 [SPARK-16500][ML][MLLib][Optimizer] add LBFGS convergence warning for all used place in MLLib ## What changes were proposed in this pull request? Add a warning for the following case

[GitHub] spark pull request #14246: [SPARK-16600][MLLib] fix some latex formula synta...

2016-07-18 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/14246 [SPARK-16600][MLLib] fix some latex formula syntax error ## What changes were proposed in this pull request? `\partial\x` ==> `\partial x` `har{x_i}` ==> `h

[GitHub] spark issue #14220: [SPARK-16568][SQL][Documentation] update sql programming...

2016-07-18 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14220 cc @liancheng Thanks!

[GitHub] spark pull request #14203: [SPARK-16546][SQL][PySpark] update python datafra...

2016-07-14 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/14203#discussion_r70913944 --- Diff: python/pyspark/sql/dataframe.py --- @@ -1416,13 +1416,25 @@ def drop(self, col): >>> df.join(df2, df.name ==

[GitHub] spark pull request #14216: [SPARK-16561][MLLib] fix multivarOnlineSummary mi...

2016-07-14 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/14216 [SPARK-16561][MLLib] fix multivarOnlineSummary min/max bug ## What changes were proposed in this pull request? Add a member vector `cnnz` to count each dimension's non-zero value
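
Why a per-dimension non-zero count matters for min/max on sparse input can be sketched as follows. Only explicit non-zeros are visited, so a dimension that was zero in some vectors has an implicit 0.0 that must be folded in afterwards. The class and field names are hypothetical, and weights are ignored in this sketch.

```python
class SparseSummarizer:
    """Per-dimension max/min over sparse vectors given as {index: value} dicts."""

    def __init__(self, n):
        self.total_count = 0
        self.nnz = [0] * n                     # per-dimension count of non-zero entries
        self.curr_max = [float("-inf")] * n
        self.curr_min = [float("inf")] * n

    def add(self, vec):
        self.total_count += 1
        for i, v in vec.items():
            if v != 0.0:
                self.nnz[i] += 1
                self.curr_max[i] = max(self.curr_max[i], v)
                self.curr_min[i] = min(self.curr_min[i], v)
        return self

    def max_values(self):
        # Fewer non-zeros than vectors means at least one implicit 0.0 was skipped.
        return [max(m, 0.0) if self.nnz[i] < self.total_count else m
                for i, m in enumerate(self.curr_max)]

    def min_values(self):
        return [min(m, 0.0) if self.nnz[i] < self.total_count else m
                for i, m in enumerate(self.curr_min)]


if __name__ == "__main__":
    s = SparseSummarizer(2).add({0: -1.0}).add({0: -2.0, 1: 3.0})
    print(s.max_values(), s.min_values())
```

Dimension 1 is non-zero in only one of the two vectors, so its minimum is the implicit 0.0 rather than 3.0.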

[GitHub] spark pull request #13946: [MINOR][SparkR] update sparkR DataFrame.R comment

2016-06-28 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/13946 [MINOR][SparkR] update sparkR DataFrame.R comment ## What changes were proposed in this pull request? update sparkR DataFrame.R comment SQLContext ==> SparkSess

[GitHub] spark issue #13558: [SPARK-15820][PySpark][SQL]Add Catalog.refreshTable into...

2016-06-27 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/13558 cc @liancheng Thanks!

[GitHub] spark pull request #14025: [DOC][SQL] update out-of-date code snippets using...

2016-07-06 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/14025#discussion_r69720553 --- Diff: docs/streaming-programming-guide.md --- @@ -1546,9 +1546,9 @@ val words: DStream[String] = ... words.foreachRDD { rdd

[GitHub] spark pull request #14121: [MINOR][ML] update comment where is inconsistent ...

2016-07-09 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/14121 [MINOR][ML] update comment where is inconsistent with code in ml.regression.LinearRegression ## What changes were proposed in this pull request? In `train` method

[GitHub] spark pull request #14122: [SPARK-16470][ML][Optimizer] Check linear regress...

2016-07-10 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/14122 [SPARK-16470][ML][Optimizer] Check linear regression training whether actually reach convergence and add warning if not ## What changes were proposed in this pull request
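
The check proposed here (warn when training stops at the iteration limit instead of converging) can be sketched with a toy gradient-descent loop standing in for LBFGS. The function, its parameters, and the warning text are hypothetical.

```python
import warnings


def minimize_quadratic(a, b, step=0.1, max_iter=100, tol=1e-6):
    """Gradient descent on f(x) = 0.5 * a * x**2 - b * x (a > 0), a toy optimizer.

    Returns (x, iterations, converged) and warns if max_iter ran out first.
    """
    x = 0.0
    for it in range(1, max_iter + 1):
        x -= step * (a * x - b)
        if abs(a * x - b) < tol:  # gradient small enough: converged
            return x, it, True
    warnings.warn(
        f"optimizer stopped after max_iter={max_iter} without reaching tol={tol}; "
        "the result may not be optimal, consider increasing max_iter",
        RuntimeWarning,
    )
    return x, max_iter, False


if __name__ == "__main__":
    print(minimize_quadratic(1.0, 1.0, max_iter=10))    # warns, not converged
    print(minimize_quadratic(1.0, 1.0, max_iter=1000))  # converges quietly
```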

[GitHub] spark pull request #14122: [SPARK-16470][ML][Optimizer] Check linear regress...

2016-07-10 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/14122#discussion_r70181416 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -327,6 +327,11 @@ class LinearRegression @Since("

[GitHub] spark issue #14156: [SPARK-16499][ML][MLLib] improve ApplyInPlace function i...

2016-08-04 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14156 @srowen OK, I'll close the PR for now. If I find a better way to optimize it, I will reopen it. Thanks!

[GitHub] spark pull request #14156: [SPARK-16499][ML][MLLib] improve ApplyInPlace fun...

2016-08-04 Thread WeichenXu123
Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/14156

[GitHub] spark pull request #14519: [SPARK-16933] [ML] Fix AFTAggregator in AFTSurviv...

2016-08-06 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/14519#discussion_r73787877 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.scala --- @@ -583,19 +591,22 @@ private class AFTAggregator

[GitHub] spark issue #14520: [SPARK-16934][ML][MLLib] Improve LogisticCostFun to avoi...

2016-08-06 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14520 cc @sethah @yanboliang

[GitHub] spark issue #14520: [SPARK-16934][ML][MLLib] Improve LogisticCostFun to avoi...

2016-08-06 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14520 Oh, it's another algorithm and there are several different details, so to keep it clear I created a separate PR to discuss it. Thanks!

[GitHub] spark pull request #14520: [SPARK-16934][ML][MLLib] Improve LogisticCostFun ...

2016-08-06 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/14520 [SPARK-16934][ML][MLLib] Improve LogisticCostFun to avoid redundant serielization ## What changes were proposed in this pull request? Improve LogisticCostFun, replace closure var

[GitHub] spark issue #14520: [SPARK-16934][ML][MLLib] Improve LogisticCostFun to avoi...

2016-08-08 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14520 @MLnick The main improvement here concerns `localFeaturesStd`: in the previous code, each call to `CostFun.calculate` does a serialization and broadcast of the vector. mark

[GitHub] spark issue #14520: [SPARK-16934][ML][MLLib] Improve LogisticCostFun to avoi...

2016-08-09 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14520 @sethah Thanks for your careful review! This PR already passes bcFeaturesStd and bcCoeffs as constructor args to `LogisticAggregator`, like your PR #14109. You

[GitHub] spark pull request #14519: [SPARK-16933] [ML] Fix AFTAggregator in AFTSurviv...

2016-08-09 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/14519#discussion_r74011434 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.scala --- @@ -478,21 +482,23 @@ object AFTSurvivalRegressionModel

[GitHub] spark pull request #14440: [SPARK-16835][ML] add training data unpersist han...

2016-08-01 Thread WeichenXu123
Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/14440

[GitHub] spark issue #14440: [SPARK-16835][ML] add training data unpersist handling w...

2016-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14440 sounds reasonable...

[GitHub] spark pull request #14015: [SPARK-16345][Documentation][Examples][GraphX] Ex...

2016-07-01 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/14015 [SPARK-16345][Documentation][Examples][GraphX] Extract graphx programming guide example snippets from source files instead of hard code them ## What changes were proposed in this pull request

[GitHub] spark issue #14010: [GRAPHX][EXAMPLES] move graphx test data directory and u...

2016-07-01 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14010 Jenkins retest this please.

[GitHub] spark issue #14010: [GRAPHX][EXAMPLES] move graphx test data directory and u...

2016-07-01 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14010 Jenkins, test this please.

[GitHub] spark pull request #14010: [GRAPHX][EXAMPLES] move graphx test data director...

2016-07-01 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/14010 [GRAPHX][EXAMPLES] move graphx test data directory and update graphx document ## What changes were proposed in this pull request? There are two test data files for GraphX examples which

[GitHub] spark pull request #13136: [SPARK-15350][mllib]add unit test function for Lo...

2016-07-01 Thread WeichenXu123
Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/13136

[GitHub] spark issue #14015: [SPARK-16345][Documentation][Examples][GraphX] Extract g...

2016-07-01 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14015 @srowen Yes, the example code is exactly the same as the code in the GraphX doc, and I tested all of it; it runs normally.

[GitHub] spark issue #14010: [GRAPHX][EXAMPLES] move graphx test data directory and u...

2016-07-01 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14010 @srowen Done.

[GitHub] spark issue #14015: [SPARK-16345][Documentation][Examples][GraphX] Extract g...

2016-07-02 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14015 Merge conflicts have been resolved. cc @srowen Thanks!

[GitHub] spark issue #14025: [WIP][DOC] update out-of-date code snippets using SQLCon...

2016-07-02 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14025 cc @srowen

[GitHub] spark issue #14025: [WIP][DOC] update out-of-date code snippets using SQLCon...

2016-07-03 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14025 cc @liancheng

[GitHub] spark issue #14025: [DOC][SQL] update out-of-date code snippets using SQLCon...

2016-07-04 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14025 @liancheng Yes. Thanks!

[GitHub] spark pull request #14025: [DOC][SQL] update out-of-date code snippets using...

2016-07-04 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/14025#discussion_r69417590 --- Diff: docs/configuration.md --- @@ -1564,8 +1564,8 @@ spark.sql("SET -v").show(n=200, truncate=False) {% h

[GitHub] spark pull request #14025: [WIP][DOC] update out-of-date code snippets using...

2016-07-01 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/14025 [WIP][DOC] update out-of-date code snippets using SQLContext in all documents. ## What changes were proposed in this pull request? I searched the whole documents directory using

[GitHub] spark issue #14628: [SPARK-17050][ML][MLLib] Improve kmean rdd.aggregate to ...

2016-08-15 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14628 @holdenk I think depth 2 is enough to handle a large RDD, and a bigger depth may add cost. I'll append test results later. Thanks!
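
How `depth` trades intermediate rounds against driver-side merging can be sketched by counting rounds. The rule below, scale = max(ceil(n^(1/depth)), 2) with another round added only while n > scale + ceil(n / scale), is modeled on Spark's `RDD.treeAggregate`; treat it as an approximation rather than a copy of that code.

```python
import math


def tree_levels(num_partitions, depth):
    """Return (intermediate_rounds, partials_sent_to_driver) for a given depth."""
    n = num_partitions
    scale = max(math.ceil(n ** (1.0 / depth)), 2)
    rounds = 0
    # Stop adding rounds once another level would no longer shrink the work.
    while n > scale + math.ceil(n / scale):
        n //= scale
        rounds += 1
    return rounds, n


if __name__ == "__main__":
    for d in (1, 2, 3):
        print(d, tree_levels(100, d))
```

Under this rule, depth 2 already turns 100 partial results into 10 before the driver merges them, while each additional round costs another shuffle stage, which is the trade-off behind preferring a small depth.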

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-02-02 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 @sethah If I merge the MulticlassLogisticRegressionSummary into LogisticRegressionSummary, then, according to the hierarchy currently designed, it becomes: class

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-02-03 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 @sethah About your new design, ``` Summary PredictionSummary extends Summary ClassificationSummary extends PredictionSummary ProbabilisticClassificationSummary

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-02-08 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 I read jkbradley's thoughts here, so I will modify this as follows: first we need 4 traits, using the following hierarchy: LogisticRegressionSummary

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-02-02 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 sethah About this issue: Why is there a one-to-one overlap between MulticlassClassificationSummary and LogisticRegressionSummary, and MulticlassLogisticRegressionSummary inherits

[GitHub] spark issue #15730: [SPARK-18218][ML][MLLib] Reduce shuffled data size of Bl...

2017-01-26 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15730 @brkyvz Also thanks for your careful code review! ^_^

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-02-17 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 cc @sethah @jkbradley Thanks!

[GitHub] spark pull request #16576: [SPARK-19215] Add necessary check for `RDD.checkp...

2017-01-17 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/16576#discussion_r96573963 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1539,6 +1539,9 @@ abstract class RDD[T: ClassTag]( // NOTE: we use

[GitHub] spark pull request #14629: [WIP][SPARK-17046][SQL] prevent user using datafr...

2016-08-22 Thread WeichenXu123
Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/14629

[GitHub] spark issue #14156: [SPARK-16499][ML][MLLib] improve ApplyInPlace function i...

2016-08-03 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14156 cc @srowen thanks!

[GitHub] spark pull request #14922: [WIP][SPARK-17175][ML][MLLib] Add a expert formul...

2016-09-02 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/14922#discussion_r77352565 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -295,6 +295,13 @@ class LogisticRegression @Since

[GitHub] spark pull request #14922: [WIP][SPARK-17175][ML][MLLib] Add a expert formul...

2016-09-02 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/14922#discussion_r77354917 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala --- @@ -405,5 +405,9 @@ private[ml] trait HasAggregationDepth

[GitHub] spark pull request #14923: [SPARK-17363][ML][MLLib] fix MultivariantOnlineSu...

2016-09-02 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/14923#discussion_r77355534 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/MultivariateOnlineSummarizer.scala --- @@ -231,9 +231,9 @@ class

[GitHub] spark pull request #14922: [WIP][SPARK-17175][ML][MLLib] Add a expert formul...

2016-09-02 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/14922#discussion_r77352634 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -295,6 +295,13 @@ class LogisticRegression @Since

[GitHub] spark pull request #14922: [WIP][SPARK-17175][ML][MLLib] Add a expert formul...

2016-09-02 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/14922#discussion_r77358835 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -295,6 +295,13 @@ class LogisticRegression @Since

[GitHub] spark pull request #14923: [SPARK-17363][ML][MLLib] fix MultivariantOnlineSu...

2016-09-02 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/14923#discussion_r77352136 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/MultivariateOnlineSummarizer.scala --- @@ -233,7 +233,7 @@ class

[GitHub] spark pull request #14922: [WIP][SPARK-17175][ML][MLLib] Add a expert formul...

2016-09-02 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/14922#discussion_r77362198 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala --- @@ -405,5 +405,9 @@ private[ml] trait HasAggregationDepth

[GitHub] spark pull request #14922: [WIP][SPARK-17175][ML][MLLib] Add a expert formul...

2016-09-02 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/14922#discussion_r77356253 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -295,6 +295,13 @@ class LogisticRegression @Since

[GitHub] spark pull request #14923: [SPARK-17363][ML][MLLib] fix MultivariantOnlineSu...

2016-09-02 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/14923#discussion_r77361651 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/MultivariateOnlineSummarizer.scala --- @@ -231,9 +231,9 @@ class

[GitHub] spark issue #14950: [SPARK-17390][ML][MLLib] Optimize MultivariantOnlineSumm...

2016-09-03 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14950 @srowen Not only CPU cost: if the data dimension is big, the serialization cost will be big too, as in https://github.com/apache/spark/pull/14109. And computing all targets seems improper if we may add

[GitHub] spark pull request #14950: [SPARK-17390][ML][MLLib] Optimize MultivariantOnl...

2016-09-03 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/14950 [SPARK-17390][ML][MLLib] Optimize MultivariantOnlineSummerizer by making the summarized target configurable ## What changes were proposed in this pull request? add a mask parameter
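
The idea of a configurable set of summarized targets can be sketched with a summarizer that only allocates per-dimension buffers for the metrics requested, so high-dimensional state stays small when it is serialized and merged. The class, the metric names (`"count"`, `"mean"`, `"max"`), and using a set of names instead of a mask are all hypothetical.

```python
class MaskedSummarizer:
    """Summarizer over dense vectors that tracks only the requested metrics."""

    SUPPORTED = {"count", "mean", "max"}

    def __init__(self, dim, metrics):
        unknown = set(metrics) - self.SUPPORTED
        if unknown:
            raise ValueError(f"unsupported metrics: {sorted(unknown)}")
        self.metrics = set(metrics)
        self.count = 0
        # Per-dimension buffers exist only when some requested metric needs them.
        self.sums = [0.0] * dim if "mean" in self.metrics else None
        self.maxes = [float("-inf")] * dim if "max" in self.metrics else None

    def add(self, vec):
        self.count += 1
        for i, v in enumerate(vec):
            if self.sums is not None:
                self.sums[i] += v
            if self.maxes is not None and v > self.maxes[i]:
                self.maxes[i] = v
        return self

    def merge(self, other):
        """Combine two partial summaries (both must track the same metrics)."""
        self.count += other.count
        if self.sums is not None:
            self.sums = [a + b for a, b in zip(self.sums, other.sums)]
        if self.maxes is not None:
            self.maxes = [max(a, b) for a, b in zip(self.maxes, other.maxes)]
        return self

    def result(self):
        out = {}
        if "count" in self.metrics:
            out["count"] = self.count
        if self.sums is not None:
            out["mean"] = [s / self.count for s in self.sums]  # assumes count > 0
        if self.maxes is not None:
            out["max"] = list(self.maxes)
        return out


if __name__ == "__main__":
    s = MaskedSummarizer(2, ["mean"]).add([1.0, 2.0]).add([3.0, 4.0])
    print(s.result())
```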

[GitHub] spark pull request #14922: [WIP][SPARK-17175] Add a expert formula to aggreg...

2016-09-01 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/14922 [WIP][SPARK-17175] Add a expert formula to aggregationDepth of SharedParam ## What changes were proposed in this pull request? Add an expert formula to aggregationDepth of SharedParam

[GitHub] spark pull request #14923: [SPARK-17363][ML][MLLib] fix MultivariantOnlineSu...

2016-09-01 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/14923 [SPARK-17363][ML][MLLib] fix MultivariantOnlineSummerizer.numNonZeros ## What changes were proposed in this pull request? Fix the `MultivariateOnlineSummarizer.numNonZeros` method

[GitHub] spark issue #14628: [SPARK-17050][ML][MLLib] Improve kmean rdd.aggregate to ...

2016-08-31 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14628 Because the KMeans algorithm is being optimized in another task, I'm closing this PR for now; once that one is merged, I'll check whether this still needs to be optimized.

[GitHub] spark issue #14898: [SPARK-16499][ML][MLLib] optimize ann algorithm where us...

2016-08-31 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14898 cc @srowen thanks!

[GitHub] spark pull request #14898: [SPARK-16499][ML][MLLib] optimize ann algorithm w...

2016-08-31 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/14898 [SPARK-16499][ML][MLLib] optimize ann algorithm where using ApplyInPlace function ## What changes were proposed in this pull request? replace `ApplyInPlace(output, target, delta

[GitHub] spark pull request #14628: [SPARK-17050][ML][MLLib] Improve kmean rdd.aggreg...

2016-08-31 Thread WeichenXu123
Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/14628

[GitHub] spark issue #15045: [Spark Core][MINOR] fix "default partitioner cannot part...

2016-09-11 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15045 jenkins test please

[GitHub] spark pull request #15051: [SPARK-17499][ML][MLLib] make the default params ...

2016-09-11 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/15051#discussion_r78309230 --- Diff: R/pkg/R/mllib.R --- @@ -694,8 +694,8 @@ setMethod("predict", signature(object = "KMeansModel"), #' } #' @note s

[GitHub] spark pull request #15051: [SPARK-17499][ML][MLLib] make the default params ...

2016-09-11 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/15051#discussion_r78307552 --- Diff: R/pkg/R/mllib.R --- @@ -694,8 +694,8 @@ setMethod("predict", signature(object = "KMeansModel"), #' } #' @note s

[GitHub] spark pull request #15051: [SPARK-17499][ML][MLLib] make the default params ...

2016-09-11 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/15051#discussion_r78308116 --- Diff: R/pkg/R/mllib.R --- @@ -694,8 +694,8 @@ setMethod("predict", signature(object = "KMeansModel"), #' } #' @note s

[GitHub] spark pull request #15051: [SPARK-17499][ML][MLLib] make the default params ...

2016-09-11 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/15051 [SPARK-17499][ML][MLLib] make the default params in sparkR spark.mlp consistent with MultilayerPerceptronClassifier ## What changes were proposed in this pull request? update several

[GitHub] spark issue #15045: [Spark Core][MINOR] fix "default partitioner cannot part...

2016-09-11 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15045 Jenkins, test this please.

[GitHub] spark pull request #15051: [SPARK-17499][ML][MLLib] make the default params ...

2016-09-11 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/15051#discussion_r78315909 --- Diff: R/pkg/R/mllib.R --- @@ -694,8 +694,8 @@ setMethod("predict", signature(object = "KMeansModel"), #' } #' @note s

[GitHub] spark pull request #15051: [SPARK-17499][ML][MLLib] make the default params ...

2016-09-11 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/15051#discussion_r78315763 --- Diff: R/pkg/R/mllib.R --- @@ -694,8 +694,8 @@ setMethod("predict", signature(object = "KMeansModel"), #' } #' @note s

[GitHub] spark pull request #15060: [SPARK-17507][ML][MLLib] check weight vector size...

2016-09-14 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/15060#discussion_r78749771 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.scala --- @@ -235,6 +235,7 @@ class

[GitHub] spark pull request #15097: [SPARK-17540][SparkR][Spark Core] fix SparkR arra...

2016-09-14 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/15097 [SPARK-17540][SparkR][Spark Core] fix SparkR array serde type problem when length == 0 ## What changes were proposed in this pull request? fix SparkR array serde type problem when

[GitHub] spark issue #15045: [Spark Core][MINOR] fix partitionBy error message

2016-09-10 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15045 Oh, there are 5 similar messages. I checked the others; those may use the default partitioner, so I updated their message to "Specified or default partitioner..." b

[GitHub] spark pull request #15045: [Spark Core][MINOR] fix partitionBy error message

2016-09-10 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/15045 [Spark Core][MINOR] fix partitionBy error message ## What changes were proposed in this pull request? To avoid confusing users, it is better to change
