GitHub user hhbyyh opened a pull request:
https://github.com/apache/spark/pull/18733
[SPARK-21535][ML] Reduce memory requirement for CrossValidator and TrainValidationSplit
## What changes were proposed in this pull request?
CrossValidator and TrainValidationSplit both use
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/18728
That appears to be all right. Sending update.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/18728
Yes, that's good. But I just found there's a rule in the Scala style check:
"Tests must extend org.apache.spark.SparkFunSuite instead."
Should we try to ignore it?
---
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/18728
Thanks for your attention. @srowen
The temp dir cleanup function is implemented in trait
`DefaultReadWriteTest` which extends `TempDirectory`, not from `SparkFunSuite`.
And as you said, the
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/18313
The current implementation of `CrossValidator` (with or without this PR)
**NEVER** holds all the trained models in driver memory at the same time.
It collects models sequentially and allows GC to
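The sequential-fit behavior described above can be sketched in pure Python. This is not Spark's code; `fit` and `evaluate` are hypothetical stand-ins for training an estimator and scoring it on the held-out fold:

```python
# Minimal sketch of k-fold cross-validation that keeps at most one
# trained model alive at a time. `fit` and `evaluate` are toy
# stand-ins, not Spark's Estimator/Evaluator API.

def fit(train):
    # Pretend the "model" is just the mean of the training points.
    return sum(train) / len(train)

def evaluate(model, test):
    # Mean squared error against the held-out points.
    return sum((x - model) ** 2 for x in test) / len(test)

def cross_validate(data, k=3):
    folds = [data[i::k] for i in range(k)]
    metrics = []
    for i in range(k):
        test = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        model = fit(train)              # only this fold's model is alive
        metrics.append(evaluate(model, test))
        # `model` is rebound on the next iteration, so the previous
        # fold's model becomes eligible for GC before the next fit.
    return sum(metrics) / len(metrics)

print(cross_validate([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]))
```

The point is structural: each fold's model is dropped before the next fold is trained, so peak memory is one model, not k of them.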
Github user hhbyyh closed the pull request at:
https://github.com/apache/spark/pull/18313
---
GitHub user hhbyyh opened a pull request:
https://github.com/apache/spark/pull/18728
[SPARK-21524] [ML] fix temp dir
## What changes were proposed in this pull request?
jira: https://issues.apache.org/jira/browse/SPARK-21524
ValidatorParamsSuiteHelpers.testFileMove() is
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/12533
Thanks for the attention @MLnick. I'm closing most of my stale PRs. For
this one, I found all of the transformers in the PR already have `
transformSchema(dataset.schema, logging =
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/12037
Closing stale PR.
---
Github user hhbyyh closed the pull request at:
https://github.com/apache/spark/pull/12037
---
Github user hhbyyh closed the pull request at:
https://github.com/apache/spark/pull/10803
---
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/10803
Closing stale PR.
---
Github user hhbyyh closed the pull request at:
https://github.com/apache/spark/pull/12533
---
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/12533
Closing it since it's been overlooked for some time. Thanks for the review
and comments.
---
Github user hhbyyh closed the pull request at:
https://github.com/apache/spark/pull/11102
---
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/11102
Closing it since it's been overlooked for some time and can be implemented
easily with https://github.com/apache/spark/pull/17583. Thanks for the review
and comments.
---
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/18313#discussion_r129107041
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -113,15 +122,28 @@ class CrossValidator @Since("1.2.0") (@Si
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/18313#discussion_r128886371
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -113,15 +122,28 @@ class CrossValidator @Since("1.2.0") (@Si
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17461
For the initial model, I think you can just use a String param for the
model path. refer to
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/clustering
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/18513#discussion_r128043915
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala ---
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/18513#discussion_r127590053
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala ---
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/18513#discussion_r127589970
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala ---
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/18513#discussion_r126508854
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/FeatureHasherSuite.scala ---
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/18513#discussion_r126503993
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala ---
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/18513#discussion_r126503728
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala ---
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/18513#discussion_r126505794
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala ---
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/18513#discussion_r126507840
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala ---
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software
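For context on the `FeatureHasher` reviews above, the underlying hashing trick can be illustrated with a small pure-Python sketch. This is not Spark's implementation (Spark hashes with MurmurHash3); `crc32` is used here only to keep the example deterministic, and `hash_features` is a hypothetical helper:

```python
# Illustrative sketch of the hashing trick behind a FeatureHasher:
# project arbitrary (feature name, value) pairs into a fixed-size
# sparse vector by hashing the feature name to an index.
import zlib

def hash_features(row, num_features=16):
    vec = {}
    for name, value in row.items():
        if isinstance(value, str):
            # Categorical: hash "name=value" and add an implicit 1.0.
            idx = zlib.crc32(f"{name}={value}".encode()) % num_features
            vec[idx] = vec.get(idx, 0.0) + 1.0
        else:
            # Numeric: hash the name alone and accumulate the value.
            idx = zlib.crc32(name.encode()) % num_features
            vec[idx] = vec.get(idx, 0.0) + float(value)
    return vec

row = {"real": 2.2, "bool": "true", "string": "foo"}
print(hash_features(row))
```

Note that collisions simply accumulate into the same index, so the sum of the vector's entries is preserved even when two features collide.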
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/16158
Add a tuning summary for CrossValidator.
---
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/16158#discussion_r125782939
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/TrainValidationSplit.scala ---
@@ -226,6 +230,29 @@ class TrainValidationSplitModel private[ml
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17280#discussion_r125736603
--- Diff: python/pyspark/ml/fpm.py ---
@@ -186,29 +186,29 @@ class FPGrowth(JavaEstimator, HasItemsCol,
HasPredictionCol,
|[z
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17862
Yes, both LBFGS and OWLQN generate models similar to sklearn's when fitting
without an intercept.
About replacing OWLQN with LBFGS: I noticed that with hinge loss, sometimes
OWLQN uses fewer iterations
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/16158
@MLnick Thanks for your attention. I'm not sure if SPARK-19053 is still
active and maybe it's not a blocking issue for this change. If you don't mind,
I'll extend the jira
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17862
@yanboliang Without an intercept, sklearn and Spark LinearSVC get the
same coefficients on several datasets I tested.
---
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17862
On many large datasets, LinearSVC cannot get results similar to sklearn's.
E.g., sklearn may get coefficients (5, 10, 15, 20), and Spark LinearSVC will
get (10, 20, 30, 40). It's different b
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17862#discussion_r124576026
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala ---
@@ -272,36 +272,16 @@ class LinearSVCSuite extends SparkFunSuite
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17583
This is ready for review.
---
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/18305#discussion_r124138183
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/loss/RDDLossFunction.scala ---
@@ -50,7 +50,7 @@ private[ml] class RDDLossFunction[
Agg
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/18305#discussion_r124133920
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/LogisticAggregator.scala
---
@@ -0,0 +1,364 @@
+/*
+ * Licensed to the Apache
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/18305#discussion_r124135629
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/loss/DifferentiableRegularization.scala
---
@@ -38,34 +40,39 @@ private[ml] trait
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/18305#discussion_r124135824
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/loss/DifferentiableRegularization.scala
---
@@ -38,34 +40,39 @@ private[ml] trait
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17864
Shall we pay extra attention to the Int case? E.g. an input column contains
Double.NaN, 1, 2.
The current implementation will return the surrogate as 1.5. I'm not sure if
it
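The surrogate computation being questioned above can be sketched as the mean of a column with NaN entries skipped; for the input [NaN, 1, 2] this yields 1.5, a non-integral surrogate for an Int column. `mean_surrogate` is a hypothetical helper, not Spark's Imputer code:

```python
# Sketch of a mean-based surrogate that skips NaN entries, mirroring
# the Int-column concern above: [NaN, 1, 2] produces the surrogate 1.5,
# which is not representable as an Int without rounding.
import math

def mean_surrogate(values):
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid)

print(mean_surrogate([float("nan"), 1.0, 2.0]))
```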
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17583
The error looks irrelevant.
---
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17583
Change the constructor func parameter to UserDefinedFunction. This helps
resolve the type issue during save/load and makes it adaptable to Python.
Thanks for the suggestion from @yanboliang
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/18315
Will send further update after https://github.com/apache/spark/pull/18305
merged.
---
GitHub user hhbyyh opened a pull request:
https://github.com/apache/spark/pull/18315
[SPARK-21108] [ML] [WIP] convert LinearSVC to aggregator framework
## What changes were proposed in this pull request?
convert LinearSVC to new aggregator framework
## How was this
GitHub user hhbyyh opened a pull request:
https://github.com/apache/spark/pull/18313
[SPARK-21087] [ML] CrossValidator, TrainValidationSplit should preserve all
models after fitting: Scala
## What changes were proposed in this pull request?
Allow `CrossValidatorModel` and
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17862
Sure. That's reasonable. I'll move the hingeAggregator to a new PR.
---
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17862
Merged the change from https://github.com/apache/spark/pull/17645 into this
single change.
---
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/18154#discussion_r121836024
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala ---
@@ -154,13 +155,19 @@ class CountVectorizer @Since("1.5.0"
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17645
OK. I'll close it for now and try to merge it with
https://github.com/apache/spark/pull/17862.
Thanks for the comment from @yanboliang
---
Github user hhbyyh closed the pull request at:
https://github.com/apache/spark/pull/17645
---
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17583#discussion_r121312651
--- Diff: mllib/src/main/scala/org/apache/spark/ml/FuncTransformer.scala ---
@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17583#discussion_r121312459
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/FuncTransformer.scala ---
@@ -0,0 +1,150 @@
+/*
+ * Licensed to the Apache Software
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17583#discussion_r121183230
--- Diff: mllib/src/main/scala/org/apache/spark/ml/FuncTransformer.scala ---
@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17645
Hi @HyukjinKwon I think this is a feature we need, but currently we are
still having some discussion about the optimizer interface.
---
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/18034#discussion_r117792674
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala ---
@@ -468,7 +469,16 @@ object LocalLDAModel extends Loader[LocalLDAModel
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17940#discussion_r117592116
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
---
@@ -992,7 +992,16 @@ object Matrices {
new DenseMatrix(dm.rows
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17940#discussion_r116351996
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
---
@@ -992,7 +992,24 @@ object Matrices {
new DenseMatrix(dm.rows
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17940#discussion_r116139174
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
---
@@ -992,7 +992,20 @@ object Matrices {
new DenseMatrix(dm.rows
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17940#discussion_r116139610
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala ---
@@ -46,6 +46,26 @@ class MatricesSuite extends SparkFunSuite
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17940#discussion_r116139038
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
---
@@ -992,7 +992,20 @@ object Matrices {
new DenseMatrix(dm.rows
Github user hhbyyh closed the pull request at:
https://github.com/apache/spark/pull/10466
---
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/10466
Closing this for now until I get some time for it. We would need to
evaluate the performance and see what the best option is. Thanks for pinging
@HyukjinKwon
---
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17910
@zhengruifeng That may be the best solution I see for now.
---
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17910#discussion_r115669376
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -526,7 +526,7 @@ class LogisticRegression @Since("
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17862#discussion_r115668587
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala ---
@@ -154,22 +159,23 @@ class LinearSVCSuite extends SparkFunSuite
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17862#discussion_r115657829
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala ---
@@ -223,6 +229,25 @@ class LinearSVCSuite extends SparkFunSuite
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17862#discussion_r115656752
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala ---
@@ -154,22 +159,23 @@ class LinearSVCSuite extends SparkFunSuite
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17910#discussion_r115416085
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -526,7 +526,7 @@ class LogisticRegression @Since("
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17910#discussion_r115416204
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -890,7 +890,7 @@ object LogisticRegression extends
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17912
cc @srowen @jkbradley @felixcheung
---
GitHub user hhbyyh opened a pull request:
https://github.com/apache/spark/pull/17912
[SPARK-20670] [ML] Simplify FPGrowth transform
## What changes were proposed in this pull request?
As suggested by Sean Owen in https://github.com/apache/spark/pull/17130,
the transform
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17894
I'm not sure how much acceleration we can get from Level 2 BLAS. For
benchmarking, we would also need to evaluate the performance on sparse data.
---
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17862
@debasish83 There are several approaches to smoothing the hinge loss:
https://en.wikipedia.org/wiki/Hinge_loss. For the one you're proposing, do you
know if it's used in other SVM
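The loss variants under discussion can be sketched in a few lines of Python. These are standard textbook formulas (see the Wikipedia article linked above), not Spark's implementation; `y` is the label in {-1, +1} and `s` is the raw margin score:

```python
# Standard hinge loss and two common variants discussed in this thread.

def hinge(y, s):
    # Standard hinge: non-differentiable at y*s == 1.
    return max(0.0, 1.0 - y * s)

def squared_hinge(y, s):
    # L2 (squared) hinge: differentiable everywhere, favored by
    # solvers that prefer smooth objectives.
    return max(0.0, 1.0 - y * s) ** 2

def smoothed_hinge(y, s):
    # One piecewise-quadratic smoothing (Rennie-style smooth hinge).
    z = y * s
    if z >= 1.0:
        return 0.0
    if z <= 0.0:
        return 0.5 - z
    return 0.5 * (1.0 - z) ** 2

print(hinge(1, 0.5), squared_hinge(1, 0.5), smoothed_hinge(1, 0.5))
```

All three agree that a margin of at least 1 costs nothing; they differ only in how sharply violations are penalized near the margin boundary.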
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17862#discussion_r115303395
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
@@ -205,15 +233,21 @@ class LinearSVC @Since("2.2.0") (
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17864
I imagine most Int features will need to be converted to Double for a
Vector, so returning Double regardless of the input type makes sense, and it
also makes the implementation more straightforward
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17862
Update: switch to HasSolver trait and use OWLQN as default optimizer
---
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17862#discussion_r115050353
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
@@ -205,15 +233,21 @@ class LinearSVC @Since("2.2.0") (
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17862
ping @jkbradley Sorry, I know this is like the last minute for 2.2, but the
change may be important for user experience. If we're not comfortable making
an API change right now, how about we just c
GitHub user hhbyyh opened a pull request:
https://github.com/apache/spark/pull/17862
[SPARK-20602] [ML] Adding LBFGS as optimizer for LinearSVC
## What changes were proposed in this pull request?
jira: https://issues.apache.org/jira/browse/SPARK-20602
Currently
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17130
@felixcheung, reverted the code change of `transform` as requested. Please
check the update. Thanks.
---
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17130#discussion_r114002856
--- Diff: docs/ml-frequent-pattern-mining.md ---
@@ -0,0 +1,87 @@
+---
+layout: global
+title: Frequent Pattern Mining
+displayTitle
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17130#discussion_r114002811
--- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala ---
@@ -82,8 +81,8 @@ private[fpm] trait FPGrowthParams extends Params with
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17130#discussion_r114002784
--- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala ---
@@ -268,12 +269,8 @@ class FPGrowthModel private[ml] (
val predictUDF
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17645#discussion_r113836900
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
@@ -42,15 +44,35 @@ import org.apache.spark.sql.functions.{col, lit
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17130#discussion_r113783993
--- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala ---
@@ -268,12 +269,8 @@ class FPGrowthModel private[ml] (
val predictUDF
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17130#discussion_r113782786
--- Diff: docs/ml-frequent-pattern-mining.md ---
@@ -0,0 +1,87 @@
+---
+layout: global
+title: Frequent Pattern Mining
+displayTitle
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17767
Preparing a PR like this takes a lot of effort. Please try to follow the
guidelines in http://spark.apache.org/contributing.html (create a jira and
rename the title).
Like you said, I
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17130#discussion_r113220960
--- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala ---
@@ -268,12 +269,8 @@ class FPGrowthModel private[ml] (
val predictUDF
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17130#discussion_r112278365
--- Diff: docs/ml-frequent-pattern-mining.md ---
@@ -0,0 +1,80 @@
+---
+layout: global
+title: Frequent Pattern Mining
+displayTitle
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17673
Thanks for sharing the work. To make the review easier, I would recommend:
1. Provide some background info: is the new algorithm better than the
existing one, and in which cases
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/17280
I'll update this after the FPGrowth examples and doc are merged
(https://github.com/apache/spark/pull/17130), since there'll be some conflicts.
---
GitHub user hhbyyh opened a pull request:
https://github.com/apache/spark/pull/17654
[SPARK-20351] [ML] Add trait HasTrainingSummary to replace duplicated
code
## What changes were proposed in this pull request?
Add a trait HasTrainingSummary to avoid code duplication
GitHub user hhbyyh opened a pull request:
https://github.com/apache/spark/pull/17645
[SPARK-20348] [ML] Support squared hinge loss (L2 loss) for LinearSVC
## What changes were proposed in this pull request?
While Hinge loss is the standard loss function for linear SVM
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17586#discussion_r111312845
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
@@ -287,6 +290,27 @@ class LinearSVCModel private[classification
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17586#discussion_r111313220
--- Diff: python/pyspark/ml/classification.py ---
@@ -172,6 +172,59 @@ def intercept(self):
"""
return self._call_
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/6000
@redsofa I would advise copying the MLlib LDAOptimizer code to your own
project, adding related logging to next(), and just running it with your
application code.
---
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17130#discussion_r110990099
--- Diff: examples/src/main/python/ml/fpgrowth_example.py ---
@@ -0,0 +1,48 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17586#discussion_r110982571
--- Diff: python/pyspark/ml/classification.py ---
@@ -172,6 +172,47 @@ def intercept(self):
"""
return self._call_
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/17586#discussion_r110982055
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
@@ -355,6 +368,19 @@ object LinearSVCModel extends
MLReadable