Github user petro-rudenko closed the pull request at:
https://github.com/apache/spark/pull/5510
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
Github user petro-rudenko commented on the pull request:
https://github.com/apache/spark/pull/5510#issuecomment-94419332
In my case it means:
```scala
(new ParamGridBuilder).addGrid(lr.regParam, Array(0.1)) == (lr.regParam=0.1
  && new ParamGridBuild
```
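The equivalence being described can be sketched in full. This is a hedged sketch against the Spark 1.x ml tuning API; `lr` is assumed to be a `LogisticRegression` instance, and the assertions describe the builder's behavior, not code from the PR:

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.tuning.ParamGridBuilder

val lr = new LogisticRegression()

// A builder with a single one-value grid produces exactly one ParamMap,
// so cross-validating over it degenerates to fitting with that value fixed.
val grid = new ParamGridBuilder()
  .addGrid(lr.regParam, Array(0.1))
  .build()

assert(grid.length == 1)            // one candidate
assert(grid(0)(lr.regParam) == 0.1) // with regParam pinned to 0.1
```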
Github user petro-rudenko commented on the pull request:
https://github.com/apache/spark/pull/5510#issuecomment-94418041
In my case I can live with the default behaviour. It's just not intuitive that
an empty ParamGridBuilder returns an array of size 1, and it is also not clear how
to handle j
Github user petro-rudenko commented on the pull request:
https://github.com/apache/spark/pull/5510#issuecomment-93412249
Ideally, CrossValidator should handle the following cases:
1) No parameters at all: just run est.fit(dataset, new ParamMap)
2) One param: set this param on the estimator
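Case 1 above can be sketched with a hypothetical helper (this is not CrossValidator's actual code; `effectiveParamMaps` is an illustrative name): normalise an empty candidate list into a single fit with no parameter overrides.

```scala
import org.apache.spark.ml.param.ParamMap

// Hypothetical helper: normalise the candidate list so the k-fold loop
// always has at least one ParamMap to fit with.
def effectiveParamMaps(epm: Array[ParamMap]): Array[ParamMap] =
  if (epm.isEmpty) Array(new ParamMap()) // case 1: est.fit(dataset, new ParamMap)
  else epm                               // otherwise: one fit per candidate map
```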
Github user petro-rudenko commented on the pull request:
https://github.com/apache/spark/pull/5510#issuecomment-93373411
Maybe handle an empty estimatorParamMaps in CrossValidator?
```scala
/** @group setParam */
def setEstimatorParamMaps(value: Array[ParamMap]): this.type
```
Github user petro-rudenko commented on a diff in the pull request:
https://github.com/apache/spark/pull/5510#discussion_r28339279
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/ParamGridBuilder.scala ---
@@ -100,10 +100,11 @@ class ParamGridBuilder {
* Builds
GitHub user petro-rudenko opened a pull request:
https://github.com/apache/spark/pull/5510
[SPARK-6901][Ml] ParamGridBuilder.build with no grids should return an empty
array
ParamGridBuilder.build with no grids currently returns an array containing a
single empty param map.
```scala
assert((new
```
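The proposed behaviour can be stated as a short assertion. A hedged sketch against the Spark 1.x tuning API, written here for illustration rather than copied from the PR:

```scala
import org.apache.spark.ml.tuning.ParamGridBuilder

// A builder with no grids should yield an empty array
// rather than one empty ParamMap.
val maps = new ParamGridBuilder().build()
assert(maps.isEmpty) // proposed; the current behaviour is maps.length == 1
```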
Github user petro-rudenko commented on the pull request:
https://github.com/apache/spark/pull/1909#issuecomment-90063723
+1 for this. A useful feature for calculating a distributed cumulative sum.
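The idea being endorsed can be sketched over a plain RDD (a hedged sketch of the general technique, not the code from PR 1909): compute per-partition totals, turn them into prefix offsets, then do a local running sum within each partition.

```scala
import org.apache.spark.rdd.RDD

def cumulativeSum(xs: RDD[Double]): RDD[Double] = {
  // Total of each partition, collected to the driver (one number per partition).
  val partSums = xs.mapPartitions(it => Iterator(it.sum)).collect()
  // Offset for partition i = sum of all earlier partitions.
  val offsets = partSums.scanLeft(0.0)(_ + _)
  xs.mapPartitionsWithIndex { (i, it) =>
    var acc = offsets(i)
    it.map { x => acc += x; acc } // local scan, shifted by the partition offset
  }
}
```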
Github user petro-rudenko commented on a diff in the pull request:
https://github.com/apache/spark/pull/5196#discussion_r27739585
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala ---
@@ -0,0 +1,101 @@
+/*
+ * Licensed to the Apache Software
Github user petro-rudenko commented on a diff in the pull request:
https://github.com/apache/spark/pull/5196#discussion_r27645880
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala ---
@@ -0,0 +1,101 @@
+/*
+ * Licensed to the Apache Software
Github user petro-rudenko commented on a diff in the pull request:
https://github.com/apache/spark/pull/4735#discussion_r27510186
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/LabelIndexer.scala ---
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software
Github user petro-rudenko commented on a diff in the pull request:
https://github.com/apache/spark/pull/4735#discussion_r27486767
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/LabelIndexer.scala ---
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software
Github user petro-rudenko commented on a diff in the pull request:
https://github.com/apache/spark/pull/4735#discussion_r27399968
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/LabelIndexer.scala ---
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software
Github user petro-rudenko commented on the pull request:
https://github.com/apache/spark/pull/5265#issuecomment-87670835
+1 for this, since for example [the caching logic from ml
package](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml
Github user petro-rudenko commented on a diff in the pull request:
https://github.com/apache/spark/pull/5135#discussion_r27043852
--- Diff: docs/ml-guide.md ---
@@ -655,6 +660,36 @@ import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import
GitHub user petro-rudenko opened a pull request:
https://github.com/apache/spark/pull/5135
[ML][docs][minor] Define LabeledDocument/Document classes in CV example
To make the Cross-Validation example code snippet easier to copy/paste, the
LabeledDocument/Document classes need to be defined in it, since they
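For reference, these are the two case classes the snippet depends on, as they appear in the ml-guide Cross-Validation example for Spark 1.x:

```scala
// Minimal schema classes used by the Cross-Validation example.
case class LabeledDocument(id: Long, text: String, label: Double)
case class Document(id: Long, text: String)
```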
Github user petro-rudenko commented on the pull request:
https://github.com/apache/spark/pull/4514#issuecomment-75994711
Thanks, works now.
Github user petro-rudenko commented on the pull request:
https://github.com/apache/spark/pull/4514#issuecomment-75989874
Having a problem compiling Spark with sbt due to the following error:
```
$ build/sbt -Phadoop-2.4 compile
[error]
/home/peter/soft/spark_src/core/src/main/scala
```
Github user petro-rudenko commented on the pull request:
https://github.com/apache/spark/pull/4593#issuecomment-75550855
@dbtsai, @joshdevins here's an issue I have. I'm using the new ml pipeline
with hyperparameter grid search. Because the folds don't depend on the
hyper
Github user petro-rudenko commented on the pull request:
https://github.com/apache/spark/pull/3637#issuecomment-74563955
@jkbradley I can setValidateData in GLM, but not in the LogisticRegression
class from the new API. In my case I found a trick to customize anything I want
(add
GitHub user petro-rudenko opened a pull request:
https://github.com/apache/spark/pull/4595
[Ml] SPARK-5804 Explicitly manage cache in Crossvalidator k-fold loop
On a big dataset, explicitly unpersisting the train and validation folds allows
more data to be loaded into memory in the next loop
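The idea of the change can be sketched as follows (a hedged outline of the cache-management pattern, not the PR's actual diff; `evaluateFold` is a hypothetical helper standing in for CrossValidator internals):

```scala
import org.apache.spark.sql.DataFrame

// Persist each fold before fitting and release it as soon as the fold's
// metric is computed, freeing memory for the next loop iteration.
def evaluateFold(training: DataFrame, validation: DataFrame): Double = {
  training.cache()
  validation.cache()
  try {
    // ... est.fit(training), transform validation, compute metric ...
    0.0 // placeholder metric
  } finally {
    // Explicitly unpersist both folds before the next iteration.
    training.unpersist()
    validation.unpersist()
  }
}
```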
GitHub user petro-rudenko opened a pull request:
https://github.com/apache/spark/pull/4590
[Ml] SPARK-5796 Don't transform data on a last estimator in Pipeline
If it's the last estimator in a Pipeline, there's no need to transform the
data, since there's no next stage
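The optimization can be outlined like this (a hedged sketch of the fitting loop's shape, not Pipeline's real implementation; `fitStages` is an illustrative name):

```scala
import org.apache.spark.ml.{Estimator, PipelineStage, Transformer}
import org.apache.spark.sql.DataFrame

// Fit each stage in order, but only transform the running dataset when a
// later stage will actually consume the output.
def fitStages(stages: Seq[PipelineStage], dataset: DataFrame): Seq[Transformer] = {
  var cur = dataset
  stages.zipWithIndex.map { case (stage, i) =>
    val transformer: Transformer = stage match {
      case e: Estimator[_] => e.fit(cur) // fitted models are transformers
      case t: Transformer  => t
    }
    // Skip the transform entirely on the last stage: nothing reads its output.
    if (i < stages.length - 1) cur = transformer.transform(cur)
    transformer
  }
}
```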
Github user petro-rudenko commented on the pull request:
https://github.com/apache/spark/pull/3637#issuecomment-73509087
One more issue. In the LogisticRegressionWithLBFGS class there's a line:
```scala
this.setFeatureScaling(true)
```
I have feature scaling as a
Github user petro-rudenko commented on the pull request:
https://github.com/apache/spark/pull/3637#issuecomment-71636977
Also, it would be nice to be able to get/set the model state:
```scala
// Run cross-validation, and choose the best set of parameters.
val cvModel = crossval.fit
```
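For reading state back, a hedged sketch of what this could look like (assumes `crossval` and `training` from the surrounding example, and the `bestModel` field that CrossValidatorModel exposes in later Spark releases):

```scala
val cvModel = crossval.fit(training)

// The winning model and the hyperparameters it was trained with:
val best = cvModel.bestModel
println(best.parent.extractParamMap())
```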