spark git commit: [SPARK-21693][R][ML] Reduce max iterations in Linear SVM test in R to speed up AppVeyor build

felixcheung Sun, 12 Nov 2017 14:38:00 -0800

Repository: spark
Updated Branches:
  refs/heads/master 9bf696dbe -> 3d90b2cb3



[SPARK-21693][R][ML] Reduce max iterations in Linear SVM test in R to speed up 
AppVeyor build

## What changes were proposed in this pull request?

This PR proposes to reduce the maximum number of iterations in the Linear SVM
test in SparkR. This particular test takes roughly 5 minutes on my Mac and over
20 minutes on Windows.

The root cause appears to be that the test triggers about 2500 jobs with the
default of 100 max iterations. On Linux, `daemon.R` is forked, but on Windows
a separate process is launched, which is extremely slow.

So, based on my observation, many (non-forked) processes are run on Windows,
which accounts for the difference in elapsed time.

After reducing the max iterations to 10, the total number of jobs in this
single test drops to about 550.

After reducing the max iterations to 5, the total number of jobs in this
single test drops to about 360.
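
As a rough sanity check on the figures above, the reported job counts can be fitted with a straight line in base R. This is only a sketch: the linear model (a fixed setup cost plus a per-iteration cost) is an assumption made here for illustration, the inputs are the approximate counts quoted in this message, and the variable names are hypothetical.

```r
# Approximate job counts quoted in this commit message.
max_iter <- c(100, 10, 5)
jobs     <- c(2500, 550, 360)

# Assumption (not from the PR): jobs ~ fixed cost + per-iteration cost * maxIter.
fit      <- lm(jobs ~ max_iter)
per_iter <- unname(coef(fit)["max_iter"])     # estimated jobs added per iteration
fixed    <- unname(coef(fit)["(Intercept)"])  # estimated jobs independent of maxIter

# Share of the default-setting (maxIter = 100) jobs that remain at maxIter = 5.
remaining <- jobs[max_iter == 5] / jobs[max_iter == 100]

print(c(per_iter = per_iter, fixed = fixed, remaining = remaining))
```

With only three approximate data points the fit is indicative at best, but it suggests that most of the jobs at the default setting scale with the iteration count, which is consistent with lowering `maxIter` being the effective lever here.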

## How was this patch tested?

Manually tested the elapsed times.

Author: hyukjinkwon <gurwls...@gmail.com>

Closes #19722 from HyukjinKwon/SPARK-21693-test.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3d90b2cb
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3d90b2cb
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3d90b2cb

Branch: refs/heads/master
Commit: 3d90b2cb384affe8ceac9398615e9e21b8c8e0b0
Parents: 9bf696d
Author: hyukjinkwon <gurwls...@gmail.com>
Authored: Sun Nov 12 14:37:20 2017 -0800
Committer: Felix Cheung <felixche...@apache.org>
Committed: Sun Nov 12 14:37:20 2017 -0800

----------------------------------------------------------------------
 R/pkg/tests/fulltests/test_mllib_classification.R | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/3d90b2cb/R/pkg/tests/fulltests/test_mllib_classification.R
----------------------------------------------------------------------
diff --git a/R/pkg/tests/fulltests/test_mllib_classification.R b/R/pkg/tests/fulltests/test_mllib_classification.R
index a4d0397..ad47717 100644
--- a/R/pkg/tests/fulltests/test_mllib_classification.R
+++ b/R/pkg/tests/fulltests/test_mllib_classification.R
@@ -66,7 +66,7 @@ test_that("spark.svmLinear", {
   feature <- c(1.1419053, 0.9194079, -0.9498666, -1.1069903, 0.2809776)
   data <- as.data.frame(cbind(label, feature))
   df <- createDataFrame(data)
-  model <- spark.svmLinear(df, label ~ feature, regParam = 0.1)
+  model <- spark.svmLinear(df, label ~ feature, regParam = 0.1, maxIter = 5)
   prediction <- collect(select(predict(model, df), "prediction"))
   expect_equal(sort(prediction$prediction), c("0.0", "0.0", "0.0", "1.0", "1.0"))
 
@@ -77,10 +77,11 @@ test_that("spark.svmLinear", {
   trainidxs <- base::sample(nrow(data), nrow(data) * 0.7)
   traindf <- as.DataFrame(data[trainidxs, ])
   testdf <- as.DataFrame(rbind(data[-trainidxs, ], c(0, "the other")))
-  model <- spark.svmLinear(traindf, clicked ~ ., regParam = 0.1)
+  model <- spark.svmLinear(traindf, clicked ~ ., regParam = 0.1, maxIter = 5)
   predictions <- predict(model, testdf)
   expect_error(collect(predictions))
-  model <- spark.svmLinear(traindf, clicked ~ ., regParam = 0.1, handleInvalid = "skip")
+  model <- spark.svmLinear(traindf, clicked ~ ., regParam = 0.1,
+                           handleInvalid = "skip", maxIter = 5)
   predictions <- predict(model, testdf)
   expect_equal(class(collect(predictions)$clicked[1]), "list")
 


