According to the documentation, cvModel.avgMetrics gives the average
cross-validation metric for each ParamMap in
CrossValidator.estimatorParamMaps,
in the corresponding order.
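
For context, here is a minimal sketch of the kind of setup that produces
those metrics. The grid values and the training DataFrame are made up for
illustration (though a 3 x 3 x 3 grid would match the 27 values below);
the API calls themselves are the standard spark.ml ones:

import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

// Hypothetical data: a DataFrame `training` with "features" and "label" columns.
val lr = new LogisticRegression()

// 3 x 3 x 3 = 27 ParamMaps; the specific values here are invented.
val paramGrid = new ParamGridBuilder()
  .addGrid(lr.regParam, Array(0.1, 0.01, 0.001))
  .addGrid(lr.elasticNetParam, Array(0.0, 0.5, 1.0))
  .addGrid(lr.maxIter, Array(10, 50, 100))
  .build()

val cv = new CrossValidator()
  .setEstimator(lr)
  .setEvaluator(new BinaryClassificationEvaluator().setMetricName("areaUnderROC"))
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(3)

val cvModel = cv.fit(training)

// One entry per ParamMap in paramGrid, each averaged over the folds.
println(cvModel.avgMetrics.mkString(", "))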

So when using areaUnderROC as the evaluator metric, cvModel.avgMetrics
gives (this example uses Scala, but the API appears to work the same in
PySpark):

Array(0.8706074097889074, 0.9409529716549123, 0.9618787730606256,
0.8838019837612303, 0.9397610587835981, 0.9591275359721634,
0.8829088978012987, 0.9394137261180164, 0.9584085992609841,
0.8706074097889079, 0.9628051240960216, 0.9827490959747656,
0.8838019837612294, 0.9636100965080932, 0.9826906885021736,
0.8829088978013016, 0.9627072956991051, 0.9809166441709806,
0.8508340706851226, 0.7325352788119097, 0.7208072472539231,
0.8553496724213554, 0.7354481892254211, 0.7251511314439787,
0.8546551939595262, 0.7358349987841173, 0.7251408416244391)

My understanding is that each of these 27 values is the average
areaUnderROC across the folds for one of the 27 parameter-grid
combinations used during cross-validation.

I have yet to figure out what "in the corresponding order" specifically
means, so I don't know which areaUnderROC value corresponds to which set
of hyperparameters; one way to pair them up is sketched below.
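
That said, since the documentation says the two arrays line up index by
index, pairing them should just be a zip. A sketch, assuming
getEstimatorParamMaps is available on the fitted CrossValidatorModel:

// Pair each average metric with the ParamMap that produced it.
cvModel.getEstimatorParamMaps.zip(cvModel.avgMetrics).foreach {
  case (paramMap, metric) => println(s"areaUnderROC = $metric for:\n$paramMap")
}

// Or pull out the best-scoring combination directly.
val (bestParamMap, bestMetric) =
  cvModel.getEstimatorParamMaps.zip(cvModel.avgMetrics).maxBy(_._2)

Whatever order ParamGridBuilder.build() happened to produce, the zip keeps
each metric attached to the ParamMap that generated it.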

Hopefully this is what you are looking for.


On Thu, Sep 29, 2016 at 3:18 PM, evanzamir <zamir.e...@gmail.com> wrote:

> I'm using CrossValidator (in PySpark) to create a logistic regression
> model.
> There is "areaUnderROC", which I assume gives the AUC for the bestModel
> chosen by CV. But how do I get the areaUnderROC for the test data during
> cross-validation?
