http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/mllib-evaluation-metrics.html ---------------------------------------------------------------------- diff --git a/site/docs/2.1.0/mllib-evaluation-metrics.html b/site/docs/2.1.0/mllib-evaluation-metrics.html index 4bc636d..0d5bb3b 100644 --- a/site/docs/2.1.0/mllib-evaluation-metrics.html +++ b/site/docs/2.1.0/mllib-evaluation-metrics.html @@ -307,20 +307,20 @@ <ul id="markdown-toc"> - <li><a href="#classification-model-evaluation" id="markdown-toc-classification-model-evaluation">Classification model evaluation</a> <ul> - <li><a href="#binary-classification" id="markdown-toc-binary-classification">Binary classification</a> <ul> - <li><a href="#threshold-tuning" id="markdown-toc-threshold-tuning">Threshold tuning</a></li> + <li><a href="#classification-model-evaluation">Classification model evaluation</a> <ul> + <li><a href="#binary-classification">Binary classification</a> <ul> + <li><a href="#threshold-tuning">Threshold tuning</a></li> </ul> </li> - <li><a href="#multiclass-classification" id="markdown-toc-multiclass-classification">Multiclass classification</a> <ul> - <li><a href="#label-based-metrics" id="markdown-toc-label-based-metrics">Label based metrics</a></li> + <li><a href="#multiclass-classification">Multiclass classification</a> <ul> + <li><a href="#label-based-metrics">Label based metrics</a></li> </ul> </li> - <li><a href="#multilabel-classification" id="markdown-toc-multilabel-classification">Multilabel classification</a></li> - <li><a href="#ranking-systems" id="markdown-toc-ranking-systems">Ranking systems</a></li> + <li><a href="#multilabel-classification">Multilabel classification</a></li> + <li><a href="#ranking-systems">Ranking systems</a></li> </ul> </li> - <li><a href="#regression-model-evaluation" id="markdown-toc-regression-model-evaluation">Regression model evaluation</a></li> + <li><a href="#regression-model-evaluation">Regression model 
evaluation</a></li> </ul> <p><code>spark.mllib</code> comes with a number of machine learning algorithms that can be used to learn from and make predictions @@ -421,7 +421,7 @@ data, and evaluate the performance of the algorithm by several binary evaluation <div data-lang="scala"> <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS"><code>LogisticRegressionWithLBFGS</code> Scala docs</a> and <a href="api/scala/index.html#org.apache.spark.mllib.evaluation.BinaryClassificationMetrics"><code>BinaryClassificationMetrics</code> Scala docs</a> for details on the API.</p> - <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS</span> + <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS</span> <span class="k">import</span> <span class="nn">org.apache.spark.mllib.evaluation.BinaryClassificationMetrics</span> <span class="k">import</span> <span class="nn">org.apache.spark.mllib.regression.LabeledPoint</span> <span class="k">import</span> <span class="nn">org.apache.spark.mllib.util.MLUtils</span> @@ -453,13 +453,13 @@ data, and evaluate the performance of the algorithm by several binary evaluation <span class="c1">// Precision by threshold</span> <span class="k">val</span> <span class="n">precision</span> <span class="k">=</span> <span class="n">metrics</span><span class="o">.</span><span class="n">precisionByThreshold</span> <span class="n">precision</span><span class="o">.</span><span class="n">foreach</span> <span class="o">{</span> <span class="k">case</span> <span class="o">(</span><span class="n">t</span><span class="o">,</span> <span class="n">p</span><span class="o">)</span> <span class="k">=></span> - <span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">"Threshold: $t, Precision: 
$p"</span><span class="o">)</span> + <span class="n">println</span><span class="o">(</span><span class="s">s"Threshold: </span><span class="si">$t</span><span class="s">, Precision: </span><span class="si">$p</span><span class="s">"</span><span class="o">)</span> <span class="o">}</span> <span class="c1">// Recall by threshold</span> <span class="k">val</span> <span class="n">recall</span> <span class="k">=</span> <span class="n">metrics</span><span class="o">.</span><span class="n">recallByThreshold</span> <span class="n">recall</span><span class="o">.</span><span class="n">foreach</span> <span class="o">{</span> <span class="k">case</span> <span class="o">(</span><span class="n">t</span><span class="o">,</span> <span class="n">r</span><span class="o">)</span> <span class="k">=></span> - <span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">"Threshold: $t, Recall: $r"</span><span class="o">)</span> + <span class="n">println</span><span class="o">(</span><span class="s">s"Threshold: </span><span class="si">$t</span><span class="s">, Recall: </span><span class="si">$r</span><span class="s">"</span><span class="o">)</span> <span class="o">}</span> <span class="c1">// Precision-Recall Curve</span> @@ -468,13 +468,13 @@ data, and evaluate the performance of the algorithm by several binary evaluation <span class="c1">// F-measure</span> <span class="k">val</span> <span class="n">f1Score</span> <span class="k">=</span> <span class="n">metrics</span><span class="o">.</span><span class="n">fMeasureByThreshold</span> <span class="n">f1Score</span><span class="o">.</span><span class="n">foreach</span> <span class="o">{</span> <span class="k">case</span> <span class="o">(</span><span class="n">t</span><span class="o">,</span> <span class="n">f</span><span class="o">)</span> <span class="k">=></span> - <span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">"Threshold: $t, F-score: $f, Beta = 
1"</span><span class="o">)</span> + <span class="n">println</span><span class="o">(</span><span class="s">s"Threshold: </span><span class="si">$t</span><span class="s">, F-score: </span><span class="si">$f</span><span class="s">, Beta = 1"</span><span class="o">)</span> <span class="o">}</span> <span class="k">val</span> <span class="n">beta</span> <span class="k">=</span> <span class="mf">0.5</span> <span class="k">val</span> <span class="n">fScore</span> <span class="k">=</span> <span class="n">metrics</span><span class="o">.</span><span class="n">fMeasureByThreshold</span><span class="o">(</span><span class="n">beta</span><span class="o">)</span> <span class="n">fScore</span><span class="o">.</span><span class="n">foreach</span> <span class="o">{</span> <span class="k">case</span> <span class="o">(</span><span class="n">t</span><span class="o">,</span> <span class="n">f</span><span class="o">)</span> <span class="k">=></span> - <span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">"Threshold: $t, F-score: $f, Beta = 0.5"</span><span class="o">)</span> + <span class="n">println</span><span class="o">(</span><span class="s">s"Threshold: </span><span class="si">$t</span><span class="s">, F-score: </span><span class="si">$f</span><span class="s">, Beta = 0.5"</span><span class="o">)</span> <span class="o">}</span> <span class="c1">// AUPRC</span> @@ -498,7 +498,7 @@ data, and evaluate the performance of the algorithm by several binary evaluation <div data-lang="java"> <p>Refer to the <a href="api/java/org/apache/spark/mllib/classification/LogisticRegressionModel.html"><code>LogisticRegressionModel</code> Java docs</a> and <a href="api/java/org/apache/spark/mllib/classification/LogisticRegressionWithLBFGS.html"><code>LogisticRegressionWithLBFGS</code> Java docs</a> for details on the API.</p> - <div class="highlight"><pre><span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span> + <div 
class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span> <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.*</span><span class="o">;</span> <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.function.Function</span><span class="o">;</span> @@ -518,7 +518,7 @@ data, and evaluate the performance of the algorithm by several binary evaluation <span class="n">JavaRDD</span><span class="o"><</span><span class="n">LabeledPoint</span><span class="o">></span> <span class="n">test</span> <span class="o">=</span> <span class="n">splits</span><span class="o">[</span><span class="mi">1</span><span class="o">];</span> <span class="c1">// Run training algorithm to build the model.</span> -<span class="kd">final</span> <span class="n">LogisticRegressionModel</span> <span class="n">model</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">LogisticRegressionWithLBFGS</span><span class="o">()</span> +<span class="kd">final</span> <span class="n">LogisticRegressionModel</span> <span class="n">model</span> <span class="o">=</span> <span class="k">new</span> <span class="n">LogisticRegressionWithLBFGS</span><span class="o">()</span> <span class="o">.</span><span class="na">setNumClasses</span><span class="o">(</span><span class="mi">2</span><span class="o">)</span> <span class="o">.</span><span class="na">run</span><span class="o">(</span><span class="n">training</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span> @@ -538,7 +538,7 @@ data, and evaluate the performance of the algorithm by several binary evaluation <span class="c1">// Get evaluation metrics.</span> <span class="n">BinaryClassificationMetrics</span> <span class="n">metrics</span> <span class="o">=</span> - <span class="k">new</span> <span class="nf">BinaryClassificationMetrics</span><span class="o">(</span><span 
class="n">predictionAndLabels</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span> + <span class="k">new</span> <span class="n">BinaryClassificationMetrics</span><span class="o">(</span><span class="n">predictionAndLabels</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span> <span class="c1">// Precision by threshold</span> <span class="n">JavaRDD</span><span class="o"><</span><span class="n">Tuple2</span><span class="o"><</span><span class="n">Object</span><span class="o">,</span> <span class="n">Object</span><span class="o">>></span> <span class="n">precision</span> <span class="o">=</span> <span class="n">metrics</span><span class="o">.</span><span class="na">precisionByThreshold</span><span class="o">().</span><span class="na">toJavaRDD</span><span class="o">();</span> @@ -564,7 +564,7 @@ data, and evaluate the performance of the algorithm by several binary evaluation <span class="k">new</span> <span class="n">Function</span><span class="o"><</span><span class="n">Tuple2</span><span class="o"><</span><span class="n">Object</span><span class="o">,</span> <span class="n">Object</span><span class="o">>,</span> <span class="n">Double</span><span class="o">>()</span> <span class="o">{</span> <span class="nd">@Override</span> <span class="kd">public</span> <span class="n">Double</span> <span class="nf">call</span><span class="o">(</span><span class="n">Tuple2</span><span class="o"><</span><span class="n">Object</span><span class="o">,</span> <span class="n">Object</span><span class="o">></span> <span class="n">t</span><span class="o">)</span> <span class="o">{</span> - <span class="k">return</span> <span class="k">new</span> <span class="nf">Double</span><span class="o">(</span><span class="n">t</span><span class="o">.</span><span class="na">_1</span><span class="o">().</span><span class="na">toString</span><span class="o">());</span> + <span class="k">return</span> <span class="k">new</span> <span 
class="n">Double</span><span class="o">(</span><span class="n">t</span><span class="o">.</span><span class="na">_1</span><span class="o">().</span><span class="na">toString</span><span class="o">());</span> <span class="o">}</span> <span class="o">}</span> <span class="o">);</span> @@ -590,34 +590,34 @@ data, and evaluate the performance of the algorithm by several binary evaluation <div data-lang="python"> <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.evaluation.BinaryClassificationMetrics"><code>BinaryClassificationMetrics</code> Python docs</a> and <a href="api/python/pyspark.mllib.html#pyspark.mllib.classification.LogisticRegressionWithLBFGS"><code>LogisticRegressionWithLBFGS</code> Python docs</a> for more details on the API.</p> - <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.classification</span> <span class="kn">import</span> <span class="n">LogisticRegressionWithLBFGS</span> + <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.classification</span> <span class="kn">import</span> <span class="n">LogisticRegressionWithLBFGS</span> <span class="kn">from</span> <span class="nn">pyspark.mllib.evaluation</span> <span class="kn">import</span> <span class="n">BinaryClassificationMetrics</span> <span class="kn">from</span> <span class="nn">pyspark.mllib.regression</span> <span class="kn">import</span> <span class="n">LabeledPoint</span> -<span class="c"># Several of the methods available in scala are currently missing from pyspark</span> -<span class="c"># Load training data in LIBSVM format</span> +<span class="c1"># Several of the methods available in scala are currently missing from pyspark</span> +<span class="c1"># Load training data in LIBSVM format</span> <span class="n">data</span> <span class="o">=</span> <span class="n">spark</span>\ - <span class="o">.</span><span class="n">read</span><span class="o">.</span><span 
class="n">format</span><span class="p">(</span><span class="s">"libsvm"</span><span class="p">)</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s">"data/mllib/sample_binary_classification_data.txt"</span><span class="p">)</span>\ + <span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s2">"libsvm"</span><span class="p">)</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"data/mllib/sample_binary_classification_data.txt"</span><span class="p">)</span>\ <span class="o">.</span><span class="n">rdd</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">row</span><span class="p">:</span> <span class="n">LabeledPoint</span><span class="p">(</span><span class="n">row</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">row</span><span class="p">[</span><span class="mi">1</span><span class="p">]))</span> -<span class="c"># Split data into training (60%) and test (40%)</span> +<span class="c1"># Split data into training (60%) and test (40%)</span> <span class="n">training</span><span class="p">,</span> <span class="n">test</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">randomSplit</span><span class="p">([</span><span class="mf">0.6</span><span class="p">,</span> <span class="mf">0.4</span><span class="p">],</span> <span class="n">seed</span><span class="o">=</span><span class="mi">11</span><span class="p">)</span> <span class="n">training</span><span class="o">.</span><span class="n">cache</span><span class="p">()</span> -<span class="c"># Run training algorithm to build the model</span> +<span class="c1"># Run training algorithm to build the model</span> <span class="n">model</span> <span class="o">=</span> <span 
class="n">LogisticRegressionWithLBFGS</span><span class="o">.</span><span class="n">train</span><span class="p">(</span><span class="n">training</span><span class="p">)</span> -<span class="c"># Compute raw scores on the test set</span> +<span class="c1"># Compute raw scores on the test set</span> <span class="n">predictionAndLabels</span> <span class="o">=</span> <span class="n">test</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">lp</span><span class="p">:</span> <span class="p">(</span><span class="nb">float</span><span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">lp</span><span class="o">.</span><span class="n">features</span><span class="p">)),</span> <span class="n">lp</span><span class="o">.</span><span class="n">label</span><span class="p">))</span> -<span class="c"># Instantiate metrics object</span> +<span class="c1"># Instantiate metrics object</span> <span class="n">metrics</span> <span class="o">=</span> <span class="n">BinaryClassificationMetrics</span><span class="p">(</span><span class="n">predictionAndLabels</span><span class="p">)</span> -<span class="c"># Area under precision-recall curve</span> -<span class="k">print</span><span class="p">(</span><span class="s">"Area under PR = </span><span class="si">%s</span><span class="s">"</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">areaUnderPR</span><span class="p">)</span> +<span class="c1"># Area under precision-recall curve</span> +<span class="k">print</span><span class="p">(</span><span class="s2">"Area under PR = </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">areaUnderPR</span><span class="p">)</span> -<span class="c"># Area under ROC curve</span> -<span 
class="k">print</span><span class="p">(</span><span class="s">"Area under ROC = </span><span class="si">%s</span><span class="s">"</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">areaUnderROC</span><span class="p">)</span> +<span class="c1"># Area under ROC curve</span> +<span class="k">print</span><span class="p">(</span><span class="s2">"Area under ROC = </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">areaUnderROC</span><span class="p">)</span> </pre></div> <div><small>Find full example code at "examples/src/main/python/mllib/binary_classification_metrics_example.py" in the Spark repo.</small></div> </div> @@ -649,15 +649,15 @@ correctly normalized by the number of times that label appears in the output.</p <p>Define the class, or label, set as</p> -<script type="math/tex; mode=display">L = \{\ell_0, \ell_1, \ldots, \ell_{M-1} \}</script> +<script type="math/tex; mode=display">L = \{\ell_0, \ell_1, \ldots, \ell_{M-1} \} </script> <p>The true output vector $\mathbf{y}$ consists of $N$ elements</p> -<script type="math/tex; mode=display">\mathbf{y}_0, \mathbf{y}_1, \ldots, \mathbf{y}_{N-1} \in L</script> +<script type="math/tex; mode=display">\mathbf{y}_0, \mathbf{y}_1, \ldots, \mathbf{y}_{N-1} \in L </script> <p>A multiclass prediction algorithm generates a prediction vector $\hat{\mathbf{y}}$ of $N$ elements</p> -<script type="math/tex; mode=display">\hat{\mathbf{y}}_0, \hat{\mathbf{y}}_1, \ldots, \hat{\mathbf{y}}_{N-1} \in L</script> +<script type="math/tex; mode=display">\hat{\mathbf{y}}_0, \hat{\mathbf{y}}_1, \ldots, \hat{\mathbf{y}}_{N-1} \in L </script> <p>For this section, a modified delta function $\hat{\delta}(x)$ will prove useful</p> @@ -731,7 +731,7 @@ the data, and evaluate the performance of the algorithm by several multiclass cl <div data-lang="scala"> <p>Refer to the <a 
href="api/scala/index.html#org.apache.spark.mllib.evaluation.MulticlassMetrics"><code>MulticlassMetrics</code> Scala docs</a> for details on the API.</p> - <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS</span> + <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS</span> <span class="k">import</span> <span class="nn">org.apache.spark.mllib.evaluation.MulticlassMetrics</span> <span class="k">import</span> <span class="nn">org.apache.spark.mllib.regression.LabeledPoint</span> <span class="k">import</span> <span class="nn">org.apache.spark.mllib.util.MLUtils</span> @@ -764,34 +764,34 @@ the data, and evaluate the performance of the algorithm by several multiclass cl <span class="c1">// Overall Statistics</span> <span class="k">val</span> <span class="n">accuracy</span> <span class="k">=</span> <span class="n">metrics</span><span class="o">.</span><span class="n">accuracy</span> <span class="n">println</span><span class="o">(</span><span class="s">"Summary Statistics"</span><span class="o">)</span> -<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">"Accuracy = $accuracy"</span><span class="o">)</span> +<span class="n">println</span><span class="o">(</span><span class="s">s"Accuracy = </span><span class="si">$accuracy</span><span class="s">"</span><span class="o">)</span> <span class="c1">// Precision by label</span> <span class="k">val</span> <span class="n">labels</span> <span class="k">=</span> <span class="n">metrics</span><span class="o">.</span><span class="n">labels</span> <span class="n">labels</span><span class="o">.</span><span class="n">foreach</span> <span class="o">{</span> <span class="n">l</span> <span class="k">=></span> - <span class="n">println</span><span class="o">(</span><span class="n">s</span><span 
class="s">"Precision($l) = "</span> <span class="o">+</span> <span class="n">metrics</span><span class="o">.</span><span class="n">precision</span><span class="o">(</span><span class="n">l</span><span class="o">))</span> + <span class="n">println</span><span class="o">(</span><span class="s">s"Precision(</span><span class="si">$l</span><span class="s">) = "</span> <span class="o">+</span> <span class="n">metrics</span><span class="o">.</span><span class="n">precision</span><span class="o">(</span><span class="n">l</span><span class="o">))</span> <span class="o">}</span> <span class="c1">// Recall by label</span> <span class="n">labels</span><span class="o">.</span><span class="n">foreach</span> <span class="o">{</span> <span class="n">l</span> <span class="k">=></span> - <span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">"Recall($l) = "</span> <span class="o">+</span> <span class="n">metrics</span><span class="o">.</span><span class="n">recall</span><span class="o">(</span><span class="n">l</span><span class="o">))</span> + <span class="n">println</span><span class="o">(</span><span class="s">s"Recall(</span><span class="si">$l</span><span class="s">) = "</span> <span class="o">+</span> <span class="n">metrics</span><span class="o">.</span><span class="n">recall</span><span class="o">(</span><span class="n">l</span><span class="o">))</span> <span class="o">}</span> <span class="c1">// False positive rate by label</span> <span class="n">labels</span><span class="o">.</span><span class="n">foreach</span> <span class="o">{</span> <span class="n">l</span> <span class="k">=></span> - <span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">"FPR($l) = "</span> <span class="o">+</span> <span class="n">metrics</span><span class="o">.</span><span class="n">falsePositiveRate</span><span class="o">(</span><span class="n">l</span><span class="o">))</span> + <span class="n">println</span><span 
class="o">(</span><span class="s">s"FPR(</span><span class="si">$l</span><span class="s">) = "</span> <span class="o">+</span> <span class="n">metrics</span><span class="o">.</span><span class="n">falsePositiveRate</span><span class="o">(</span><span class="n">l</span><span class="o">))</span> <span class="o">}</span> <span class="c1">// F-measure by label</span> <span class="n">labels</span><span class="o">.</span><span class="n">foreach</span> <span class="o">{</span> <span class="n">l</span> <span class="k">=></span> - <span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">"F1-Score($l) = "</span> <span class="o">+</span> <span class="n">metrics</span><span class="o">.</span><span class="n">fMeasure</span><span class="o">(</span><span class="n">l</span><span class="o">))</span> + <span class="n">println</span><span class="o">(</span><span class="s">s"F1-Score(</span><span class="si">$l</span><span class="s">) = "</span> <span class="o">+</span> <span class="n">metrics</span><span class="o">.</span><span class="n">fMeasure</span><span class="o">(</span><span class="n">l</span><span class="o">))</span> <span class="o">}</span> <span class="c1">// Weighted stats</span> -<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">"Weighted precision: ${metrics.weightedPrecision}"</span><span class="o">)</span> -<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">"Weighted recall: ${metrics.weightedRecall}"</span><span class="o">)</span> -<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">"Weighted F1 score: ${metrics.weightedFMeasure}"</span><span class="o">)</span> -<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">"Weighted false positive rate: ${metrics.weightedFalsePositiveRate}"</span><span class="o">)</span> +<span class="n">println</span><span 
class="o">(</span><span class="s">s"Weighted precision: </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">weightedPrecision</span><span class="si">}</span><span class="s">"</span><span class="o">)</span> +<span class="n">println</span><span class="o">(</span><span class="s">s"Weighted recall: </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">weightedRecall</span><span class="si">}</span><span class="s">"</span><span class="o">)</span> +<span class="n">println</span><span class="o">(</span><span class="s">s"Weighted F1 score: </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">weightedFMeasure</span><span class="si">}</span><span class="s">"</span><span class="o">)</span> +<span class="n">println</span><span class="o">(</span><span class="s">s"Weighted false positive rate: </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">weightedFalsePositiveRate</span><span class="si">}</span><span class="s">"</span><span class="o">)</span> </pre></div> <div><small>Find full example code at "examples/src/main/scala/org/apache/spark/examples/mllib/MulticlassMetricsExample.scala" in the Spark repo.</small></div> @@ -800,7 +800,7 @@ the data, and evaluate the performance of the algorithm by several multiclass cl <div data-lang="java"> <p>Refer to the <a href="api/java/org/apache/spark/mllib/evaluation/MulticlassMetrics.html"><code>MulticlassMetrics</code> Java docs</a> for details on the API.</p> - <div class="highlight"><pre><span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span> + <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span> <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.*</span><span class="o">;</span> <span 
class="kn">import</span> <span class="nn">org.apache.spark.api.java.function.Function</span><span class="o">;</span> @@ -820,7 +820,7 @@ the data, and evaluate the performance of the algorithm by several multiclass cl <span class="n">JavaRDD</span><span class="o"><</span><span class="n">LabeledPoint</span><span class="o">></span> <span class="n">test</span> <span class="o">=</span> <span class="n">splits</span><span class="o">[</span><span class="mi">1</span><span class="o">];</span> <span class="c1">// Run training algorithm to build the model.</span> -<span class="kd">final</span> <span class="n">LogisticRegressionModel</span> <span class="n">model</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">LogisticRegressionWithLBFGS</span><span class="o">()</span> +<span class="kd">final</span> <span class="n">LogisticRegressionModel</span> <span class="n">model</span> <span class="o">=</span> <span class="k">new</span> <span class="n">LogisticRegressionWithLBFGS</span><span class="o">()</span> <span class="o">.</span><span class="na">setNumClasses</span><span class="o">(</span><span class="mi">3</span><span class="o">)</span> <span class="o">.</span><span class="na">run</span><span class="o">(</span><span class="n">training</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span> @@ -835,7 +835,7 @@ the data, and evaluate the performance of the algorithm by several multiclass cl <span class="o">);</span> <span class="c1">// Get evaluation metrics.</span> -<span class="n">MulticlassMetrics</span> <span class="n">metrics</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">MulticlassMetrics</span><span class="o">(</span><span class="n">predictionAndLabels</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span> +<span class="n">MulticlassMetrics</span> <span class="n">metrics</span> <span class="o">=</span> <span class="k">new</span> <span 
class="n">MulticlassMetrics</span><span class="o">(</span><span class="n">predictionAndLabels</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span> <span class="c1">// Confusion matrix</span> <span class="n">Matrix</span> <span class="n">confusion</span> <span class="o">=</span> <span class="n">metrics</span><span class="o">.</span><span class="na">confusionMatrix</span><span class="o">();</span> @@ -872,48 +872,48 @@ the data, and evaluate the performance of the algorithm by several multiclass cl <div data-lang="python"> <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.evaluation.MulticlassMetrics"><code>MulticlassMetrics</code> Python docs</a> for more details on the API.</p> - <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.classification</span> <span class="kn">import</span> <span class="n">LogisticRegressionWithLBFGS</span> + <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.classification</span> <span class="kn">import</span> <span class="n">LogisticRegressionWithLBFGS</span> <span class="kn">from</span> <span class="nn">pyspark.mllib.util</span> <span class="kn">import</span> <span class="n">MLUtils</span> <span class="kn">from</span> <span class="nn">pyspark.mllib.evaluation</span> <span class="kn">import</span> <span class="n">MulticlassMetrics</span> -<span class="c"># Load training data in LIBSVM format</span> -<span class="n">data</span> <span class="o">=</span> <span class="n">MLUtils</span><span class="o">.</span><span class="n">loadLibSVMFile</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">"data/mllib/sample_multiclass_classification_data.txt"</span><span class="p">)</span> +<span class="c1"># Load training data in LIBSVM format</span> +<span class="n">data</span> <span class="o">=</span> <span class="n">MLUtils</span><span class="o">.</span><span 
class="n">loadLibSVMFile</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">"data/mllib/sample_multiclass_classification_data.txt"</span><span class="p">)</span> -<span class="c"># Split data into training (60%) and test (40%)</span> +<span class="c1"># Split data into training (60%) and test (40%)</span> <span class="n">training</span><span class="p">,</span> <span class="n">test</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">randomSplit</span><span class="p">([</span><span class="mf">0.6</span><span class="p">,</span> <span class="mf">0.4</span><span class="p">],</span> <span class="n">seed</span><span class="o">=</span><span class="mi">11</span><span class="p">)</span> <span class="n">training</span><span class="o">.</span><span class="n">cache</span><span class="p">()</span> -<span class="c"># Run training algorithm to build the model</span> +<span class="c1"># Run training algorithm to build the model</span> <span class="n">model</span> <span class="o">=</span> <span class="n">LogisticRegressionWithLBFGS</span><span class="o">.</span><span class="n">train</span><span class="p">(</span><span class="n">training</span><span class="p">,</span> <span class="n">numClasses</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span> -<span class="c"># Compute raw scores on the test set</span> +<span class="c1"># Compute raw scores on the test set</span> <span class="n">predictionAndLabels</span> <span class="o">=</span> <span class="n">test</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">lp</span><span class="p">:</span> <span class="p">(</span><span class="nb">float</span><span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">lp</span><span class="o">.</span><span 
class="n">features</span><span class="p">)),</span> <span class="n">lp</span><span class="o">.</span><span class="n">label</span><span class="p">))</span> -<span class="c"># Instantiate metrics object</span> +<span class="c1"># Instantiate metrics object</span> <span class="n">metrics</span> <span class="o">=</span> <span class="n">MulticlassMetrics</span><span class="p">(</span><span class="n">predictionAndLabels</span><span class="p">)</span> -<span class="c"># Overall statistics</span> +<span class="c1"># Overall statistics</span> <span class="n">precision</span> <span class="o">=</span> <span class="n">metrics</span><span class="o">.</span><span class="n">precision</span><span class="p">()</span> <span class="n">recall</span> <span class="o">=</span> <span class="n">metrics</span><span class="o">.</span><span class="n">recall</span><span class="p">()</span> <span class="n">f1Score</span> <span class="o">=</span> <span class="n">metrics</span><span class="o">.</span><span class="n">fMeasure</span><span class="p">()</span> -<span class="k">print</span><span class="p">(</span><span class="s">"Summary Stats"</span><span class="p">)</span> -<span class="k">print</span><span class="p">(</span><span class="s">"Precision = </span><span class="si">%s</span><span class="s">"</span> <span class="o">%</span> <span class="n">precision</span><span class="p">)</span> -<span class="k">print</span><span class="p">(</span><span class="s">"Recall = </span><span class="si">%s</span><span class="s">"</span> <span class="o">%</span> <span class="n">recall</span><span class="p">)</span> -<span class="k">print</span><span class="p">(</span><span class="s">"F1 Score = </span><span class="si">%s</span><span class="s">"</span> <span class="o">%</span> <span class="n">f1Score</span><span class="p">)</span> +<span class="k">print</span><span class="p">(</span><span class="s2">"Summary Stats"</span><span class="p">)</span> +<span class="k">print</span><span class="p">(</span><span 
class="s2">"Precision = </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="n">precision</span><span class="p">)</span> +<span class="k">print</span><span class="p">(</span><span class="s2">"Recall = </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="n">recall</span><span class="p">)</span> +<span class="k">print</span><span class="p">(</span><span class="s2">"F1 Score = </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="n">f1Score</span><span class="p">)</span> -<span class="c"># Statistics by class</span> +<span class="c1"># Statistics by class</span> <span class="n">labels</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">lp</span><span class="p">:</span> <span class="n">lp</span><span class="o">.</span><span class="n">label</span><span class="p">)</span><span class="o">.</span><span class="n">distinct</span><span class="p">()</span><span class="o">.</span><span class="n">collect</span><span class="p">()</span> <span class="k">for</span> <span class="n">label</span> <span class="ow">in</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">labels</span><span class="p">):</span> - <span class="k">print</span><span class="p">(</span><span class="s">"Class </span><span class="si">%s</span><span class="s"> precision = </span><span class="si">%s</span><span class="s">"</span> <span class="o">%</span> <span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">metrics</span><span class="o">.</span><span class="n">precision</span><span class="p">(</span><span class="n">label</span><span class="p">)))</span> - <span class="k">print</span><span class="p">(</span><span class="s">"Class </span><span class="si">%s</span><span class="s"> recall = </span><span 
class="si">%s</span><span class="s">"</span> <span class="o">%</span> <span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">metrics</span><span class="o">.</span><span class="n">recall</span><span class="p">(</span><span class="n">label</span><span class="p">)))</span> - <span class="k">print</span><span class="p">(</span><span class="s">"Class </span><span class="si">%s</span><span class="s"> F1 Measure = </span><span class="si">%s</span><span class="s">"</span> <span class="o">%</span> <span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">metrics</span><span class="o">.</span><span class="n">fMeasure</span><span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">beta</span><span class="o">=</span><span class="mf">1.0</span><span class="p">)))</span> - -<span class="c"># Weighted stats</span> -<span class="k">print</span><span class="p">(</span><span class="s">"Weighted recall = </span><span class="si">%s</span><span class="s">"</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">weightedRecall</span><span class="p">)</span> -<span class="k">print</span><span class="p">(</span><span class="s">"Weighted precision = </span><span class="si">%s</span><span class="s">"</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">weightedPrecision</span><span class="p">)</span> -<span class="k">print</span><span class="p">(</span><span class="s">"Weighted F(1) Score = </span><span class="si">%s</span><span class="s">"</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">weightedFMeasure</span><span class="p">())</span> -<span class="k">print</span><span class="p">(</span><span class="s">"Weighted F(0.5) Score = </span><span class="si">%s</span><span class="s">"</span> <span class="o">%</span> <span 
class="n">metrics</span><span class="o">.</span><span class="n">weightedFMeasure</span><span class="p">(</span><span class="n">beta</span><span class="o">=</span><span class="mf">0.5</span><span class="p">))</span> -<span class="k">print</span><span class="p">(</span><span class="s">"Weighted false positive rate = </span><span class="si">%s</span><span class="s">"</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">weightedFalsePositiveRate</span><span class="p">)</span> + <span class="k">print</span><span class="p">(</span><span class="s2">"Class </span><span class="si">%s</span><span class="s2"> precision = </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">metrics</span><span class="o">.</span><span class="n">precision</span><span class="p">(</span><span class="n">label</span><span class="p">)))</span> + <span class="k">print</span><span class="p">(</span><span class="s2">"Class </span><span class="si">%s</span><span class="s2"> recall = </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">metrics</span><span class="o">.</span><span class="n">recall</span><span class="p">(</span><span class="n">label</span><span class="p">)))</span> + <span class="k">print</span><span class="p">(</span><span class="s2">"Class </span><span class="si">%s</span><span class="s2"> F1 Measure = </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">metrics</span><span class="o">.</span><span class="n">fMeasure</span><span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">beta</span><span class="o">=</span><span 
class="mf">1.0</span><span class="p">)))</span> + +<span class="c1"># Weighted stats</span> +<span class="k">print</span><span class="p">(</span><span class="s2">"Weighted recall = </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">weightedRecall</span><span class="p">)</span> +<span class="k">print</span><span class="p">(</span><span class="s2">"Weighted precision = </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">weightedPrecision</span><span class="p">)</span> +<span class="k">print</span><span class="p">(</span><span class="s2">"Weighted F(1) Score = </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">weightedFMeasure</span><span class="p">())</span> +<span class="k">print</span><span class="p">(</span><span class="s2">"Weighted F(0.5) Score = </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">weightedFMeasure</span><span class="p">(</span><span class="n">beta</span><span class="o">=</span><span class="mf">0.5</span><span class="p">))</span> +<span class="k">print</span><span class="p">(</span><span class="s2">"Weighted false positive rate = </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">weightedFalsePositiveRate</span><span class="p">)</span> </pre></div> <div><small>Find full example code at "examples/src/main/python/mllib/multi_class_metrics_example.py" in the Spark repo.</small></div> @@ -938,7 +938,7 @@ set and it exists in the true label set, for a specific data point.</p> <script type="math/tex; mode=display">D = \left\{d_0, d_1, ..., 
d_{N-1}\right\}</script> -<p>Define $L_0, L_1, …, L_{N-1}$ to be a family of label sets and $P_0, P_1, …, P_{N-1}$ +<p>Define $L_0, L_1, …, L<em>{N-1}$ to be a family of label sets and $P_0, P_1, …, P</em>{N-1}$ to be a family of prediction sets where $L_i$ and $P_i$ are the label set and prediction set, respectively, that correspond to document $d_i$.</p> @@ -1058,7 +1058,7 @@ use the fake prediction and label data for multilabel classification that is sho <div data-lang="scala"> <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.evaluation.MultilabelMetrics"><code>MultilabelMetrics</code> Scala docs</a> for details on the API.</p> - <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.evaluation.MultilabelMetrics</span> + <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.evaluation.MultilabelMetrics</span> <span class="k">import</span> <span class="nn">org.apache.spark.rdd.RDD</span> <span class="k">val</span> <span class="n">scoreAndLabels</span><span class="k">:</span> <span class="kt">RDD</span><span class="o">[(</span><span class="kt">Array</span><span class="o">[</span><span class="kt">Double</span><span class="o">]</span>, <span class="kt">Array</span><span class="o">[</span><span class="kt">Double</span><span class="o">])]</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="o">(</span> @@ -1074,27 +1074,27 @@ use the fake prediction and label data for multilabel classification that is sho <span class="k">val</span> <span class="n">metrics</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">MultilabelMetrics</span><span class="o">(</span><span class="n">scoreAndLabels</span><span class="o">)</span> <span class="c1">// Summary stats</span> -<span class="n">println</span><span class="o">(</span><span class="n">s</span><span 
class="s">"Recall = ${metrics.recall}"</span><span class="o">)</span> -<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">"Precision = ${metrics.precision}"</span><span class="o">)</span> -<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">"F1 measure = ${metrics.f1Measure}"</span><span class="o">)</span> -<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">"Accuracy = ${metrics.accuracy}"</span><span class="o">)</span> +<span class="n">println</span><span class="o">(</span><span class="s">s"Recall = </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">recall</span><span class="si">}</span><span class="s">"</span><span class="o">)</span> +<span class="n">println</span><span class="o">(</span><span class="s">s"Precision = </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">precision</span><span class="si">}</span><span class="s">"</span><span class="o">)</span> +<span class="n">println</span><span class="o">(</span><span class="s">s"F1 measure = </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">f1Measure</span><span class="si">}</span><span class="s">"</span><span class="o">)</span> +<span class="n">println</span><span class="o">(</span><span class="s">s"Accuracy = </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">accuracy</span><span class="si">}</span><span class="s">"</span><span class="o">)</span> <span class="c1">// Individual label stats</span> <span class="n">metrics</span><span class="o">.</span><span class="n">labels</span><span class="o">.</span><span class="n">foreach</span><span class="o">(</span><span class="n">label</span> <span class="k">=></span> - <span class="n">println</span><span class="o">(</span><span 
class="n">s</span><span class="s">"Class $label precision = ${metrics.precision(label)}"</span><span class="o">))</span> -<span class="n">metrics</span><span class="o">.</span><span class="n">labels</span><span class="o">.</span><span class="n">foreach</span><span class="o">(</span><span class="n">label</span> <span class="k">=></span> <span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">"Class $label recall = ${metrics.recall(label)}"</span><span class="o">))</span> -<span class="n">metrics</span><span class="o">.</span><span class="n">labels</span><span class="o">.</span><span class="n">foreach</span><span class="o">(</span><span class="n">label</span> <span class="k">=></span> <span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">"Class $label F1-score = ${metrics.f1Measure(label)}"</span><span class="o">))</span> + <span class="n">println</span><span class="o">(</span><span class="s">s"Class </span><span class="si">$label</span><span class="s"> precision = </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">precision</span><span class="o">(</span><span class="n">label</span><span class="o">)</span><span class="si">}</span><span class="s">"</span><span class="o">))</span> +<span class="n">metrics</span><span class="o">.</span><span class="n">labels</span><span class="o">.</span><span class="n">foreach</span><span class="o">(</span><span class="n">label</span> <span class="k">=></span> <span class="n">println</span><span class="o">(</span><span class="s">s"Class </span><span class="si">$label</span><span class="s"> recall = </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">recall</span><span class="o">(</span><span class="n">label</span><span class="o">)</span><span class="si">}</span><span class="s">"</span><span class="o">))</span> +<span class="n">metrics</span><span 
class="o">.</span><span class="n">labels</span><span class="o">.</span><span class="n">foreach</span><span class="o">(</span><span class="n">label</span> <span class="k">=></span> <span class="n">println</span><span class="o">(</span><span class="s">s"Class </span><span class="si">$label</span><span class="s"> F1-score = </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">f1Measure</span><span class="o">(</span><span class="n">label</span><span class="o">)</span><span class="si">}</span><span class="s">"</span><span class="o">))</span> <span class="c1">// Micro stats</span> -<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">"Micro recall = ${metrics.microRecall}"</span><span class="o">)</span> -<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">"Micro precision = ${metrics.microPrecision}"</span><span class="o">)</span> -<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">"Micro F1 measure = ${metrics.microF1Measure}"</span><span class="o">)</span> +<span class="n">println</span><span class="o">(</span><span class="s">s"Micro recall = </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">microRecall</span><span class="si">}</span><span class="s">"</span><span class="o">)</span> +<span class="n">println</span><span class="o">(</span><span class="s">s"Micro precision = </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">microPrecision</span><span class="si">}</span><span class="s">"</span><span class="o">)</span> +<span class="n">println</span><span class="o">(</span><span class="s">s"Micro F1 measure = </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">microF1Measure</span><span class="si">}</span><span class="s">"</span><span 
class="o">)</span> <span class="c1">// Hamming loss</span> -<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">"Hamming loss = ${metrics.hammingLoss}"</span><span class="o">)</span> +<span class="n">println</span><span class="o">(</span><span class="s">s"Hamming loss = </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">hammingLoss</span><span class="si">}</span><span class="s">"</span><span class="o">)</span> <span class="c1">// Subset accuracy</span> -<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">"Subset accuracy = ${metrics.subsetAccuracy}"</span><span class="o">)</span> +<span class="n">println</span><span class="o">(</span><span class="s">s"Subset accuracy = </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">subsetAccuracy</span><span class="si">}</span><span class="s">"</span><span class="o">)</span> </pre></div> <div><small>Find full example code at "examples/src/main/scala/org/apache/spark/examples/mllib/MultiLabelMetricsExample.scala" in the Spark repo.</small></div> @@ -1103,7 +1103,7 @@ use the fake prediction and label data for multilabel classification that is sho <div data-lang="java"> <p>Refer to the <a href="api/java/org/apache/spark/mllib/evaluation/MultilabelMetrics.html"><code>MultilabelMetrics</code> Java docs</a> for details on the API.</p> - <div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span> + <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span> <span class="kn">import</span> <span class="nn">java.util.List</span><span class="o">;</span> <span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span> @@ -1124,7 +1124,7 @@ use the fake prediction and label data for 
multilabel classification that is sho <span class="n">JavaRDD</span><span class="o"><</span><span class="n">Tuple2</span><span class="o"><</span><span class="kt">double</span><span class="o">[],</span> <span class="kt">double</span><span class="o">[]>></span> <span class="n">scoreAndLabels</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="na">parallelize</span><span class="o">(</span><span class="n">data</span><span class="o">);</span> <span class="c1">// Instantiate metrics object</span> -<span class="n">MultilabelMetrics</span> <span class="n">metrics</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">MultilabelMetrics</span><span class="o">(</span><span class="n">scoreAndLabels</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span> +<span class="n">MultilabelMetrics</span> <span class="n">metrics</span> <span class="o">=</span> <span class="k">new</span> <span class="n">MultilabelMetrics</span><span class="o">(</span><span class="n">scoreAndLabels</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span> <span class="c1">// Summary stats</span> <span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">format</span><span class="o">(</span><span class="s">"Recall = %f\n"</span><span class="o">,</span> <span class="n">metrics</span><span class="o">.</span><span class="na">recall</span><span class="o">());</span> @@ -1160,7 +1160,7 @@ use the fake prediction and label data for multilabel classification that is sho <div data-lang="python"> <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.evaluation.MultilabelMetrics"><code>MultilabelMetrics</code> Python docs</a> for more details on the API.</p> - <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.evaluation</span> <span class="kn">import</span> <span 
class="n">MultilabelMetrics</span> + <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.evaluation</span> <span class="kn">import</span> <span class="n">MultilabelMetrics</span> <span class="n">scoreAndLabels</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="p">([</span> <span class="p">([</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">],</span> <span class="p">[</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">2.0</span><span class="p">]),</span> @@ -1171,32 +1171,32 @@ use the fake prediction and label data for multilabel classification that is sho <span class="p">([</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">,</span> <span class="mf">2.0</span><span class="p">],</span> <span class="p">[</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">]),</span> <span class="p">([</span><span class="mf">1.0</span><span class="p">],</span> <span class="p">[</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">2.0</span><span class="p">])])</span> -<span class="c"># Instantiate metrics object</span> +<span class="c1"># Instantiate metrics object</span> <span class="n">metrics</span> <span class="o">=</span> <span class="n">MultilabelMetrics</span><span class="p">(</span><span class="n">scoreAndLabels</span><span class="p">)</span> -<span class="c"># Summary stats</span> -<span class="k">print</span><span class="p">(</span><span class="s">"Recall = </span><span class="si">%s</span><span class="s">"</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">recall</span><span class="p">())</span> -<span class="k">print</span><span class="p">(</span><span class="s">"Precision = </span><span 
class="si">%s</span><span class="s">"</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">precision</span><span class="p">())</span> -<span class="k">print</span><span class="p">(</span><span class="s">"F1 measure = </span><span class="si">%s</span><span class="s">"</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">f1Measure</span><span class="p">())</span> -<span class="k">print</span><span class="p">(</span><span class="s">"Accuracy = </span><span class="si">%s</span><span class="s">"</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">accuracy</span><span class="p">)</span> +<span class="c1"># Summary stats</span> +<span class="k">print</span><span class="p">(</span><span class="s2">"Recall = </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">recall</span><span class="p">())</span> +<span class="k">print</span><span class="p">(</span><span class="s2">"Precision = </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">precision</span><span class="p">())</span> +<span class="k">print</span><span class="p">(</span><span class="s2">"F1 measure = </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">f1Measure</span><span class="p">())</span> +<span class="k">print</span><span class="p">(</span><span class="s2">"Accuracy = </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">accuracy</span><span class="p">)</span> -<span class="c"># Individual label stats</span> +<span class="c1"># Individual label stats</span> <span 
class="n">labels</span> <span class="o">=</span> <span class="n">scoreAndLabels</span><span class="o">.</span><span class="n">flatMap</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span><span class="o">.</span><span class="n">distinct</span><span class="p">()</span><span class="o">.</span><span class="n">collect</span><span class="p">()</span> <span class="k">for</span> <span class="n">label</span> <span class="ow">in</span> <span class="n">labels</span><span class="p">:</span> - <span class="k">print</span><span class="p">(</span><span class="s">"Class </span><span class="si">%s</span><span class="s"> precision = </span><span class="si">%s</span><span class="s">"</span> <span class="o">%</span> <span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">metrics</span><span class="o">.</span><span class="n">precision</span><span class="p">(</span><span class="n">label</span><span class="p">)))</span> - <span class="k">print</span><span class="p">(</span><span class="s">"Class </span><span class="si">%s</span><span class="s"> recall = </span><span class="si">%s</span><span class="s">"</span> <span class="o">%</span> <span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">metrics</span><span class="o">.</span><span class="n">recall</span><span class="p">(</span><span class="n">label</span><span class="p">)))</span> - <span class="k">print</span><span class="p">(</span><span class="s">"Class </span><span class="si">%s</span><span class="s"> F1 Measure = </span><span class="si">%s</span><span class="s">"</span> <span class="o">%</span> <span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">metrics</span><span class="o">.</span><span class="n">f1Measure</span><span class="p">(</span><span 
class="n">label</span><span class="p">)))</span> + <span class="k">print</span><span class="p">(</span><span class="s2">"Class </span><span class="si">%s</span><span class="s2"> precision = </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">metrics</span><span class="o">.</span><span class="n">precision</span><span class="p">(</span><span class="n">label</span><span class="p">)))</span> + <span class="k">print</span><span class="p">(</span><span class="s2">"Class </span><span class="si">%s</span><span class="s2"> recall = </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">metrics</span><span class="o">.</span><span class="n">recall</span><span class="p">(</span><span class="n">label</span><span class="p">)))</span> + <span class="k">print</span><span class="p">(</span><span class="s2">"Class </span><span class="si">%s</span><span class="s2"> F1 Measure = </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">metrics</span><span class="o">.</span><span class="n">f1Measure</span><span class="p">(</span><span class="n">label</span><span class="p">)))</span> -<span class="c"># Micro stats</span> -<span class="k">print</span><span class="p">(</span><span class="s">"Micro precision = </span><span class="si">%s</span><span class="s">"</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">microPrecision</span><span class="p">)</span> -<span class="k">print</span><span class="p">(</span><span class="s">"Micro recall = </span><span class="si">%s</span><span class="s">"</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span 
class="n">microRecall</span><span class="p">)</span> -<span class="k">print</span><span class="p">(</span><span class="s">"Micro F1 measure = </span><span class="si">%s</span><span class="s">"</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">microF1Measure</span><span class="p">)</span> +<span class="c1"># Micro stats</span> +<span class="k">print</span><span class="p">(</span><span class="s2">"Micro precision = </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">microPrecision</span><span class="p">)</span> +<span class="k">print</span><span class="p">(</span><span class="s2">"Micro recall = </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">microRecall</span><span class="p">)</span> +<span class="k">print</span><span class="p">(</span><span class="s2">"Micro F1 measure = </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">microF1Measure</span><span class="p">)</span> -<span class="c"># Hamming loss</span> -<span class="k">print</span><span class="p">(</span><span class="s">"Hamming loss = </span><span class="si">%s</span><span class="s">"</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">hammingLoss</span><span class="p">)</span> +<span class="c1"># Hamming loss</span> +<span class="k">print</span><span class="p">(</span><span class="s2">"Hamming loss = </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">hammingLoss</span><span class="p">)</span> -<span class="c"># Subset accuracy</span> -<span class="k">print</span><span class="p">(</span><span 
class="s">"Subset accuracy = </span><span class="si">%s</span><span class="s">"</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">subsetAccuracy</span><span class="p">)</span> +<span class="c1"># Subset accuracy</span> +<span class="k">print</span><span class="p">(</span><span class="s2">"Subset accuracy = </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">subsetAccuracy</span><span class="p">)</span> </pre></div> <div><small>Find full example code at "examples/src/main/python/mllib/multi_label_metrics_example.py" in the Spark repo.</small></div> @@ -1317,7 +1317,7 @@ expanded world of non-positive weights are “the same as never having inter <div data-lang="scala"> <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.evaluation.RegressionMetrics"><code>RegressionMetrics</code> Scala docs</a> and <a href="api/scala/index.html#org.apache.spark.mllib.evaluation.RankingMetrics"><code>RankingMetrics</code> Scala docs</a> for details on the API.</p> - <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.evaluation.</span><span class="o">{</span><span class="nc">RankingMetrics</span><span class="o">,</span> <span class="nc">RegressionMetrics</span><span class="o">}</span> + <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.evaluation.</span><span class="o">{</span><span class="nc">RankingMetrics</span><span class="o">,</span> <span class="nc">RegressionMetrics</span><span class="o">}</span> <span class="k">import</span> <span class="nn">org.apache.spark.mllib.recommendation.</span><span class="o">{</span><span class="nc">ALS</span><span class="o">,</span> <span class="nc">Rating</span><span class="o">}</span> <span class="c1">// Read in the ratings data</span> @@ -1334,7 +1334,7 @@ expanded world of
non-positive weights are “the same as never having inter <span class="k">val</span> <span class="n">numRatings</span> <span class="k">=</span> <span class="n">ratings</span><span class="o">.</span><span class="n">count</span><span class="o">()</span> <span class="k">val</span> <span class="n">numUsers</span> <span class="k">=</span> <span class="n">ratings</span><span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="k">_</span><span class="o">.</span><span class="n">user</span><span class="o">).</span><span class="n">distinct</span><span class="o">().</span><span class="n">count</span><span class="o">()</span> <span class="k">val</span> <span class="n">numMovies</span> <span class="k">=</span> <span class="n">ratings</span><span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="k">_</span><span class="o">.</span><span class="n">product</span><span class="o">).</span><span class="n">distinct</span><span class="o">().</span><span class="n">count</span><span class="o">()</span> -<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">"Got $numRatings ratings from $numUsers users on $numMovies movies."</span><span class="o">)</span> +<span class="n">println</span><span class="o">(</span><span class="s">s"Got </span><span class="si">$numRatings</span><span class="s"> ratings from </span><span class="si">$numUsers</span><span class="s"> users on </span><span class="si">$numMovies</span><span class="s"> movies."</span><span class="o">)</span> <span class="c1">// Build the model</span> <span class="k">val</span> <span class="n">numIterations</span> <span class="k">=</span> <span class="mi">10</span> @@ -1366,15 +1366,15 @@ expanded world of non-positive weights are “the same as never having inter <span class="c1">// Precision at K</span> <span class="nc">Array</span><span class="o">(</span><span class="mi">1</span><span class="o">,</span> <span class="mi">3</span><span 
class="o">,</span> <span class="mi">5</span><span class="o">).</span><span class="n">foreach</span> <span class="o">{</span> <span class="n">k</span> <span class="k">=></span> - <span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">"Precision at $k = ${metrics.precisionAt(k)}"</span><span class="o">)</span> + <span class="n">println</span><span class="o">(</span><span class="s">s"Precision at </span><span class="si">$k</span><span class="s"> = </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">precisionAt</span><span class="o">(</span><span class="n">k</span><span class="o">)</span><span class="si">}</span><span class="s">"</span><span class="o">)</span> <span class="o">}</span> <span class="c1">// Mean average precision</span> -<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">"Mean average precision = ${metrics.meanAveragePrecision}"</span><span class="o">)</span> +<span class="n">println</span><span class="o">(</span><span class="s">s"Mean average precision = </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">meanAveragePrecision</span><span class="si">}</span><span class="s">"</span><span class="o">)</span> <span class="c1">// Normalized discounted cumulative gain</span> <span class="nc">Array</span><span class="o">(</span><span class="mi">1</span><span class="o">,</span> <span class="mi">3</span><span class="o">,</span> <span class="mi">5</span><span class="o">).</span><span class="n">foreach</span> <span class="o">{</span> <span class="n">k</span> <span class="k">=></span> - <span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">"NDCG at $k = ${metrics.ndcgAt(k)}"</span><span class="o">)</span> + <span class="n">println</span><span class="o">(</span><span class="s">s"NDCG at </span><span class="si">$k</span><span class="s"> = 
</span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">ndcgAt</span><span class="o">(</span><span class="n">k</span><span class="o">)</span><span class="si">}</span><span class="s">"</span><span class="o">)</span> <span class="o">}</span> <span class="c1">// Get predictions for each data point</span> @@ -1388,10 +1388,10 @@ expanded world of non-positive weights are “the same as never having inter <span class="c1">// Get the RMSE using regression metrics</span> <span class="k">val</span> <span class="n">regressionMetrics</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">RegressionMetrics</span><span class="o">(</span><span class="n">predictionsAndLabels</span><span class="o">)</span> -<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">"RMSE = ${regressionMetrics.rootMeanSquaredError}"</span><span class="o">)</span> +<span class="n">println</span><span class="o">(</span><span class="s">s"RMSE = </span><span class="si">${</span><span class="n">regressionMetrics</span><span class="o">.</span><span class="n">rootMeanSquaredError</span><span class="si">}</span><span class="s">"</span><span class="o">)</span> <span class="c1">// R-squared</span> -<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">"R-squared = ${regressionMetrics.r2}"</span><span class="o">)</span> +<span class="n">println</span><span class="o">(</span><span class="s">s"R-squared = </span><span class="si">${</span><span class="n">regressionMetrics</span><span class="o">.</span><span class="n">r2</span><span class="si">}</span><span class="s">"</span><span class="o">)</span> </pre></div> <div><small>Find full example code at "examples/src/main/scala/org/apache/spark/examples/mllib/RankingMetricsExample.scala" in the Spark repo.</small></div> @@ -1400,7 +1400,7 @@ expanded world of non-positive weights are “the same as never having inter <div 
data-lang="java"> <p>Refer to the <a href="api/java/org/apache/spark/mllib/evaluation/RegressionMetrics.html"><code>RegressionMetrics</code> Java docs</a> and <a href="api/java/org/apache/spark/mllib/evaluation/RankingMetrics.html"><code>RankingMetrics</code> Java docs</a> for details on the API.</p> - <div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.*</span><span class="o">;</span> + <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.*</span><span class="o">;</span> <span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span> @@ -1419,7 +1419,7 @@ expanded world of non-positive weights are “the same as never having inter <span class="nd">@Override</span> <span class="kd">public</span> <span class="n">Rating</span> <span class="nf">call</span><span class="o">(</span><span class="n">String</span> <span class="n">line</span><span class="o">)</span> <span class="o">{</span> <span class="n">String</span><span class="o">[]</span> <span class="n">parts</span> <span class="o">=</span> <span class="n">line</span><span class="o">.</span><span class="na">split</span><span class="o">(</span><span class="s">"::"</span><span class="o">);</span> - <span class="k">return</span> <span class="k">new</span> <span class="nf">Rating</span><span class="o">(</span><span class="n">Integer</span><span class="o">.</span><span class="na">parseInt</span><span class="o">(</span><span class="n">parts</span><span class="o">[</span><span class="mi">0</span><span class="o">]),</span> <span class="n">Integer</span><span class="o">.</span><span class="na">parseInt</span><span class="o">(</span><span class="n">parts</span><span class="o">[</span><span class="mi">1</span><span class="o">]),</span> <span class="n">Double</span> + <span class="k">return</span> <span class="k">new</span> <span class="n">Rating</span><span class="o">(</span><span class="n">Integer</span><span 
class="o">.</span><span class="na">parseInt</span><span class="o">(</span><span class="n">parts</span><span class="o">[</span><span class="mi">0</span><span class="o">]),</span> <span class="n">Integer</span><span class="o">.</span><span class="na">parseInt</span><span class="o">(</span><span class="n">parts</span><span class="o">[</span><span class="mi">1</span><span class="o">]),</span> <span class="n">Double</span> <span class="o">.</span><span class="na">parseDouble</span><span class="o">(</span><span class="n">parts</span><span class="o">[</span><span class="mi">2</span><span class="o">])</span> <span class="o">-</span> <span class="mf">2.5</span><span class="o">);</span> <span class="o">}</span> <span class="o">}</span> @@ -1438,7 +1438,7 @@ expanded world of non-positive weights are “the same as never having inter <span class="n">Rating</span><span class="o">[]</span> <span class="n">scaledRatings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Rating</span><span class="o">[</span><span class="n">t</span><span class="o">.</span><span class="na">_2</span><span class="o">().</span><span class="na">length</span><span class="o">];</span> <span class="k">for</span> <span class="o">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="o">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">scaledRatings</span><span class="o">.</span><span class="na">length</span><span class="o">;</span> <span class="n">i</span><span class="o">++)</span> <span class="o">{</span> <span class="kt">double</span> <span class="n">newRating</span> <span class="o">=</span> <span class="n">Math</span><span class="o">.</span><span class="na">max</span><span class="o">(</span><span class="n">Math</span><span class="o">.</span><span class="na">min</span><span class="o">(</span><span class="n">t</span><span class="o">.</span><span class="na">_2</span><span 
class="o">()[</span><span class="n">i</span><span class="o">].</span><span class="na">rating</span><span class="o">(),</span> <span class="mf">1.0</span><span class="o">),</span> <span class="mf">0.0</span><span class="o">);</span> - <span class="n">scaledRatings</span><span class="o">[</span><span class="n">i</span><span class="o">]</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">Rating</span><span class="o">(</span><span class="n">t</span><span class="o">.</span><span class="na">_2</span><span class="o">()[</span><span class="n">i</span><span class="o">].</span><span class="na">user</span><span class="o">(),</span> <span class="n">t</span><span class="o">.</span><span class="na">_2</span><span class="o">()[</span><span class="n">i</span><span class="o">].</span><span class="na">product</span><span class="o">(),</span> <span class="n">newRating</span><span class="o">);</span> + <span class="n">scaledRatings</span><span class="o">[</span><span class="n">i</span><span class="o">]</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Rating</span><span class="o">(</span><span class="n">t</span><span class="o">.</span><span class="na">_2</span><span class="o">()[</span><span class="n">i</span><span class="o">].</span><span class="na">user</span><span class="o">(),</span> <span class="n">t</span><span class="o">.</span><span class="na">_2</span><span class="o">()[</span><span class="n">i</span><span class="o">].</span><span class="na">product</span><span class="o">(),</span> <span class="n">newRating</span><span class="o">);</span> <span class="o">}</span> <span class="k">return</span> <span class="k">new</span> <span class="n">Tuple2</span><span class="o"><>(</span><span class="n">t</span><span class="o">.</span><span class="na">_1</span><span class="o">(),</span> <span class="n">scaledRatings</span><span class="o">);</span> <span class="o">}</span> @@ -1457,7 +1457,7 @@ expanded world of non-positive weights are “the 
same as never having inter <span class="o">}</span> <span class="k">else</span> <span class="o">{</span> <span class="n">binaryRating</span> <span class="o">=</span> <span class="mf">0.0</span><span class="o">;</span> <span class="o">}</span> - <span class="k">return</span> <span class="k">new</span> <span class="nf">Rating</span><span class="o">(</span><span class="n">r</span><span class="o">.</span><span class="na">user</span><span class="o">(),</span> <span class="n">r</span><span class="o">.</span><span class="na">product</span><span class="o">(),</span> <span class="n">binaryRating</span><span class="o">);</span> + <span class="k">return</span> <span class="k">new</span> <span class="n">Rating</span><span class="o">(</span><span class="n">r</span><span class="o">.</span><span class="na">user</span><span class="o">(),</span> <span class="n">r</span><span class="o">.</span><span class="na">product</span><span class="o">(),</span> <span class="n">binaryRating</span><span class="o">);</span> <span class="o">}</span> <span class="o">}</span> <span class="o">);</span> @@ -1548,7 +1548,7 @@ expanded world of non-positive weights are “the same as never having inter <span class="o">)).</span><span class="na">join</span><span class="o">(</span><span class="n">predictions</span><span class="o">).</span><span class="na">values</span><span class="o">();</span> <span class="c1">// Create regression metrics object</span> -<span class="n">RegressionMetrics</span> <span class="n">regressionMetrics</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">RegressionMetrics</span><span class="o">(</span><span class="n">ratesAndPreds</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span> +<span class="n">RegressionMetrics</span> <span class="n">regressionMetrics</span> <span class="o">=</span> <span class="k">new</span> <span class="n">RegressionMetrics</span><span class="o">(</span><span class="n">ratesAndPreds</span><span 
class="o">.</span><span class="na">rdd</span><span class="o">());</span> <span class="c1">// Root mean squared error</span> <span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">format</span><span class="o">(</span><span class="s">"RMSE = %f\n"</span><span class="o">,</span> <span class="n">regressionMetrics</span><span class="o">.</span><span class="na">rootMeanSquaredError</span><span class="o">());</span> @@ -1563,35 +1563,35 @@ expanded world of non-positive weights are “the same as never having inter <div data-lang="python"> <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.evaluation.RegressionMetrics"><code>RegressionMetrics</code> Python docs</a> and <a href="api/python/pyspark.mllib.html#pyspark.mllib.evaluation.RankingMetrics"><code>RankingMetrics</code> Python docs</a> for more details on the API.</p> - <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.recommendation</span> <span class="kn">import</span> <span class="n">ALS</span><span class="p">,</span> <span class="n">Rating</span> + <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.recommendation</span> <span class="kn">import</span> <span class="n">ALS</span><span class="p">,</span> <span class="n">Rating</span> <span class="kn">from</span> <span class="nn">pyspark.mllib.evaluation</span> <span class="kn">import</span> <span class="n">RegressionMetrics</span><span class="p">,</span> <span class="n">RankingMetrics</span> -<span class="c"># Read in the ratings data</span> -<span class="n">lines</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">"data/mllib/sample_movielens_data.txt"</span><span class="p">)</span> +<span class="c1"># Read in the ratings data</span> +<span class="n">lines</span> <span class="o">=</span> <span 
class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s2">"data/mllib/sample_movielens_data.txt"</span><span class="p">)</span> <span class="k">def</span> <span class="nf">parseLine</span><span class="p">(</span><span class="n">line</span><span class="p">):</span> - <span class="n">fields</span> <span class="o">=</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">"::"</span><span class="p">)</span> + <span class="n">fields</span> <span class="o">=</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">"::"</span><span class="p">)</span> <span class="k">return</span> <span class="n">Rating</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="n">fields</span><span class="p">[</span><span class="mi">0</span><span class="p">]),</span> <span class="nb">int</span><span class="p">(</span><span class="n">fields</span><span class="p">[</span><span class="mi">1</span><span class="p">]),</span> <span class="nb">float</span><span class="p">(</span><span class="n">fields</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span> <span class="o">-</span> <span class="mf">2.5</span><span class="p">)</span> <span class="n">ratings</span> <span class="o">=</span> <span class="n">lines</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">r</span><span class="p">:</span> <span class="n">parseLine</span><span class="p">(</span><span class="n">r</span><span class="p">))</span> -<span class="c"># Train a model on to predict user-product ratings</span> +<span class="c1"># Train a model on to predict user-product ratings</span> <span class="n">model</span> <span class="o">=</span> <span class="n">ALS</span><span class="o">.</span><span 
class="n">train</span><span class="p">(</span><span class="n">ratings</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mf">0.01</span><span class="p">)</span> -<span class="c"># Get predicted ratings on all existing user-product pairs</span> +<span class="c1"># Get predicted ratings on all existing user-product pairs</span> <span class="n">testData</span> <span class="o">=</span> <span class="n">ratings</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">p</span><span class="p">:</span> <span class="p">(</span><span class="n">p</span><span class="o">.</span><span class="n">user</span><span class="p">,</span> <span class="n">p</span><span class="o">.</span><span class="n">product</span><span class="p">))</span> <span class="n">predictions</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">predictAll</span><span class="p">(</span><span class="n">testData</span><span class="p">)</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">r</span><span class="p">:</span> <span class="p">((</span><span class="n">r</span><span class="o">.</span><span class="n">user</span><span class="p">,</span> <span class="n">r</span><span class="o">.</span><span class="n">product</span><span class="p">),</span> <span class="n">r</span><span class="o">.</span><span class="n">rating</span><span class="p">))</span> <span class="n">ratingsTuple</span>
<TRUNCATED>