Hi all, I am totally new to ML APIs. I am trying to get the *ROC curve* for model evaluation in both *scikit-learn* and *PySpark MLlib*, but I cannot find any API for computing the ROC curve for binary classification in Spark MLlib.
The code below lives in a wrapper class that builds the respective DataFrame from the source data with the two columns shown in the attachment. I want to achieve the same result as the Python code, but in Spark, to get the ROC curve. Is there any API on the MLlib side to achieve this?

Python scikit-learn code:

```python
import numpy as np
from sklearn import metrics

def roc(self, y_true, y_pred):
    df_a = self._df.copy()
    # Drop NaNs and cast the label/prediction columns to int.
    values_1 = df_a[y_true].values
    values_1 = values_1[~np.isnan(values_1)].astype(int)
    values_2 = df_a[y_pred].values
    values_2 = values_2[~np.isnan(values_2)].astype(int)
    # Note: roc_curve returns (fpr, tpr, thresholds), i.e.
    # 1 - specificity and sensitivity, in that order.
    fpr, tpr, thresholds = metrics.roc_curve(values_1, values_2, pos_label=2)
    # area_under_roc = metrics.roc_auc_score(values_1, values_2)
    print(tpr, fpr)
    return tpr, fpr
```

Result:

```
[ 0.  0.34138342  0.67412045  1. ] [ 0.  0.33373458  0.67378875  1. ]
```

PySpark code:

```python
from pyspark.mllib.evaluation import BinaryClassificationMetrics

def roc(self, y_true, y_pred):
    print('using pyspark df')
    df_a = self._df
    # BinaryClassificationMetrics expects an RDD of (score, label)
    # pairs, so select the prediction column first and cast to float.
    score_and_labels = (
        df_a.select(y_pred, y_true).rdd
        .map(lambda row: (float(row[0]), float(row[1])))
    )
    metrics = BinaryClassificationMetrics(score_and_labels)
    roc_calc = metrics.areaUnderROC
    print(roc_calc)
    print(type(roc_calc))
    return roc_calc
```

This only gives me the scalar `areaUnderROC`, not the curve points themselves.

Please help. Thanks, Aakash.
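Since the Python `BinaryClassificationMetrics` only exposes scalar summaries such as `areaUnderROC`, one fallback is to collect the (score, label) pairs to the driver and compute the curve points yourself. Below is a minimal pure-Python sketch of what `sklearn.metrics.roc_curve` computes (the helper name `roc_points` and the toy data are my own, not from the post); it sweeps a threshold over descending scores and records cumulative false-positive and true-positive rates:

```python
def roc_points(labels, scores, pos_label=1):
    """Return a list of (fpr, tpr) points for a binary classifier.

    A simplified sketch of what sklearn.metrics.roc_curve does: sort
    samples by descending score, then lower the threshold one sample
    at a time, tracking cumulative TP and FP counts. Unlike sklearn,
    it emits one point per sample (no collapsing of tied scores) and
    assumes both classes are present in `labels`.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    pos_total = sum(1 for l in labels if l == pos_label)
    neg_total = len(labels) - pos_total
    tp = fp = 0
    points = [(0.0, 0.0)]  # threshold above every score
    for score, label in pairs:
        if label == pos_label:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg_total, tp / pos_total))
    return points
```

On the Spark side you would only need `score_and_labels.collect()` (or a sample of it, if the data is large) to feed this; the area under the resulting polyline should then agree with `metrics.areaUnderROC` up to tie-handling.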