Dong Wang created SPARK-29832:
---------------------------------

             Summary: Unnecessary persist on instances in ml.regression.IsotonicRegression.fit
                 Key: SPARK-29832
                 URL: https://issues.apache.org/jira/browse/SPARK-29832
             Project: Spark
          Issue Type: Improvement
          Components: ML
    Affects Versions: 3.0.0
            Reporter: Dong Wang

The persist on instances in ml.regression.IsotonicRegression.fit() is unnecessary, because instances is used only once, in run(instances).

{code:scala}
  override def fit(dataset: Dataset[_]): IsotonicRegressionModel = instrumented { instr =>
    transformSchema(dataset.schema, logging = true)
    // Extract columns from data.  If dataset is persisted, do not persist oldDataset.
    val instances = extractWeightedLabeledPoints(dataset)
    val handlePersistence = dataset.storageLevel == StorageLevel.NONE
    // Unnecessary persist
    if (handlePersistence) instances.persist(StorageLevel.MEMORY_AND_DISK)

    instr.logPipelineStage(this)
    instr.logDataset(dataset)
    instr.logParams(this, labelCol, featuresCol, weightCol, predictionCol, featureIndex, isotonic)
    instr.logNumFeatures(1)

    val isotonicRegression = new MLlibIsotonicRegression().setIsotonic($(isotonic))
    val oldModel = isotonicRegression.run(instances) // Only use once here

    if (handlePersistence) instances.unpersist()
{code}

This issue was reported by our tool CacheCheck, which dynamically detects misuses of the persist()/unpersist() APIs.
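
The reasoning behind the report can be illustrated without Spark. The sketch below is a minimal, self-contained model, not Spark code: LazyData is a hypothetical stand-in for a lazily evaluated RDD, and countComputations is an illustrative helper. It assumes, as the report states, that the consumer traverses the data exactly once; whether run(instances) triggers only a single evaluation in practice is the reporter's claim and is not verified here.

```scala
object PersistOnceSketch {

  // Hypothetical stand-in for a lazily evaluated dataset (NOT a Spark API).
  // Every traversal re-runs `compute` unless a persisted copy is available.
  final class LazyData[A](compute: () => Seq[A]) {
    private var persisted = false
    private var cache: Option[Seq[A]] = None
    var computations = 0 // how many times `compute` actually ran

    def persist(): this.type = { persisted = true; this }

    def unpersist(): this.type = { persisted = false; cache = None; this }

    // One "action": returns the cached copy if present, otherwise recomputes
    // and, if persistence was requested, stores the result for later actions.
    def collect(): Seq[A] = cache.getOrElse {
      computations += 1
      val result = compute()
      if (persisted) cache = Some(result)
      result
    }
  }

  // Runs `traversals` actions over the same data, with or without persist,
  // and reports how often the underlying computation was executed.
  def countComputations(traversals: Int, persist: Boolean): Int = {
    val data = new LazyData(() => Seq(1.0, 2.0, 3.0).map(_ * 2))
    if (persist) data.persist()
    (1 to traversals).foreach(_ => data.collect())
    if (persist) data.unpersist()
    data.computations
  }

  def main(args: Array[String]): Unit = {
    // Single traversal: persisting saves nothing.
    println(countComputations(traversals = 1, persist = false)) // 1
    println(countComputations(traversals = 1, persist = true))  // 1
    // Two traversals: persisting avoids one recomputation.
    println(countComputations(traversals = 2, persist = false)) // 2
    println(countComputations(traversals = 2, persist = true))  // 1
  }
}
```

In this model, persist only pays off from the second traversal onward. With a single traversal it adds the cost of storing and releasing the data without avoiding any recomputation, which is the kind of misuse the report describes.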