Dong Wang created SPARK-29832:
---------------------------------

             Summary: Unnecessary persist on instances in ml.regression.IsotonicRegression.fit
                 Key: SPARK-29832
                 URL: https://issues.apache.org/jira/browse/SPARK-29832
             Project: Spark
          Issue Type: Improvement
          Components: ML
    Affects Versions: 3.0.0
            Reporter: Dong Wang


Persisting instances in ml.regression.IsotonicRegression.fit() is unnecessary, 
because instances is used only once, in run(instances).
{code:scala}
  override def fit(dataset: Dataset[_]): IsotonicRegressionModel = instrumented { instr =>
    transformSchema(dataset.schema, logging = true)
    // Extract columns from data.  If dataset is persisted, do not persist oldDataset.
    val instances = extractWeightedLabeledPoints(dataset)
    val handlePersistence = dataset.storageLevel == StorageLevel.NONE
    // Unnecessary persist
    if (handlePersistence) instances.persist(StorageLevel.MEMORY_AND_DISK)
    instr.logPipelineStage(this)
    instr.logDataset(dataset)
    instr.logParams(this, labelCol, featuresCol, weightCol, predictionCol,
      featureIndex, isotonic)
    instr.logNumFeatures(1)
    val isotonicRegression = new MLlibIsotonicRegression().setIsotonic($(isotonic))
    val oldModel = isotonicRegression.run(instances) // Only use once here
    if (handlePersistence) instances.unpersist()
{code}

This issue was reported by our tool CacheCheck, which dynamically detects 
misuses of the persist()/unpersist() API.
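
The reasoning above (persisting pays off only when the same data is read by more than one job) can be illustrated with a small standalone sketch. It is not taken from the Spark code base: the object name PersistOnceDemo, the toy RDD, and the "evaluations" accumulator are hypothetical, and it assumes Spark is on the classpath and runs in local mode. It also takes the report's claim that run(instances) reads instances only once as given rather than verifying it.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical demo object, not part of Spark.
object PersistOnceDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("persist-once-demo")
      .getOrCreate()
    val sc = spark.sparkContext

    // Counts how many times the map function below is invoked.
    val evaluations = sc.longAccumulator("evaluations")

    // Toy stand-in for `instances`: a lazily evaluated, unpersisted RDD.
    val instances = sc.parallelize(1 to 1000).map { x =>
      evaluations.add(1L)
      x.toDouble
    }

    // A single action walks the lineage once, so persist() would save nothing.
    instances.sum()
    println(s"map invocations after one action: ${evaluations.value}")

    // A second action on the same unpersisted RDD recomputes the lineage;
    // this is the situation in which persist() would start to help.
    instances.count()
    println(s"map invocations after two actions: ${evaluations.value}")

    spark.stop()
  }
}
```

If the counter grows on the second action, the RDD was recomputed; an RDD that feeds exactly one job never reaches that point, which is the case the report describes.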



