[ https://issues.apache.org/jira/browse/SPARK-29826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean R. Owen resolved SPARK-29826.
----------------------------------
    Resolution: Duplicate

> Missing persist on data in mllib.feature.ChiSqSelector.fit
> ----------------------------------------------------------
>
>                 Key: SPARK-29826
>                 URL: https://issues.apache.org/jira/browse/SPARK-29826
>             Project: Spark
>          Issue Type: Sub-task
>          Components: MLlib
>    Affects Versions: 2.4.3
>            Reporter: Dong Wang
>            Priority: Major
>
> The RDD data in mllib.feature.ChiSqSelector.fit() is used by an action in
> Statistics.chiSqTest(data) and by other actions in the following code, but it
> is not persisted.
> {code:scala}
>   def fit(data: RDD[LabeledPoint]): ChiSqSelectorModel = {
>     val chiSqTestResult = Statistics.chiSqTest(data).zipWithIndex
>     val features = selectorType match {
>       case ChiSqSelector.NumTopFeatures =>
>         chiSqTestResult
>           .sortBy { case (res, _) => res.pValue }
>           .take(numTopFeatures)
> {code}
> This issue was reported by our tool CacheCheck, which dynamically detects
> misuse of the persist()/unpersist() APIs.
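
For readers unfamiliar with the pattern the report asks for, here is a minimal caller-side sketch. It assumes the input RDD is re-read by more than one action during fit(), as the reporter states. The object PersistedChiSqFit and the method fitWithPersist are hypothetical names used only for illustration and are not part of Spark. Since this issue was resolved as a duplicate, this is not the change that was made in Spark itself.

```scala
import org.apache.spark.mllib.feature.{ChiSqSelector, ChiSqSelectorModel}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Hypothetical helper (not a Spark API): caches the input only for the
// duration of fit(), and only if the caller has not cached it already.
object PersistedChiSqFit {
  def fitWithPersist(
      selector: ChiSqSelector,
      data: RDD[LabeledPoint]): ChiSqSelectorModel = {
    // Respect a storage level the caller may have chosen.
    val handlePersistence = data.getStorageLevel == StorageLevel.NONE
    if (handlePersistence) data.persist(StorageLevel.MEMORY_AND_DISK)
    try {
      // Every action triggered inside fit() can now reuse the cached
      // partitions instead of recomputing the RDD lineage.
      selector.fit(data)
    } finally {
      // Release the cache only if this helper created it.
      if (handlePersistence) data.unpersist()
    }
  }
}
```

A call such as PersistedChiSqFit.fitWithPersist(new ChiSqSelector(50), trainingData) would leave trainingData at the storage level it had before the call. The choice of MEMORY_AND_DISK is an assumption; a different level may suit data that fits comfortably in memory.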