[ https://issues.apache.org/jira/browse/SPARK-25377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-25377.
----------------------------------
    Resolution: Incomplete

> spark sql dataframe cache is invalid
> ------------------------------------
>
>                 Key: SPARK-25377
>                 URL: https://issues.apache.org/jira/browse/SPARK-25377
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.0
>         Environment: spark version 2.3.0
>                      scala version 2.1.8
>            Reporter: Iverson Hu
>            Priority: Major
>              Labels: bulk-closed
>
> When I use a SQL DataFrame in my application, I found that dataframe.cache
> has no effect: the first action, count(), took 40 seconds, and the second
> action took just as long. When I use dataframe.rdd.cache instead, the second
> execution is faster than the first, so I think this is a bug in SQL
> DataFrames.
>
> Below are my code and the console log; the `result` DataFrame was cached
> beforehand.
>
> My code:
>
> logger.info("start to consuming result count")
> logger.info(s"consuming ${result.count} output records")
> //result.show(false)
> logger.info("starting go to MysqlSink")
> logger.info(s"consuming ${result.count} output records")
> logger.info("starting go to MysqlSink")
>
> And the console log:
>
> 18/09/08 14:15:17 INFO MySQLRiskScenarioRunner: start to consuming result count
> 18/09/08 14:15:49 INFO MySQLRiskScenarioRunner: consuming 5 output records
> 18/09/08 14:15:49 INFO MySQLRiskScenarioRunner: starting go to MysqlSink
> 18/09/08 14:16:22 INFO MySQLRiskScenarioRunner: consuming 5 output records
> 18/09/08 14:16:22 INFO MySQLRiskScenarioRunner: starting go to MysqlSink

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
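For context on the behavior the report describes: Dataset.cache() in Spark is lazy, so it only marks the plan for caching, and the cache is populated by the first action that runs afterwards. A minimal self-contained sketch of that pattern (hypothetical names; assumes a local SparkSession, not the reporter's actual job):

```scala
import org.apache.spark.sql.SparkSession

object CacheSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cache-sketch")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical DataFrame standing in for `result` from the report.
    val result = spark.range(0, 1000000).toDF("id")

    // cache() is lazy: nothing is stored yet, the plan is only marked.
    result.cache()

    // The first action materializes the cached data ...
    val first = result.count()
    // ... and subsequent actions should read from the in-memory copy,
    // which is what the reporter expected from the second count().
    val second = result.count()

    println(s"first=$first second=$second")
    spark.stop()
  }
}
```

If the second action is as slow as the first, the plan may not be hitting the cache (e.g. the cached plan differs from the queried plan), which is the symptom the report describes.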