[ https://issues.apache.org/jira/browse/SPARK-15915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Armbrust updated SPARK-15915:
-------------------------------------
    Assignee: Takuya Ueshin

> CacheManager should use canonicalized plan for planToCache.
> -----------------------------------------------------------
>
>                 Key: SPARK-15915
>                 URL: https://issues.apache.org/jira/browse/SPARK-15915
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Takuya Ueshin
>            Assignee: Takuya Ueshin
>             Fix For: 2.0.0
>
>
> A {{DataFrame}} whose plan overrides {{sameResult}} without comparing canonicalized
> plans can't be cached with {{cacheTable}}.
> For example:
> {code}
> // Assumes a Spark 2.0 session named spark (e.g. spark-shell or a test suite).
> import spark.implicits._
> import org.apache.spark.sql.execution.columnar.InMemoryRelation
>
> val localRelation = Seq(1, 2, 3).toDF()
> localRelation.createOrReplaceTempView("localRelation")
> spark.catalog.cacheTable("localRelation")
> assert(
>   localRelation.queryExecution.withCachedData.collect {
>     case i: InMemoryRelation => i
>   }.size == 1)
> {code}
> fails with:
> {noformat}
> ArrayBuffer() had size 0 instead of expected size 1
> {noformat}
> The reason is that when {{spark.catalog.cacheTable("localRelation")}} is called,
> {{CacheManager}} caches the plan wrapped in a {{SubqueryAlias}}, but when planning
> the DataFrame {{localRelation}}, {{CacheManager}} looks up cached data for the
> unwrapped plan, because the DataFrame's own plan is not wrapped.
> Some plans, such as {{LocalRelation}} and {{LogicalRDD}}, override the
> {{sameResult}} method but do not compare canonicalized plans, so
> {{CacheManager}} can't detect that the two plans are the same.
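To make the failure mode concrete, here is a minimal, self-contained Scala sketch. It rests on loudly stated assumptions: {{Plan}}, {{LocalRel}}, {{Alias}} and {{ToyCacheManager}} are hypothetical toy classes, not Spark's {{LogicalPlan}}, {{LocalRelation}}, {{SubqueryAlias}} or {{CacheManager}}, and the actual fix in Spark may differ in detail. It only models the mismatch described above: the cache stores an alias-wrapped plan, the lookup uses the unwrapped plan, and a {{sameResult}} that does not canonicalize never matches them.

{code}
object CanonicalCacheSketch {
  // Hypothetical, simplified stand-ins for logical plans; NOT Spark's classes.
  sealed trait Plan {
    // By default a plan is its own canonical form.
    def canonicalized: Plan = this
    // Structural comparison that, like the plans described in the issue,
    // does NOT canonicalize its argument first.
    def sameResult(other: Plan): Boolean = this == other
  }

  // Hypothetical leaf plan holding local data.
  final case class LocalRel(data: Seq[Int]) extends Plan

  // Hypothetical wrapper that only names its child; it does not change the result,
  // so its canonical form is the child's canonical form.
  final case class Alias(name: String, child: Plan) extends Plan {
    override def canonicalized: Plan = child.canonicalized
  }

  // Hypothetical cache manager. With canonicalize = false it stores and looks up
  // plans exactly as given; with canonicalize = true it canonicalizes both sides.
  final class ToyCacheManager(canonicalize: Boolean) {
    private val cached = scala.collection.mutable.ArrayBuffer.empty[Plan]
    private def key(p: Plan): Plan = if (canonicalize) p.canonicalized else p
    def cache(plan: Plan): Unit = cached += key(plan)
    def lookup(plan: Plan): Option[Plan] = cached.find(c => key(plan).sameResult(c))
  }

  def main(args: Array[String]): Unit = {
    val df = LocalRel(Seq(1, 2, 3))
    // The table-caching path sees the view's plan wrapped in an alias...
    val viewPlan = Alias("localRelation", df)

    val naive = new ToyCacheManager(canonicalize = false)
    naive.cache(viewPlan)
    val fixed = new ToyCacheManager(canonicalize = true)
    fixed.cache(viewPlan)

    // ...but the DataFrame's own plan is not wrapped.
    println(naive.lookup(df).isDefined)
    println(fixed.lookup(df).isDefined)
  }
}
{code}

In this toy model, running {{main}} prints {{false}} and then {{true}}: canonicalizing on both the store and the lookup side makes the comparison independent of naming wrappers such as the alias.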