Hi,

When I cache the DataFrame and run the query:

    val df = sqlContext.sql("select name,age from TBL_STUDENT where age = 37")
    df.cache()
    df.show
    println(df.queryExecution)

I got the following execution plan. In the optimized logical plan, I can see that the whole analyzed logical plan has been replaced by an InMemoryRelation. But when I looked into the Optimizer, I did not see any rule that relates to InMemoryRelation. Could you please explain how this optimization works?

== Parsed Logical Plan ==
'<Project>, argString:< [unresolvedalias(UnresolvedAttribute: 'name),unresolvedalias(UnresolvedAttribute: 'age)]>
 '<Filter>, argString:< (UnresolvedAttribute: 'age = 37)>
  '<UnresolvedRelation>, argString:< [TBL_STUDENT], None>

== Analyzed Logical Plan ==
name: string, age: int
<Project>, argString:< [AttributeReference:name#1,AttributeReference:age#3]>
 <Filter>, argString:< (AttributeReference:age#3 = 37)>
  <Subquery>, argString:< TBL_STUDENT>
   <LogicalRDD>, argString:< [AttributeReference:id#0,AttributeReference:name#1,AttributeReference:classId#2,AttributeReference:age#3], MapPartitionsRDD[4] at main at NativeMethodAccessorImpl.java:-2>

== Optimized Logical Plan ==
<InMemoryRelation>, argString:< [AttributeReference:name#1,AttributeReference:age#3], true, 10000, StorageLevel(true, true, false, true, 1), (<TungstenProject>, argString:< [AttributeReference:name#1,AttributeReference:age#3]>), None>

== Physical Plan ==
<InMemoryColumnarTableScan>, argString:< [AttributeReference:name#1,AttributeReference:age#3], (<InMemoryRelation>, argString:< [AttributeReference:name#1,AttributeReference:age#3], true, 10000, StorageLevel(true, true, false, true, 1), (<TungstenProject>, argString:< [AttributeReference:name#1,AttributeReference:age#3]>), None>)>