Hi,

When I cache the dataframe and run the query,
  
    val df = sqlContext.sql("select  name,age from TBL_STUDENT where age = 37")
    df.cache()
    df.show
    println(df.queryExecution)

 I got the following execution plan,from the optimized logical plan,I can see 
the whole analyzed logical plan is totally replaced with the InMemoryRelation 
logical plan. But when I look into the Optimizer, I didn't see any optimizer 
that relates to the InMemoryRelation.

Could you please explain how the optimization works?




== Parsed Logical Plan ==
'<Project>, argString:< [unresolvedalias(UnresolvedAttribute: 
'name),unresolvedalias(UnresolvedAttribute: 'age)]>
  '<Filter>, argString:< (UnresolvedAttribute: 'age = 37)>
    '<UnresolvedRelation>, argString:< [TBL_STUDENT], None>

== Analyzed Logical Plan ==
name: string, age: int
<Project>, argString:< [AttributeReference:name#1,AttributeReference:age#3]>
  <Filter>, argString:< (AttributeReference:age#3 = 37)>
    <Subquery>, argString:< TBL_STUDENT>
      <LogicalRDD>, argString:< 
[AttributeReference:id#0,AttributeReference:name#1,AttributeReference:classId#2,AttributeReference:age#3],
 MapPartitionsRDD[4] at main at NativeMethodAccessorImpl.java:-2>

== Optimized Logical Plan ==
<InMemoryRelation>, argString:< 
[AttributeReference:name#1,AttributeReference:age#3], true, 10000, 
StorageLevel(true, true, false, true, 1), (<TungstenProject>, argString:< 
[AttributeReference:name#1,AttributeReference:age#3]>), None>

== Physical Plan ==
<InMemoryColumnarTableScan>, argString:< 
[AttributeReference:name#1,AttributeReference:age#3], (<InMemoryRelation>, 
argString:< [AttributeReference:name#1,AttributeReference:age#3], true, 10000, 
StorageLevel(true, true, false, true, 1), (<TungstenProject>, argString:< 
[AttributeReference:name#1,AttributeReference:age#3]>), None>)>










Reply via email to