Ok, I see, thanks for the correction, but this should be optimized.
From: Shixiong Zhu [mailto:zsxw...@gmail.com]
Sent: Tuesday, August 25, 2015 2:08 PM
To: Cheng, Hao
Cc: Jeff Zhang; user@spark.apache.org
Subject: Re: DataFrame#show cost 2 Spark Jobs ?
That's two jobs. `SparkPlan.execut
> 2 jobs, not 2 tasks.
>
>
>
> *From:* Shixiong Zhu [mailto:zsxw...@gmail.com]
> *Sent:* Tuesday, August 25, 2015 1:29 PM
> *To:* Cheng, Hao
> *Cc:* Jeff Zhang; user@spark.apache.org
>
> *Subject:* Re: DataFrame#show cost 2 Spark Jobs ?
>
>
>
> Hao,
>
>
ay, August 25, 2015 8:11 AM
To: Cheng, Hao
Cc: user@spark.apache.org
Subject: Re: DataFrame#show cost 2 Spark Jobs ?
Hi Cheng,
I know that sqlContext.read will trigger one spark job to infer the schema.
What I mean is DataFrame#show cost 2 spark jobs. So overa
8:11 AM
> *To:* Cheng, Hao
> *Cc:* user@spark.apache.org
> *Subject:* Re: DataFrame#show cost 2 Spark Jobs ?
>
>
>
> Hi Cheng,
>
>
>
> I know that sqlContext.read will trigger one spark job to infer the
> schema. What I mean is DataFrame#show cost 2 spark
loading the data for JSON, it probably causes a longer ramp-up time with a
large number of files/partitions.
From: Jeff Zhang [mailto:zjf...@gmail.com]
Sent: Tuesday, August 25, 2015 8:11 AM
To: Cheng, Hao
Cc: user@spark.apache.org
Subject: Re: DataFrame#show cost 2 Spark Jobs ?
Hi Cheng,
I
main/scala/org/apache/spark/sql/execution/SparkPlan.scala#L185
Best Regards,
Shixiong Zhu
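
For readers following the link above, here is a minimal, Spark-free sketch of the kind of incremental take that can turn a single show call into more than one job. It is a simplified simulation written for illustration, not the actual SparkPlan code: the object name IncrementalTakeSketch, the method incrementalTake, the 1.5x growth factor, and the partition layout are all assumptions. Each loop iteration stands in for one job over a batch of partitions.

```scala
// Simplified simulation of an incremental take(n); NOT the real Spark code.
// All names and the 1.5x growth factor are assumptions for illustration.
object IncrementalTakeSketch {
  // Returns (rows collected, number of simulated jobs).
  def incrementalTake[T](partitions: Seq[Seq[T]], n: Int): (Seq[T], Int) = {
    val buf = scala.collection.mutable.ArrayBuffer.empty[T]
    val totalParts = partitions.length
    var partsScanned = 0
    var jobs = 0
    while (buf.size < n && partsScanned < totalParts) {
      // The first pass reads a single partition; later passes widen the scan.
      val numPartsToTry =
        if (partsScanned == 0) 1
        else if (buf.isEmpty) totalParts - partsScanned
        else math.max(1, (1.5 * n * partsScanned / buf.size).toInt)
      val end = math.min(partsScanned + numPartsToTry, totalParts)
      val left = n - buf.size
      // One simulated job: take up to `left` rows from each partition in the batch.
      val results = partitions.slice(partsScanned, end).map(_.take(left))
      jobs += 1
      results.foreach(rows => buf ++= rows.take(n - buf.size))
      partsScanned = end
    }
    (buf.toSeq, jobs)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical layout: 3 rows spread over 2 partitions, 20 rows requested.
    val parts = Seq(Seq("Michael", "Andy"), Seq("Justin"))
    val (rows, jobs) = incrementalTake(parts, 20)
    println(s"rows=${rows.size}, jobs=$jobs")
  }
}
```

Under these assumptions, the first pass finds fewer rows than requested in the first partition, so a second pass scans the remaining partition. A small input split across two partitions therefore shows up as two jobs, while a request that the first partition can satisfy finishes in one.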
2015-08-25 8:11 GMT+08:00 Jeff Zhang :
> Hi Cheng,
>
> I know that sqlContext.read will trigger one spark job to infer the
> schema. What I mean is DataFrame#show cost 2 spark jobs. So overall
Hi Cheng,
I know that sqlContext.read will trigger one Spark job to infer the schema.
What I mean is that DataFrame#show costs 2 Spark jobs, so overall it would cost 3
jobs.
Here's the command I use:
>> val df =
sqlContext.read.json("file:///Users/hadoop/github/spark/examples/sr
@spark.apache.org
Subject: DataFrame#show cost 2 Spark Jobs ?
It's weird to me that a simple show call costs 2 Spark jobs.
DataFrame#explain shows it is a very simple operation, so I'm not sure why it needs 2 jobs.
== Parsed Logical Plan ==
Relation[age#0L,name#1]
JSONRelation[file:/Users/h
It's weird to me that a simple show call costs 2 Spark jobs.
DataFrame#explain shows it is a very simple operation, so I'm not sure why it needs 2
jobs.
== Parsed Logical Plan ==
Relation[age#0L,name#1]
JSONRelation[file:/Users/hadoop/github/spark/examples/src/main/resources/people.json]
== Analyz