Makes sense. Thanks Michael (and welcome back from #SparkSummit!) On to
exploring the space...
Jacek
On 9 Jun 2016 6:10 p.m., "Michael Armbrust" wrote:
Look at the explain(). For a Seq we know it's just local data, so we avoid
Spark jobs for simple operations. In contrast, an RDD is opaque to
Catalyst, so we can't perform that optimization.
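
A minimal sketch of the comparison described above, assuming a Spark 2.x build on the classpath and a local master. The object name `LocalVsRdd` and the column name `n` are illustrative, not taken from the thread. The plan for the Seq-backed DataFrame should show a local relation, while the RDD-backed one should show a scan over an existing RDD; the exact plan text may vary between versions.

```scala
import org.apache.spark.sql.SparkSession

object LocalVsRdd {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumed setup).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("local-vs-rdd")
      .getOrCreate()
    import spark.implicits._

    // Built from a local Seq: Catalyst can see the data directly.
    val fromSeq = Seq(1, 2, 3).toDF("n")

    // Built from an RDD: the contents are opaque to Catalyst.
    val fromRdd = spark.sparkContext.parallelize(Seq(1, 2, 3)).toDF("n")

    // Print the physical plans so the two can be compared side by side.
    fromSeq.explain()
    fromRdd.explain()

    spark.stop()
  }
}
```

Passing `true` to `explain` also prints the logical plans, which makes the difference between the two sources easier to spot.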
On Wed, Jun 8, 2016 at 7:49 AM, Jacek Laskowski wrote:
Hi,
I just noticed it today while toying with Spark 2.0.0 (today's build)
that doing Seq(...).toDF does **not** submit a Spark job while
sc.parallelize(Seq(...)).toDF does. I was nicely surprised and have been
thinking about the reason for the behaviour.
My explanation was that Datasets are just a "vi