ow()
Mohammed
From: Harish Butani [mailto:rhbutani.sp...@gmail.com]
Sent: Monday, July 20, 2015 5:37 PM
To: Mohammed Guller
Cc: Michael Armbrust; Mike Trienis; user@spark.apache.org
Subject: Re: Data frames select and where clause dependency
Yes via: org.apache.spark.sql.catalyst.optimizer.ColumnPruning
See DefaultOptimizer.batches for list of logical rewrites.
You can see the optimized plan by printing: df.queryExecution
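A minimal sketch of inspecting the plans, assuming a Spark 2.x or later `SparkSession` running locally (in the 1.4 era this would go through `SQLContext` instead). The sample data, the `unused` column, and the object name are made up for illustration:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object PlanInspectionSketch {
  // Hypothetical three-column sample frame; the "unused" column is never referenced by the query.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(("a", "value", 1), ("b", "other", 2)).toDF("field1", "filter_field", "unused")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("plan-inspection-sketch")
      .getOrCreate()

    val df = sample(spark)
    val query = df.filter(df("filter_field") === "value").select("field1")

    // Printing the QueryExecution shows the parsed, analyzed, optimized and physical plans.
    println(query.queryExecution)
    // The optimized logical plan on its own.
    println(query.queryExecution.optimizedPlan)

    spark.stop()
  }
}
```

Comparing the analyzed and optimized plans in the printed output is one way to check which columns survive the optimizer's rewrites.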
other columns from df
are not used anywhere else)?
Mohammed
From: Michael Armbrust [mailto:mich...@databricks.com]
Sent: Friday, July 17, 2015 1:39 PM
To: Mike Trienis
Cc: user@spark.apache.org
Subject: Re: Data frames select and where clause dependency
Each operation on a dataframe is completely independent and doesn't know
what operations happened before it. When you do a selection, you are
removing other columns from the dataframe and so the filter has nothing to
operate on.
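Given that explanation, one way around the problem is to apply the filter while the column is still present and only then project it away. A sketch, assuming a Spark 2.x or later `SparkSession` and made-up sample rows (the object and helper names are hypothetical; the column names come from the question quoted below):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object FilterThenSelectSketch {
  // Filter while filter_field is still part of the DataFrame, then keep only field1.
  def filterThenSelect(df: DataFrame): DataFrame =
    df.filter(df("filter_field") === "value").select("field1")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("filter-then-select-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data for illustration.
    val df = Seq(("a", "value"), ("b", "other")).toDF("field1", "filter_field")

    filterThenSelect(df).show()

    spark.stop()
  }
}
```

The result contains only `field1`, so the filter column does not have to appear in the final output.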
On Fri, Jul 17, 2015 at 11:55 AM, Mike Trienis wrote:
I'd like to understand why the where field must exist in the select clause.
For example, the following select statement works fine:
- df.select("field1", "filter_field").filter(df("filter_field") === "value").show()
However, the next one fails with the error "in operator !Filter
(filter_fie