Each operation on a DataFrame returns a new DataFrame and only sees the columns of the DataFrame it is applied to; it doesn't know what operations happened before it. When you select only "field1", you are removing every other column from the result, so the filter that follows has no "filter_field" column to operate on.
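
If it helps, here is a minimal sketch of one way to avoid the extra drop(): run the filter while "filter_field" is still present, and only then select "field1". The object name, the helper filterThenSelect, the sample rows, and the local SparkSession setup are all assumptions made for illustration (on Spark 1.x a SQLContext would play the role of the session); only the column names and the filter expression come from your example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object FilterThenSelect {
  // Filter first, while filter_field still exists, then project down to field1.
  def filterThenSelect(df: DataFrame): DataFrame =
    df.filter(df("filter_field") === "value").select("field1")

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, used here only to make the sketch runnable.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("filter-then-select")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for the df in the question.
    val df = Seq(("a", "value"), ("b", "other")).toDF("field1", "filter_field")

    // The result contains only field1, for rows where filter_field matched.
    filterThenSelect(df).show()

    spark.stop()
  }
}
```

Since the filter has already been applied by the time the select runs, the result only carries "field1" and there is nothing left to drop.
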

On Fri, Jul 17, 2015 at 11:55 AM, Mike Trienis <mike.trie...@orcsol.com> wrote:

> I'd like to understand why the where field must exist in the select
> clause.
>
> For example, the following select statement works fine
>
> - df.select("field1", "filter_field").filter(df("filter_field") === "value").show()
>
> However, the next one fails with the error "in operator !Filter
> (filter_field#60 = value);"
>
> - df.select("field1").filter(df("filter_field") === "value").show()
>
> As a work-around, it seems that I can do the following
>
> - df.select("field1", "filter_field").filter(df("filter_field") === "value").drop("filter_field").show()
>
> Thanks, Mike.