[ https://issues.apache.org/jira/browse/SPARK-24835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-24835.
----------------------------------
    Resolution: Incomplete

> col function ignores drop
> -------------------------
>
>                 Key: SPARK-24835
>                 URL: https://issues.apache.org/jira/browse/SPARK-24835
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.3.0
>         Environment: Spark 2.3.0
> Python 3.5.3
>            Reporter: Michael Souder
>            Priority: Minor
>              Labels: bulk-closed
>
> Not sure if this is a bug or user error, but I've noticed that accessing columns with the col function ignores a previous call to drop.
> {code}
> import pyspark.sql.functions as F
>
> df = spark.createDataFrame([(1, 3, 5), (2, None, 7), (0, 3, 2)], ['a', 'b', 'c'])
> df.show()
> +---+----+---+
> |  a|   b|  c|
> +---+----+---+
> |  1|   3|  5|
> |  2|null|  7|
> |  0|   3|  2|
> +---+----+---+
>
> df = df.drop('c')
>
> # the col function is able to see the 'c' column even though it has been dropped
> df.where(F.col('c') < 6).show()
> +---+---+
> |  a|  b|
> +---+---+
> |  1|  3|
> |  0|  3|
> +---+---+
>
> # trying the same with brackets on the data frame fails with the expected error
> df.where(df['c'] < 6).show()
> Py4JJavaError: An error occurred while calling o36909.apply.
> : org.apache.spark.sql.AnalysisException: Cannot resolve column name "c" among (a, b);
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
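One plausible reading of the report: `df['c']` is resolved eagerly against the DataFrame's current schema (hence the immediate `AnalysisException`), while `F.col('c')` builds a free-standing, unresolved column reference that is only resolved later during analysis, where the dropped column may still be visible in the underlying plan. The toy model below is plain Python, not Spark internals; every class and method name in it is hypothetical, and it only sketches the eager-vs-deferred distinction under that assumption.

```python
class UnresolvedColumn:
    """Like F.col(name): just a name, resolved only when the plan is analyzed."""
    def __init__(self, name):
        self.name = name


class ToyFrame:
    """A toy stand-in for a DataFrame with a visible schema and an
    underlying plan that may still carry dropped columns."""

    def __init__(self, columns, hidden=()):
        self.columns = list(columns)              # visible schema
        self._plan = list(columns) + list(hidden)  # what deferred analysis sees

    def drop(self, name):
        # drop narrows the visible schema, but in this sketch the
        # underlying plan still exposes the column to deferred resolution
        if name not in self.columns:
            return self
        return ToyFrame([c for c in self.columns if c != name], hidden=[name])

    def __getitem__(self, name):
        # eager resolution, like df['c']: checked against the schema right now
        if name not in self.columns:
            raise KeyError(f'Cannot resolve column name "{name}" among {self.columns}')
        return UnresolvedColumn(name)

    def resolves(self, col):
        # deferred resolution, like analysis of F.col(...): checked
        # against everything the underlying plan exposes
        return col.name in self._plan


df = ToyFrame(['a', 'b', 'c']).drop('c')
# deferred reference still resolves, mirroring the reported behavior
assert df.resolves(UnresolvedColumn('c'))
# eager lookup fails, mirroring df['c'] after the drop
try:
    df['c']
except KeyError:
    pass
```

In this sketch, the fix direction implied by the report is for the deferred path to resolve against the visible schema rather than the full plan; in real Spark the actual resolution rules live in the analyzer and are more involved than this model.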