[ https://issues.apache.org/jira/browse/SPARK-14759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15255935#comment-15255935 ]
praveen dareddy commented on SPARK-14759:
-----------------------------------------

Hi Tomasz,

I am new to Spark but would like to help with this issue. I have the environment set up on my local system, I recently started going through the code base, and I am eager to contribute. Can you point me to the specific module I need to understand to solve this issue?

Thanks,
Red

> After join one cannot drop dynamically added column
> ---------------------------------------------------
>
>                 Key: SPARK-14759
>                 URL: https://issues.apache.org/jira/browse/SPARK-14759
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.6.1
>            Reporter: Tomasz Bartczak
>            Priority: Minor
>
> Running the following code:
> {code}
> from pyspark.sql.functions import *
> df1 = sqlContext.createDataFrame([(1,10,)], ['any','hour'])
> df2 = sqlContext.createDataFrame([(1,)], ['any']).withColumn('hour',lit(10))
> j = df1.join(df2,[df1.hour == df2.hour],how='left')
> print("columns after join:{0}".format(j.columns))
> jj = j.drop(df2.hour)
> print("columns after removing 'hour':{0}".format(jj.columns))
> {code}
> should show that after the join and dropping df2.hour, I end up with only one
> 'hour' column in the DataFrame.
> Unfortunately, this column is not dropped:
> {code}
> columns after join: ['any', 'hour', 'any', 'hour']
> columns after removing 'hour': ['any', 'hour', 'any', 'hour']
> {code}
> I found that it behaves this way only when the column is added
> dynamically (via withColumn) before the join.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)