Re: RDD filter in for loop gave strange results

2021-01-20 Thread Marco Wong
Hmm, I think I see what Jingnan means. The lambda function is x != i, and i is not evaluated when the lambda function is defined. So the pipelined RDD is rdd.filter(lambda x: x != i).filter(lambda x: x != i), with i looked up only when the filters actually run, rather than having the values of i substituted at definition time. Does that make sense to you, Sean? On
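To illustrate the late-binding behaviour described above, here is a minimal sketch in plain Python, not PySpark. It assumes that the built-in lazy `filter` is a fair stand-in for lazily evaluated RDD transformations; the function names `chain_filters_late` and `chain_filters_bound` are hypothetical and not from the thread.

```python
def chain_filters_late(data, values):
    """Chain lazy filters whose lambdas look up `i` only when they run."""
    it = iter(data)
    for i in values:
        # `i` is NOT captured by value here; every lambda shares the same variable.
        it = filter(lambda x: x != i, it)
    # The filters execute only now, after the loop, when `i` holds its last value.
    return list(it)


def chain_filters_bound(data, values):
    """Same chain, but each lambda freezes the current `i` as a default argument."""
    it = iter(data)
    for i in values:
        # `i=i` evaluates `i` at definition time and stores it on the lambda.
        it = filter(lambda x, i=i: x != i, it)
    return list(it)


late = chain_filters_late([0, 1, 2], range(3))
bound = chain_filters_bound([0, 1, 2], range(3))
print("late binding:", late)    # only the last value of i (2) is filtered out
print("bound per step:", bound)  # 0, 1 and 2 are each filtered out
```

If the same reasoning carries over to the RDD case, writing the filter as `rdd.filter(lambda x, i=i: x != i)` would pin each value of i at the point the lambda is defined. That is a suggestion under the assumption above, not something confirmed in this thread.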

RDD filter in for loop gave strange results

2021-01-20 Thread Marco Wong
Dear Spark users, I ran the Python code below on a simple RDD, but it gave strange results. The filtered RDD contains non-existent elements which were filtered away earlier. Any idea why this happened? ``` rdd = spark.sparkContext.parallelize([0,1,2]) for i in range(3): print("RDD is ",