Dear Spark users,

I ran the Python code below on a simple RDD, but it gave strange results: the filtered RDD contains elements that were already filtered away in an earlier iteration. Any idea why this happens?

```
rdd = spark.sparkContext.parallelize([0, 1, 2])
for i in range(3):
    print("RDD is ", rdd.collect())
    print("Filtered RDD is ", rdd.filter(lambda x: x != i).collect())
    rdd = rdd.filter(lambda x: x != i)
    print("Result is ", rdd.collect())
    print()
```

which gave

```
RDD is [0, 1, 2]
Filtered RDD is [1, 2]
Result is [1, 2]

RDD is [1, 2]
Filtered RDD is [0, 2]
Result is [0, 2]

RDD is [0, 2]
Filtered RDD is [0, 1]
Result is [0, 1]
```

Thanks,
Marco
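P.S. While debugging I noticed that the same "stale value" effect shows up in plain Python, without Spark, whenever a lambda defined in a loop is only called later: the lambda captures the *variable* `i`, not its value at definition time. Since Spark evaluates the filters lazily (only at `collect()`), I suspect this is related. A minimal sketch (the `filters` list and the default-argument trick below are just my illustration, not Spark API):

```python
data = [0, 1, 2]

# Build three filter lambdas in a loop, but call them only afterwards,
# mimicking Spark's deferred evaluation of transformations.
filters = []
for i in range(3):
    filters.append(lambda x: x != i)  # captures the variable i, not its value

# By now i == 2, so every lambda behaves like `x != 2`:
print([[x for x in data if f(x)] for f in filters])
# → [[0, 1], [0, 1], [0, 1]]

# Binding i as a default argument freezes its value per iteration:
fixed = []
for i in range(3):
    fixed.append(lambda x, i=i: x != i)
print([[x for x in data if f(x)] for f in fixed])
# → [[1, 2], [0, 2], [0, 1]]
```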