I have an RDD x of millions of Strings, each of which I want to pass
through a set of filters. My filtering code looks like this:
```scala
x.filter(filter1)   // filters out 40% of the data
 .filter(filter2)   // filters out 20% of what remains
 .filter(filter3)   // filters out 2% of what remains
```
In the situation you show, Spark will pipeline the filters together and
apply them one at a time to each row, effectively constructing an `&&`
statement. You would only see a performance difference if the filter code
itself is somewhat expensive; in that case you would want to execute it on
as few rows as possible, i.e. order the filters so that the cheap, highly
selective ones run first and the expensive one runs last, on only the rows
that survive.
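
Here is a minimal sketch of that equivalence. The predicates `filter1`/`filter2`/`filter3` and the input path are hypothetical stand-ins for your real logic; the point is that Spark pipelines the chained filters within a single stage, so the chain behaves like one pass with a short-circuiting `&&`:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object FilterPipeline {
  // Hypothetical predicates standing in for the real filter logic.
  def filter1(s: String): Boolean = s.nonEmpty          // drops ~40% of rows
  def filter2(s: String): Boolean = s.length < 100      // drops ~20% of survivors
  def filter3(s: String): Boolean = !s.startsWith("#")  // drops ~2% of survivors

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("filter-pipeline").setMaster("local[*]"))
    val x = sc.textFile("input.txt")  // hypothetical input path

    // Three chained filters: Spark pipelines them in one stage, so each row
    // is tested by filter1, then (only if it passed) filter2, then filter3.
    val chained = x.filter(filter1).filter(filter2).filter(filter3)

    // Logically identical single pass: a short-circuiting && of the predicates.
    val combined = x.filter(s => filter1(s) && filter2(s) && filter3(s))

    println(chained.count() == combined.count())  // same result either way
    sc.stop()
  }
}
```

Because `&&` short-circuits, a predicate placed last only runs on the rows that survived the earlier filters: with the selectivities above, roughly 48% of the input (0.6 × 0.8). That is why an expensive filter belongs at the end of the chain.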