HBase scans let you specify filters that make the scan fast and efficient, since a filter lets the scan seek directly to the keys that pass it.
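To make concrete what I mean by seeking, here is a minimal sketch in plain Python. It does not use the HBase or Spark APIs: the function names `full_scan` and `seek_scan` and the sample row keys are made up for illustration, and the only assumption is that row keys are kept in sorted order, as HBase does.

```python
from bisect import bisect_left

# Hypothetical sample data: row keys kept in sorted order, values in a parallel list.
keys = ["user1#a", "user2#a", "user2#b", "user3#a"]
values = [1, 2, 3, 4]


def full_scan(keys, values, prefix):
    """Test every row against the predicate (no seeking)."""
    return [(k, v) for k, v in zip(keys, values) if k.startswith(prefix)]


def seek_scan(keys, values, prefix):
    """Binary-search to the first key >= prefix, then read until keys stop matching."""
    start = bisect_left(keys, prefix)
    result = []
    for i in range(start, len(keys)):
        if not keys[i].startswith(prefix):
            break  # keys are sorted, so no later key can match
        result.append((keys[i], values[i]))
    return result


matches = seek_scan(keys, values, "user2#")
```

Both functions return the same rows, but `seek_scan` only touches the matching range plus one extra key, which is the kind of behavior I am hoping to get.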
Do RDDs or Spark DataFrames offer anything similar, or would I need to use a NoSQL database like HBase to do something like this? -- Stuart Layton
