HBase scans let you specify filters that make scans very fast and
efficient (because they let you seek directly to the keys that pass the filter).
Do RDDs or Spark DataFrames offer anything similar, or would I need
to use a NoSQL database like HBase to do something like this?
--
Stuart
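To make the "seek to the keys" point concrete, here is a rough sketch in plain Python. It is a toy model, not HBase or Spark code: the key layout and the function names `full_scan` and `seek_scan` are made up for illustration. It only shows why a filter over sorted keys can skip most of the data, while a filter applied to every record cannot.

```python
import bisect


def full_scan(sorted_keys, prefix):
    """Test every key, the way a plain filter over unindexed data would."""
    examined = 0
    matches = []
    for key in sorted_keys:
        examined += 1
        if key.startswith(prefix):
            matches.append(key)
    return matches, examined


def seek_scan(sorted_keys, prefix):
    """Binary-search to the first candidate key, then stop at the first miss.

    This relies on the keys being sorted: all keys sharing a prefix are
    contiguous and none of them sorts before the prefix itself.
    """
    index = bisect.bisect_left(sorted_keys, prefix)
    examined = 0
    matches = []
    while index < len(sorted_keys):
        examined += 1
        if not sorted_keys[index].startswith(prefix):
            break
        matches.append(sorted_keys[index])
        index += 1
    return matches, examined


# Hypothetical row keys: 100 users with 10 events each, kept in sorted order.
keys = sorted(f"user{u:03d}#event{e:02d}" for u in range(100) for e in range(10))

full_matches, full_examined = full_scan(keys, "user042#")
seek_matches, seek_examined = seek_scan(keys, "user042#")
print(len(full_matches), full_examined)
print(len(seek_matches), seek_examined)
```

Both functions return the same rows; the difference is how many keys each one has to look at, which is the property the question is asking about.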
= TableMapReduceUtil.convertStringToScan(conf.get(SCAN));
You can use TableMapReduceUtil#convertScanToString() to convert a Scan
that has filter(s) to a String and pass it to TableInputFormat
Cheers
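The round trip suggested above (Scan to String, String into the configuration, String back to Scan on the reading side) can be sketched in plain Python. This is only an analogy under stated assumptions, not the HBase API: the functions `convert_scan_to_string` and `convert_string_to_scan` are stand-ins written for this example, the scan is a plain dict, and JSON plus Base64 is used here in place of whatever serialization HBase actually applies. The key `hbase.mapreduce.scan` is, as far as I know, the value of TableInputFormat.SCAN; here it is just a dict key.

```python
import base64
import json

# Assumed to match TableInputFormat.SCAN; used here only as a dictionary key.
SCAN = "hbase.mapreduce.scan"


def convert_scan_to_string(scan):
    """Stand-in for the real helper: serialize the scan and Base64-encode it."""
    raw = json.dumps(scan, sort_keys=True).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def convert_string_to_scan(encoded):
    """Inverse of convert_scan_to_string, as the input format side would call it."""
    return json.loads(base64.b64decode(encoded).decode("utf-8"))


# Hypothetical scan settings, including a filter description.
scan = {
    "start_row": "user042#",
    "stop_row": "user043#",
    "filter": {"type": "PrefixFilter", "prefix": "user042#"},
}

# The job configuration only holds strings, so the scan travels as one.
conf = {SCAN: convert_scan_to_string(scan)}

# The reader recovers the scan, filter included, from the configuration.
restored = convert_string_to_scan(conf[SCAN])
print(restored == scan)
```

The point of the design is that the filter travels with the job configuration, so it is applied where the data is read rather than after the rows reach the caller.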
On Thu, Mar 26, 2015 at 6:46 AM, Stuart Layton stuart.lay...@gmail.com
wrote:
HBase scans come with the ability to specify filters that make
and saving it to
S3; however, since I want to optimize for filtering speed, I'm not sure this is
the best option.
--
Stuart Layton
should certainly use them for
the advanced stuff that expressions can't handle).
I opened SPARK-6536 https://issues.apache.org/jira/browse/SPARK-6536 to
provide a nicer interface for this.
On Wed, Mar 25, 2015 at 7:41 AM, Stuart Layton stuart.lay...@gmail.com
wrote:
I have a SparkSQL
-testing/, expected: hdfs://ec2-52-0-159-113.compute-1.amazonaws.com:9000
Is it possible to save a DataFrame to S3 directly as Parquet?
--
Stuart Layton