Thanks, but I'm hoping to get away from HBase altogether. I was wondering if there is a way to get similar scan performance directly on cached RDDs or DataFrames.
On Thu, Mar 26, 2015 at 9:54 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> In examples/src/main/scala/org/apache/spark/examples/HBaseTest.scala,
> TableInputFormat is used. TableInputFormat accepts the parameter
>
>     public static final String SCAN = "hbase.mapreduce.scan";
>
> If it is specified, a Scan object is created from its String form:
>
>     if (conf.get(SCAN) != null) {
>       try {
>         scan = TableMapReduceUtil.convertStringToScan(conf.get(SCAN));
>
> You can use TableMapReduceUtil#convertScanToString() to convert a Scan
> that has filter(s) attached and pass it to TableInputFormat.
>
> Cheers
>
> On Thu, Mar 26, 2015 at 6:46 AM, Stuart Layton <stuart.lay...@gmail.com>
> wrote:
>
>> HBase scans come with the ability to specify filters that make scans
>> very fast and efficient (as they let you seek to the keys that pass
>> the filter).
>>
>> Do RDDs or Spark DataFrames offer anything similar, or would I be
>> required to use a NoSQL db like HBase to do something like this?
>>
>> --
>> Stuart Layton

--
Stuart Layton
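The speedup Stuart describes comes from HBase keeping rows sorted by key, so a start/stop-row scan can seek straight to the matching range instead of reading everything. A minimal, HBase-free sketch of that idea, using plain Java's TreeMap as a stand-in for HBase's sorted storage (the "rowNNNN" keys here are made up for illustration):

```java
import java.util.TreeMap;

public class SeekSketch {
    public static void main(String[] args) {
        // Hypothetical row keys, kept sorted lexicographically the way
        // HBase stores them.
        TreeMap<String, Integer> table = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            table.put(String.format("row%04d", i), i);
        }
        // A start/stop-row scan is a seek into the sorted index, not a
        // full pass: only keys in [row0100, row0110) are ever touched.
        int hits = table.subMap("row0100", "row0110").size();
        System.out.println(hits); // prints 10
    }
}
```

A filter on a cached RDD or DataFrame, by contrast, evaluates the predicate against every row of every partition, since the in-memory data carries no sorted key index to seek into.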