Thanks, but I'm hoping to get away from HBase altogether. I was wondering if there is a way to get similar scan performance directly on cached RDDs or DataFrames.
On Thu, Mar 26, 2015 at 9:54 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> In examples/src/main/scala/org/apache/spark/examples/HBaseTest.scala,
> TableInputFormat is used. TableInputFormat accepts the parameter
>
>     public static final String SCAN = "hbase.mapreduce.scan";
>
> If it is specified, a Scan object is created from its String form:
>
>     if (conf.get(SCAN) != null) {
>       try {
>         scan = TableMapReduceUtil.convertStringToScan(conf.get(SCAN));
>
> You can use TableMapReduceUtil#convertScanToString() to convert a Scan
> that has filter(s) attached and pass it to TableInputFormat.
>
> Cheers
>
> On Thu, Mar 26, 2015 at 6:46 AM, Stuart Layton <stuart.lay...@gmail.com>
> wrote:
>
>> HBase scans come with the ability to specify filters that make scans
>> very fast and efficient (as they let you seek to the keys that pass
>> the filter).
>>
>> Do RDDs or Spark DataFrames offer anything similar, or would I be
>> required to use a NoSQL db like HBase to do something like this?
>>
>> --
>> Stuart Layton

--
Stuart Layton
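The speedup Stuart describes comes from HBase keeping rows sorted by key, so a start/stop-row scan can seek straight to the matching range instead of reading everything. A minimal, HBase-free sketch of that idea, using plain Java's TreeMap as a stand-in for HBase's sorted storage (the "rowNNNN" keys here are made up for illustration):

```java
import java.util.TreeMap;

public class SeekSketch {
    public static void main(String[] args) {
        // Hypothetical row keys, kept sorted lexicographically the way
        // HBase stores them.
        TreeMap<String, Integer> table = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            table.put(String.format("row%04d", i), i);
        }
        // A start/stop-row scan is a seek into the sorted index, not a
        // full pass: only keys in [row0100, row0110) are ever touched.
        int hits = table.subMap("row0100", "row0110").size();
        System.out.println(hits); // prints 10
    }
}
```

A filter on a cached RDD or DataFrame, by contrast, evaluates the predicate against every row of every partition, since the in-memory data carries no sorted key index to seek into.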