You should make HBase a data source (it seems we already have an HBase connector?), create a DataFrame from the HBase table, and do the join in Spark SQL instead of probing HBase row by row.
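A minimal sketch of that idea, assuming the SHC-style connector (`org.apache.spark.sql.execution.datasources.hbase`) is on the classpath; the table name, column family, and column names below are made up for illustration, and `df` is the DataFrame from step 1 of your mail:

```scala
// Hypothetical catalog mapping an HBase table to DataFrame columns:
// row key becomes "id", and a "mvcc" qualifier in family "cf1" becomes "mvcc".
val catalog = """{
  "table":  {"namespace": "default", "name": "mvcc_table"},
  "rowkey": "key",
  "columns": {
    "id":   {"cf": "rowkey", "col": "key",  "type": "string"},
    "mvcc": {"cf": "cf1",    "col": "mvcc", "type": "string"}
  }
}"""

// Load HBase as a DataFrame instead of issuing per-row gets.
val hbaseDf = spark.read
  .options(Map("catalog" -> catalog))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .load()

// Equi-join on id, then keep only rows whose mvcc matches the HBase value;
// this replaces the mapPartitions scan-and-filter with a single SQL join.
val result = df.join(hbaseDf, Seq("id"))
  .where(df("mvcc") === hbaseDf("mvcc"))
```

If the HBase side is small enough, Spark may broadcast it and avoid a shuffle entirely; otherwise the join still lets the optimizer plan the work instead of hiding it inside an opaque mapPartitions stage.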
> On 21 Jun 2017, at 10:17 AM, sunerhan1...@sina.com wrote:
>
> Hello,
> My scenario is like this:
> 1. val df = hivecontext/carboncontext.sql("sql....")
> 2. Iterate over the rows, extract two columns, id and mvcc, and use id as the key
>    to scan HBase for the corresponding value; if mvcc == value, the row passes, else drop it.
> Is there a better way than dataframe.mapPartitions? It causes an
> extra stage and takes more time.
> I put two DAGs in the appendix, please check!
>
> Thanks!!
> sunerhan1...@sina.com <appendix.zip>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org