Hi,

What is your cluster setup? How much memory do you have? How much space does one row consisting of only the 3 columns consume? Do you run other stuff in the background?
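If you are not sure about the per-row size, you can get a rough estimate from a single Result fetched with the HBase client. A minimal sketch, assuming an HBase 0.96+-era client API (sizeOf is a hypothetical helper, not part of HBase; the fixed per-cell overhead is approximated from the KeyValue layout):

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;

// Rough payload size of one row: sum the key/value components of each cell.
// The +8 (timestamp) and +1 (type byte) are the fixed KeyValue fields.
static long sizeOf(Result result) {
    long bytes = 0;
    for (Cell cell : result.rawCells()) {
        bytes += cell.getRowLength() + cell.getFamilyLength()
               + cell.getQualifierLength() + cell.getValueLength() + 8 + 1;
    }
    return bytes;
}

Multiplying that by 6 million rows gives a ballpark for how much data the scan actually moves.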
Best regards

On 04.12.2014 23:57, "bonnahu" <bonn...@gmail.com> wrote:

> I am trying to load a large HBase table into a Spark RDD to run a Spark SQL
> query on the entity. For an entity with about 6 million rows, it takes about
> 35 seconds to load it into the RDD. Is that expected? Is there any way to
> shorten the loading process? I have been following tips from
> http://hbase.apache.org/book/perf.reading.html to speed up the process,
> e.g., scan.setCaching(cacheSize) and adding only the necessary
> attributes/columns to the scan. I am just wondering if there are other ways
> to improve the speed?
>
> Here is the code snippet:
>
> SparkConf sparkConf = new SparkConf()
>     .setMaster("spark://url")
>     .setAppName("SparkSQLTest");
> JavaSparkContext jsc = new JavaSparkContext(sparkConf);
>
> Configuration hbase_conf = HBaseConfiguration.create();
> hbase_conf.set("hbase.zookeeper.quorum", "url");
> hbase_conf.set("hbase.regionserver.port", "60020");
> hbase_conf.set("hbase.master", "url");
> hbase_conf.set(TableInputFormat.INPUT_TABLE, entityName);
>
> Scan scan = new Scan();
> scan.addColumn(Bytes.toBytes("MetaInfo"), Bytes.toBytes("col1"));
> scan.addColumn(Bytes.toBytes("MetaInfo"), Bytes.toBytes("col2"));
> scan.addColumn(Bytes.toBytes("MetaInfo"), Bytes.toBytes("col3"));
> scan.setCaching(this.cacheSize);
> hbase_conf.set(TableInputFormat.SCAN, convertScanToString(scan));
>
> JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD =
>     jsc.newAPIHadoopRDD(hbase_conf, TableInputFormat.class,
>         ImmutableBytesWritable.class, Result.class);
> logger.info("count is " + hBaseRDD.cache().count());
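P.S. Regarding convertScanToString in the quoted snippet: it is not shown there. A helper along these lines is the usual approach, assuming HBase 0.96/0.98-era protobuf APIs (the version is my assumption, not taken from your mail); depending on your HBase version, TableMapReduceUtil may already expose an equivalent convertScanToString:

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.protobuf.ProtobufUtil;
import org.apache.hadoop.hbase.protobuf.generated.ClientProtos;
import org.apache.hadoop.hbase.util.Base64;
import java.io.IOException;

// Serialize the Scan into the Base64-encoded protobuf string that
// TableInputFormat expects in its SCAN configuration property.
private static String convertScanToString(Scan scan) throws IOException {
    ClientProtos.Scan proto = ProtobufUtil.toScan(scan);
    return Base64.encodeBytes(proto.toByteArray());
}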