Re: Spark HBase Bulk load using HFileFormat

2016-07-14 Thread Ted Yu
Please take a look at http://hbase.apache.org/book.html#dm.sort
In your second example, the column qualifier of the current cell was not in the proper order.
On Thu, Jul 14, 2016 at 12:13 PM, yeshwanth kumar wrote:
> Hi,
>
> i have few questions regarding BulkLoad,
> does the
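As a rough illustration of the ordering that section of the reference guide describes, here is a minimal plain-Scala sketch. It assumes cells are compared by row, then column family, then column qualifier (each as unsigned bytes), then timestamp with the newest first. `Cell`, `CellOrder`, and `Example` are hypothetical stand-ins invented for this sketch, not HBase classes, and the sketch ignores the KeyValue type byte.

```scala
// Hypothetical stand-in for an HBase KeyValue (NOT the real HBase class).
final case class Cell(row: String, family: String, qualifier: String, timestamp: Long)

object CellOrder {
  private def utf8(s: String): Array[Byte] = s.getBytes("UTF-8")

  // Unsigned lexicographic comparison of two byte arrays.
  def compareBytes(a: Array[Byte], b: Array[Byte]): Int = {
    val diffs = a.zip(b).iterator.map { case (p, q) => (p & 0xff) - (q & 0xff) }
    // If one array is a prefix of the other, the shorter one sorts first.
    diffs.find(_ != 0).getOrElse(a.length - b.length)
  }

  // Assumed order: row, family, qualifier ascending; timestamp descending.
  val ordering: Ordering[Cell] = new Ordering[Cell] {
    def compare(x: Cell, y: Cell): Int = {
      val byRow = compareBytes(utf8(x.row), utf8(y.row))
      if (byRow != 0) byRow
      else {
        val byFamily = compareBytes(utf8(x.family), utf8(y.family))
        if (byFamily != 0) byFamily
        else {
          val byQualifier = compareBytes(utf8(x.qualifier), utf8(y.qualifier))
          if (byQualifier != 0) byQualifier
          else java.lang.Long.compare(y.timestamp, x.timestamp)
        }
      }
    }
  }

  // True when no cell sorts before its predecessor.
  def isSorted(cells: Seq[Cell]): Boolean =
    cells.sliding(2).forall {
      case Seq(a, b) => ordering.compare(a, b) <= 0
      case _         => true
    }
}

object Example {
  // Qualifier "b" before "a" within row1, and row0 after row1: both out of order.
  val unsorted: Seq[Cell] = Seq(
    Cell("row1", "cf", "b", 1L),
    Cell("row1", "cf", "a", 1L),
    Cell("row0", "cf", "z", 1L)
  )
  val sorted: Seq[Cell] = unsorted.sorted(CellOrder.ordering)
}
```

Under these assumptions both conditions matter: rows have to arrive in order, and the cells inside each row have to be ordered by family and qualifier as well, which would be consistent with seeing the exception both between different row keys and within a single row key.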

Re: Spark HBase Bulk load using HFileFormat

2016-07-14 Thread yeshwanth kumar
Hi, I have a few questions regarding BulkLoad. Do the rows need to be in sorted order, or do the KeyValues within a row need to be in sorted order? Sometimes I see the exception between two different row keys, and sometimes I see it between KeyValue pairs of the same row key. For example, current cell

Re: Spark HBase Bulk load using HFileFormat

2016-07-14 Thread yeshwanth kumar
The following is the code snippet for saveAsHFile:

    def saveAsHFile(putRDD: RDD[(ImmutableBytesWritable, KeyValue)], outputPath: String) = {
      val conf = ConfigFactory.getConf
      val job = Job.getInstance(conf, "HBaseBulkPut")
      job.setMapOutputKeyClass(classOf[ImmutableBytesWritable])