Hi Andrew,
HFileOutputFormat2 needs the HBase keys to be sorted lexicographically.
With your suggestion of timestamp + hashed key, I might end up doing a sort
on the RDD,
which I want to avoid.
If I could generate a sequential key, I wouldn't need to sort; I could
just write after processin
Hi, how about creating an auto-increment column in HBase?
HTH
On 24 Jul 2016 3:53 am, "yeshwanth kumar" wrote:
> Hi,
>
> I am doing a bulk load to HBase using Spark,
> in which I need to generate a sequential key for each record.
> The key should be sequential across all the executors.
>
> i tried z
If you can use a DataFrame, then you could use rank + a window function at the
expense of an extra sort. Do you have an example of zipWithIndex not
working? That seems surprising.
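
For reference, a minimal sketch in plain Python (not Spark) of how a global index can be assigned across partitions without sorting: count each partition, take a running sum to get per-partition offsets, then number records from each offset. Spark's RDD.zipWithIndex is documented as producing indices that span all partitions, and this is only an illustration of that idea. The names zip_with_global_index and row_key, and the 12-digit key width, are assumptions made for the example.

```python
from itertools import accumulate


def zip_with_global_index(partitions):
    """Pair every record with an index that is contiguous across partitions."""
    # Pass 1: count the records in each partition.
    counts = [len(part) for part in partitions]
    # Exclusive prefix sum: the starting index of each partition.
    offsets = [0] + list(accumulate(counts))[:-1]
    # Pass 2: each partition numbers its own records from its offset.
    return [
        [(record, offset + i) for i, record in enumerate(part)]
        for part, offset in zip(partitions, offsets)
    ]


def row_key(index, width=12):
    """Zero-pad the index so byte-wise (lexicographic) order matches numeric order."""
    # The width of 12 is an arbitrary choice for this sketch.
    return str(index).zfill(width).encode("ascii")


if __name__ == "__main__":
    parts = [["a", "b"], ["c"], ["d", "e", "f"]]
    for part in zip_with_global_index(parts):
        print([(row_key(i), rec) for rec, i in part])
```

The zero padding matters for the bulk load: without it, "10" would sort before "9" as bytes, so the keys would no longer be in lexicographic order even though the indices are sequential.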
On Jul 23, 2016 10:24 PM, "Andrew Ehrlich" wrote:
> It’s hard to do in a distributed system. Maybe try generating a me
It’s hard to do in a distributed system. Maybe try generating a meaningful key
using a timestamp + hashed unique key fields in the record?
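
A minimal sketch of what such a key could look like, in plain Python. The function name composite_key, the 13-digit millisecond timestamp, the "|" and "_" separators, and the use of a truncated SHA-256 digest are all assumptions for illustration, not something specified in this thread.

```python
import hashlib


def composite_key(timestamp_ms, unique_fields, hash_len=8):
    """Build a row key from a fixed-width timestamp plus a short hash of the record's unique fields."""
    # Fixed width keeps byte order aligned with time order.
    ts = str(timestamp_ms).zfill(13)
    # Hash the unique fields; truncating shortens the key but raises collision risk.
    joined = "|".join(unique_fields).encode("utf-8")
    digest = hashlib.sha256(joined).hexdigest()[:hash_len]
    return f"{ts}_{digest}".encode("ascii")


if __name__ == "__main__":
    print(composite_key(1469300000000, ["user1", "evt"]))
```

Keys built this way are deterministic per record and group by time, but records are not produced in key order, so they would still need a sort before being written as HFiles.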
> On Jul 23, 2016, at 7:53 PM, yeshwanth kumar wrote:
>
> Hi,
>
> I am doing a bulk load to HBase using Spark,
> in which I need to generate a sequential ke
Hi,
I am doing a bulk load to HBase using Spark,
in which I need to generate a sequential key for each record.
The key should be sequential across all the executors.
I tried zipWithIndex, but it didn't work because zipWithIndex gives an index per
executor, not across all executors.
Looking for some sugge