Re: bulk-upsert spark phoenix

2016-10-17 Thread Antonio Murgia
Hi Josh, thanks for your reply. I'm trying to implement a bulk save to Phoenix with Apache Spark, and the code you linked helped me a lot. I'm now facing an issue with composite primary keys: I cannot find anywhere in the Phoenix code where the row-key is built using the partial phoenix primar

Re: bulk-upsert spark phoenix

2016-09-28 Thread Josh Mahonin
Hi Antonio, Certainly, a JIRA ticket with a patch would be fantastic. Thanks! Josh On Wed, Sep 28, 2016 at 12:08 PM, Antonio Murgia wrote: > Thank you very much for your insights Josh, if I decide to develop a small > Phoenix Library that does, through Spark, what the CSV loader does, I'll >

Re: bulk-upsert spark phoenix

2016-09-28 Thread Antonio Murgia
Thank you very much for your insights Josh, if I decide to develop a small Phoenix Library that does, through Spark, what the CSV loader does, I'll surely write to the mailing list, or open a Jira, or maybe even open a PR, right? Thank you again #A.M. On 09/28/2016 05:10 PM, Josh Mahonin wr

Re: bulk-upsert spark phoenix

2016-09-28 Thread Josh Mahonin
Hi Antonio, You're correct, the phoenix-spark output uses the Phoenix Hadoop OutputFormat under the hood, which effectively does a parallel, batch JDBC upsert. It should scale depending on the number of Spark executors, RDD/DataFrame parallelism, and number of HBase RegionServers, though admittedl
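
For readers unfamiliar with the pattern Josh describes, here is a minimal sketch of a batched JDBC upsert against Phoenix, roughly what each parallel writer would do for its share of the rows. This is an illustration only, not the phoenix-spark implementation: the class name, method names, and batch size parameter are hypothetical, and it assumes a Phoenix JDBC connection has already been opened by the caller.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Phoenix or phoenix-spark.
public class BatchUpsertSketch {

    // Builds a parameterized Phoenix UPSERT statement,
    // e.g. "UPSERT INTO T (A, B) VALUES (?, ?)".
    static String upsertSql(String table, List<String> columns) {
        String cols = String.join(", ", columns);
        String marks = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "UPSERT INTO " + table + " (" + cols + ") VALUES (" + marks + ")";
    }

    // Writes all rows through one connection, committing every batchSize rows.
    // With auto-commit off, Phoenix buffers upserts on the client and sends
    // them to HBase on commit(). Returns the number of rows written.
    static int upsertAll(Connection conn, String table, List<String> columns,
                         Iterable<Object[]> rows, int batchSize) throws SQLException {
        conn.setAutoCommit(false);
        int written = 0;
        try (PreparedStatement ps = conn.prepareStatement(upsertSql(table, columns))) {
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]); // JDBC parameters are 1-based
                }
                ps.executeUpdate();
                written++;
                if (written % batchSize == 0) {
                    conn.commit(); // flush the buffered batch
                }
            }
            conn.commit(); // flush any remainder
        }
        return written;
    }
}
```

In a Spark job, something like `upsertAll` would run once per partition, so throughput grows with the number of partitions and executors, as noted above. Every row still goes through the normal HBase write path, which is the difference from an HFile-based bulk load.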

bulk-upsert spark phoenix

2016-09-27 Thread Antonio Murgia
Hi, I would like to perform a bulk insert to HBase using Apache Phoenix from Spark. I tried using the Apache Spark Phoenix library but, as far as I was able to understand from the code, it looks like it performs a JDBC batch of upserts (am I right?). Instead I want to perform a bulk load like the one