Hi,

The Spark integration uses the Phoenix MapReduce framework, which, under the hood, translates the DataFrame writes into UPSERTs spread across a number of workers.
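If it helps, here is roughly what a DataFrame save through the phoenix-spark connector looks like. The table name, columns, and ZooKeeper quorum below are just placeholders for whatever your job actually uses, and the phoenix-client / phoenix-spark jars need to be on the classpath:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.{SQLContext, SaveMode}

    object PhoenixSaveExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("phoenix-save-example"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        // Stand-in for whatever DataFrame your job actually produces.
        // Column names must match the columns of the Phoenix table.
        val df = Seq((1L, "foo"), (2L, "bar")).toDF("ID", "COL1")

        // The connector only supports SaveMode.Overwrite; under the hood it
        // issues UPSERTs, so rows with an existing key are simply updated.
        df.write
          .format("org.apache.phoenix.spark")
          .mode(SaveMode.Overwrite)
          .option("table", "OUTPUT_TABLE")   // placeholder Phoenix table name
          .option("zkUrl", "zkhost:2181")    // placeholder ZooKeeper quorum
          .save()
      }
    }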
You should try out both methods and see which works best for your use case. For what it's worth, we routinely do load/save operations using the Spark integration at those data sizes.

Josh

On Tue, May 17, 2016 at 7:03 AM, Radha krishna <[email protected]> wrote:
> Hi,
>
> I have the same scenario. Can you share your metrics, such as the column
> count for each row, the number of SALT_BUCKETS, the compression technique
> you used, and how long it takes to load the complete data?
>
> My scenario: I have to load 1.9 billion records (approximately 20 files,
> each containing 100 million rows with 102 columns per row). Currently it
> takes 35 to 45 minutes to load one file's data.
>
> On Tue, May 17, 2016 at 3:51 PM, Mohanraj Ragupathiraj <
> [email protected]> wrote:
>
>> I have 100 million records to insert into an HBase table (Phoenix) as the
>> result of a Spark job. I would like to know: if I convert it to a DataFrame
>> and save it, will that do a bulk load, or is it not an efficient way to
>> write data to a Phoenix HBase table?
>>
>> --
>> Thanks and Regards
>> Mohan
>
> --
> Thanks & Regards
> Radha krishna
