Hi,

The Spark integration uses the Phoenix MapReduce framework, which, under the hood, translates the DataFrame writes into UPSERTs spread across a number of workers.
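If it helps, here is roughly what a DataFrame save through the phoenix-spark connector looks like. The table name, columns, and ZooKeeper quorum below are just placeholders for whatever your job actually uses, and the phoenix-client / phoenix-spark jars need to be on the classpath:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.{SQLContext, SaveMode}

    object PhoenixSaveExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("phoenix-save-example"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        // Stand-in for whatever DataFrame your job actually produces.
        // Column names must match the columns of the Phoenix table.
        val df = Seq((1L, "foo"), (2L, "bar")).toDF("ID", "COL1")

        // The connector only supports SaveMode.Overwrite; under the hood it
        // issues UPSERTs, so rows with an existing key are simply updated.
        df.write
          .format("org.apache.phoenix.spark")
          .mode(SaveMode.Overwrite)
          .option("table", "OUTPUT_TABLE")   // placeholder Phoenix table name
          .option("zkUrl", "zkhost:2181")    // placeholder ZooKeeper quorum
          .save()
      }
    }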
You should try out both methods and see which works best for your use case. For what it's worth, we routinely do load/save operations using the Spark integration at those data sizes.

Josh

On Tue, May 17, 2016 at 7:03 AM, Radha krishna <[email protected]> wrote:
> Hi,
>
> I have the same scenario. Can you share your metrics, such as the column
> count for each row, the number of SALT_BUCKETS, the compression technique
> you used, and how long it takes to load the complete data?
>
> My scenario: I have to load 1.9 billion records (approximately 20 files,
> each containing 100 million rows with 102 columns per row). Currently it
> takes 35 to 45 minutes to load one file's data.
>
> On Tue, May 17, 2016 at 3:51 PM, Mohanraj Ragupathiraj <
> [email protected]> wrote:
>
>> I have 100 million records to insert into an HBase table (Phoenix) as the
>> result of a Spark job. I would like to know: if I convert it to a DataFrame
>> and save it, will that do a bulk load, or is it not an efficient way to
>> write data to a Phoenix HBase table?
>>
>> --
>> Thanks and Regards
>> Mohan
>
> --
> Thanks & Regards
> Radha krishna
