Thanks Pari.

The job runs weekly, the row count is around 10 billion, and the cluster has 13 nodes.
From what you have mentioned, I see that CsvBulkLoadTool is the best option for my scenario.
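
For reference, here is a rough sketch of the CSV route in plain Scala (no Spark dependency, so it only illustrates the shape of the output). The table name, file path, and client jar name are placeholders, not values from our setup, and the quoting rule (doubling embedded quotes) is an assumption that would need to match the quote/escape settings the loader is run with. The class name org.apache.phoenix.mapreduce.CsvBulkLoadTool and the --table/--input options are the ones described in the Phoenix bulk-load documentation.

```scala
// Sketch only: MY_TABLE, /tmp/transformed.csv and phoenix-client.jar are hypothetical.
object BulkLoadSketch {
  // Quote a field when it contains the delimiter, a quote, or a line break.
  // Assumption: embedded quotes are escaped by doubling them.
  def csvField(value: String): String =
    if (value.exists(c => c == ',' || c == '"' || c == '\n' || c == '\r'))
      "\"" + value.replace("\"", "\"\"") + "\""
    else value

  // Render one transformed row as a single CSV line.
  def csvLine(row: Seq[String]): String = row.map(csvField).mkString(",")

  // Build the command line that would launch the bulk load over the CSV output.
  def bulkLoadCommand(clientJar: String, table: String, input: String): Seq[String] =
    Seq("hadoop", "jar", clientJar,
      "org.apache.phoenix.mapreduce.CsvBulkLoadTool",
      "--table", table,
      "--input", input)

  def main(args: Array[String]): Unit = {
    val rows = Seq(Seq("1", "Acme, Inc."), Seq("2", "plain"))
    rows.map(csvLine).foreach(println)
    println(bulkLoadCommand("phoenix-client.jar", "MY_TABLE", "/tmp/transformed.csv").mkString(" "))
  }
}
```

In the real job the rows would come from the Spark transformation and be written to HDFS rather than printed; the command is only assembled here, not executed.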
You mentioned increasing the batch size to accommodate more rows.
Are you talking about the 'phoenix.mutate.batchSize' configuration parameter?

Vamsi Attluri

On Wed, Mar 16, 2016 at 9:01 AM Pariksheet Barapatre <pbarapa...@gmail.com> wrote:

> Hi Vamsi,
>
> How many rows are you expecting out of your transformation, and what
> is the frequency of the job?
>
> If there are few rows (< ~100K, and this depends on cluster size
> as well), you can go ahead with the phoenix-spark plug-in and increase the
> batch size to accommodate more rows; otherwise use CsvBulkLoadTool.
>
> Thanks
> Pari
>
> On 16 March 2016 at 20:03, Vamsi Krishna <vamsi.attl...@gmail.com> wrote:
>
>> Thanks Gabriel & Ravi.
>>
>> I have a data processing job written in Spark-Scala.
>> I do a join on data from 2 data files (CSV files) and do data
>> transformation on the resulting data. Finally, I load the transformed data
>> into a Phoenix table using the Phoenix-Spark plugin.
>> On seeing that the Phoenix-Spark plugin goes through the regular HBase write
>> path (writes to the WAL), I'm thinking of option 2 to reduce the job
>> execution time.
>>
>> *Option 2:* Do the data transformation in Spark, write the transformed
>> data to a CSV file, and use the Phoenix CsvBulkLoadTool to load the data
>> into the Phoenix table.
>>
>> Has anyone tried this kind of exercise? Any thoughts?
>>
>> Thanks,
>> Vamsi Attluri
>>
>> On Tue, Mar 15, 2016 at 9:40 PM Ravi Kiran <maghamraviki...@gmail.com>
>> wrote:
>>
>>> Hi Vamsi,
>>> The upserts through the Phoenix-Spark plugin definitely go through the WAL.
>>>
>>> On Tue, Mar 15, 2016 at 5:56 AM, Gabriel Reid <gabriel.r...@gmail.com>
>>> wrote:
>>>
>>>> Hi Vamsi,
>>>>
>>>> I can't answer your question about the Phoenix-Spark plugin (although
>>>> I'm sure that someone else here can).
>>>>
>>>> However, I can tell you that the CsvBulkLoadTool does not write to the
>>>> WAL or to the Memstore. It simply writes HFiles and then hands those
>>>> HFiles over to HBase, so the memstore and WAL are never
>>>> touched/affected by this.
>>>>
>>>> - Gabriel
>>>>
>>>> On Tue, Mar 15, 2016 at 1:41 PM, Vamsi Krishna <vamsi.attl...@gmail.com>
>>>> wrote:
>>>> > Team,
>>>> >
>>>> > Does the Phoenix CsvBulkLoadTool write to the HBase WAL/Memstore?
>>>> >
>>>> > Phoenix-Spark plugin:
>>>> > Does the saveToPhoenix method on RDD[Tuple] write to the HBase WAL/Memstore?
>>>> >
>>>> > Thanks,
>>>> > Vamsi Attluri
>>>> > --
>>>> > Vamsi Attluri
>>>>
>>> --
>> Vamsi Attluri
>>
>
> --
> Cheers,
> Pari
>
--
Vamsi Attluri