Hi Jean,
We prepare the data for all the other jobs. We have many jobs that are
scheduled at different times, but all of them need to read the same raw data.
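
For reference, here is a rough sketch of the two options from the original
question, so the trade-off is easier to see. It is only an illustration under
assumptions: the column names (user_id, value), the table name
(events_bucketed), the output path, and the bucket count of 8 are all
hypothetical, and it runs in local mode. As far as I know, bucketBy only works
together with saveAsTable, so the bucketed variant goes through the catalog
rather than a plain path.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, count}

object BucketedPreprocessSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("bucketed-preprocess-sketch")
      .master("local[*]") // assumption: local run, for illustration only
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the output of the pre-process job.
    val events = Seq(("u1", 10L), ("u2", 5L), ("u1", 7L)).toDF("user_id", "value")

    // Option 1: bucket by user. The bucketing metadata is stored in the
    // catalog, so later jobs that read the table can use it.
    events.write
      .mode("overwrite")
      .format("parquet")
      .bucketBy(8, "user_id")
      .sortBy("user_id")
      .saveAsTable("events_bucketed")

    // Option 2: repartition by user, then save plain Parquet files.
    // The files are grouped by user on disk, but a later read of the path
    // does not know about that partitioning.
    events.repartition(8, col("user_id"))
      .write
      .mode("overwrite")
      .parquet("/tmp/events_repartitioned") // hypothetical path

    // Compare the physical plans and look for an Exchange (shuffle) step
    // before the aggregation in each one.
    spark.table("events_bucketed")
      .groupBy("user_id").agg(count("*")).explain()
    spark.read.parquet("/tmp/events_repartitioned")
      .groupBy("user_id").agg(count("*")).explain()

    spark.stop()
  }
}
```

Since many jobs read the same data at different times, the point of the
comparison is whether each downstream job can skip the shuffle on user_id; the
explain() output should show that for a given Spark version and configuration.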

On Fri, Nov 3, 2017 at 12:49 PM Jean Georges Perrin <jper...@lumeris.com>
wrote:

> Hi Oren,
>
> Why don’t you want to use a GroupBy? You can cache or checkpoint the
> result and use it in your process, keeping everything in Spark and avoiding
> save/ingestion...
>
>
> > On Oct 31, 2017, at 08:17, אורן שמון <oren.sha...@gmail.com> wrote:
> >
> > I have two Spark jobs: a pre-process job and a process job.
> > The process job needs to run a calculation for each user in the data.
> > I want to avoid a shuffle such as groupBy, so I am thinking of either
> > saving the result of the pre-process job bucketed by user in Parquet,
> > or repartitioning by user and saving the result.
> >
> > Which is preferred, and why?
> > Thanks in advance,
> > Oren
>
>
