Probably network/shuffle cost? Or broadcast variables? Can you provide more details on what your job does, along with some timings?
> On 9. Apr 2018, at 07:07, Junfeng Chen <[email protected]> wrote:
>
> I have written a Spark Streaming application that reads data from Kafka, converts the JSON data to Parquet, and saves it to HDFS.
> What puzzles me is that the application's processing time in YARN mode is 20% to 50% longer than in local mode. My cluster has three nodes with three node managers, and all three hosts have the same hardware: 40 cores and 256 GB of memory.
>
> Why? How can I solve it?
>
> Regards,
> Junfeng Chen
