Hi

I am attaching a screenshot of the Spark web UI; please have a look at it.

1) For a single Map operator, why does it show multiple completed stages with the same information?
If you don't cache your result and it's needed several times in the computation, Spark recomputes the Map, and thus it appears several times.
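
For example, here is a minimal sketch (the input path and RDD names are made up, since I don't have your code) showing the same Map stage appearing once per action until you cache:

import org.apache.spark.{SparkConf, SparkContext}

object CacheExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("CacheExample").setMaster("local[*]"))

    // Hypothetical input; this is the Map in question.
    val mapped = sc.textFile("input.txt").map(_.length)

    // Without cache(), each action replays the full lineage, so the
    // same Map stage shows up once per action in the web UI.
    println(mapped.count())
    println(mapped.sum())

    // With cache(), the first action materializes the partitions and
    // later actions reuse them instead of recomputing the Map.
    mapped.cache()
    println(mapped.count()) // computes and caches
    println(mapped.sum())   // served from the cache

    sc.stop()
  }
}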
2) As you can see, the number of completed workers is greater than the maximum number of workers (2931/2339). Can you please tell me why it shows that?
This usually happens when one of your executors dies (usually from serious memory exhaustion, but there are many possible causes).
The only advice I can give is to watch your logs for ERROR and Exception entries.
3) How is a stage designed in Spark? As you can see in my code, after the first Map with groupByKey and filter, I run one more Map, then filter, then Count. But Spark combined these three into one stage and named it Count (you can see this in the attached screenshot). Can you please explain how it combines stages and what the logic or idea behind this is?

I'll let someone else answer you on that, but basically, you can trust Spark to optimize this correctly.
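
For what it's worth, here is a minimal sketch (with made-up data, since I don't have your code) of a pipeline like the one you describe. Spark only cuts a stage at a shuffle boundary (the groupByKey here); the narrow transformations after it (map, filter) are pipelined together into the final stage, and the UI labels that stage with the action that triggered it, which is why you see it as Count:

import org.apache.spark.{SparkConf, SparkContext}

object StagePipelineExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("StagePipelineExample").setMaster("local[*]"))

    // Stage 1: the map before the shuffle.
    val pairs = sc.parallelize(1 to 1000).map(i => (i % 10, i))

    val result = pairs
      .groupByKey()                        // shuffle => stage boundary here
      .filter { case (_, vs) => vs.size > 5 }
      .map { case (k, vs) => (k, vs.sum) } // narrow ops: pipelined into one stage
      .filter { case (_, s) => s > 100 }
      .count()                             // action; the UI shows this stage as "count"

    println(result)
    sc.stop()
  }
}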

Guillaume
--
eXenSa
Guillaume PITEL, Président
+33(0)6 25 48 86 80

eXenSa S.A.S.
41, rue Périer - 92120 Montrouge - FRANCE
Tel +33(0)1 84 16 36 77 / Fax +33(0)9 72 28 37 05
