Hi,
I am attaching a screenshot of the Spark web UI;
please have a look at it.
1) For a single map operator, why does the UI show multiple
completed stages with the same information?
If you don't cache your result and it is needed several times in the
computation, Spark recomputes the map, and so it appears several
times.
2) As you can see, the number of completed workers is greater
than the maximum number of workers (2931/2339). Can you please
tell me why it shows that?
This usually happens when one of your executors dies (often from
serious memory exhaustion, though there can be many other causes).
The only advice I can give is to watch your logs for ERROR and
Exception messages.
3) How is a stage formed in Spark? As you can see in my
code, after the first map with groupByKey and filter, I
run one more map, then filter, then count. But Spark
combined these three steps into a single stage named Count
(you can see it in the attached screenshot). Can you please
explain how it combines stages and what the logic or idea
behind this is?
I'll let someone else give a more detailed answer, but basically:
transformations that don't require a shuffle (like map and filter)
are pipelined together into a single stage, and that stage is shown
under the name of the action that triggered it (here, count). You can
trust Spark to optimize this correctly.
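Again as a plain-Python sketch rather than actual Spark internals: narrow transformations can be fused into one pass over the data, so map, filter, and count together behave like a single pipelined stage. The function name below is hypothetical.

```python
# Plain-Python analogy for stage pipelining (no Spark required).
# map -> filter -> count is executed as one streaming pass over
# the data, much like one pipelined Spark stage named after the
# final action (count).

def pipelined_count(data):
    mapped = (x * 2 for x in data)           # map (lazy generator)
    filtered = (x for x in mapped if x > 4)  # filter (lazy generator)
    return sum(1 for _ in filtered)          # count (the action that runs it)

print(pipelined_count(range(5)))  # 2  (only 6 and 8 pass the filter)
```

Each element flows through all three steps before the next element is touched, which is why the UI shows one stage rather than three.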
Guillaume
--
Guillaume PITEL, Président
+33(0)6 25 48 86 80
eXenSa S.A.S.
41, rue Périer - 92120 Montrouge - FRANCE
Tel +33(0)1 84 16 36 77 / Fax +33(0)9 72 28 37 05