Hi, I'm running PySpark on my local PC in standalone mode.
After applying a window function to a DataFrame, I ran a groupBy query on it. The groupBy turned out to be very slow (10+ minutes on a small data set). I then cached the DataFrame and re-ran the same query, but it remained just as slow. I could also hear noise from the hard drive, so I assume the PC is busy reading from and writing to disk. Is this a sign that the DataFrame has spilled to disk? What's the best way to monitor what's happening, and how can I avoid it? Thanks!