Hi,

I do not think such wall-clock comparisons make much sense in distributed
computation. Simply comparing an RDD operation and a DataFrame operation
by their start and stop times may not provide any valid information.

You will have to look into the details of the timing and the individual
steps. For example, look at the Spark UI to see how timings are broken down
per job, stage, and task in distributed mode; there are also several
well-written papers on this.
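That said, if you only want a rough wall-clock number from the driver, a
minimal sketch is below. Note that spark.time() exists only in the Scala
spark-shell, not in PySpark, and that Spark is lazy, so you must time a
full action (count, collect, ...), not just a transformation. The Spark
calls shown in comments are assumptions about your session and variables:

```python
import time

def timed(label, fn):
    """Run fn(), print the elapsed wall-clock seconds, and return its result."""
    start = time.perf_counter()
    result = fn()  # must invoke a full action, since transformations are lazy
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f}s")
    return result

# With a SparkSession in scope (hypothetical variables rdd and df):
#   timed("rdd count", lambda: rdd.count())
#   timed("df count",  lambda: df.count())

# Plain-Python demonstration of the helper:
total = timed("sum", lambda: sum(range(1_000_000)))
```

Keep in mind this measures driver-side elapsed time only, including
scheduling, caching, and JVM warm-up effects, which is exactly why a single
number can mislead; the Spark UI breakdown is the more trustworthy source.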


Thanks and Regards,
Gourav Sengupta





On Thu, Dec 23, 2021 at 10:57 AM <bit...@bitfox.top> wrote:

> hello community,
>
> In pyspark how can I measure the running time of a command?
> I just want to compare the running time of the RDD API and the DataFrame
> API, in this blog of mine:
>
> https://bitfoxtop.wordpress.com/2021/12/23/count-email-addresses-using-sparks-rdd-and-dataframe/
>
> I tried spark.time() but it doesn't work.
> Thank you.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>
