Re: measure running time

2021-12-24 Thread bitfox
Cc: user, Luca Canali. Subject: Re: measure running time. Hi Sean, I have already discussed an issue in my case with Spark 3.1.1 and sparkmeasure with the author Luca Canali on this matter. It has been reproduced. I think we ought to wa

Re: measure running time

2021-12-24 Thread Hollis
Cc: user, Luca Canali. Subject: Re: measure running time. Hi Sean, I have already discussed an issue in my case with Spark 3.1.1 and sparkmeasure with the author Luca Canali on this matter. It has been reproduced. I think we ought to wait for a patch. HTH, Mich

Re: measure running time

2021-12-24 Thread Mich Talebzadeh
istic approach that may lead you to miss important details, in particular when running distributed computations. WebUI, REST API, and metrics instrumentation in Spark can be quite

Re: measure running time

2021-12-24 Thread Sean Owen
hat may lead you to miss important details, in particular when running distributed computations. WebUI, REST API, and metrics instrumentation in Spark can be quite useful for further drill down. See

Re: measure running time

2021-12-24 Thread Gourav Sengupta
Hi, There are too many blogs out there with absolutely no value. Before writing another blog, which does not make much sense by doing run time comparisons between RDD and dataframes (as stated earlier), it may be useful to first understand what you are trying to achieve by writing this blog. The

Re: measure running time

2021-12-24 Thread bitfox
As you see below:
$ pip install sparkmeasure
Collecting sparkmeasure
Using cached https://files.pythonhosted.org/packages/9f/bf/c9810ff2d88513ffc185e65a3ab9df6121ad5b4c78aa8d134a06177f9021/sparkmeasure-0.14.0-py2.py3-none-any.whl
Installing collected packages: sparkmeasure
Successfully instal

Re: measure running time

2021-12-24 Thread bitfox
est/monitoring.html You can also have a look at this tool that takes care of automating collecting and aggregating some executor task metrics: https://github.com/LucaCanali/sparkMeasure Best, Luca From: Gourav Sengupta Sent: Thursday, December 23, 2021 14:23 To: bit...@bitfox.top Cc: user

Re:Re: measure running time

2021-12-24 Thread Hollis
ting collecting and aggregating some executor task metrics: https://github.com/LucaCanali/sparkMeasure Best, Luca From: Gourav Sengupta Sent: Thursday, December 23, 2021 1

Re: measure running time

2021-12-23 Thread bitfox
u can also have a look at this tool that takes care of automating collecting and aggregating some executor task metrics: https://github.com/LucaCanali/sparkMeasure Best, Luca From: Gourav Sengupta Sent: Thursday, December 23, 2021 14:23 To: bit...@bitfox.top Cc: user Subject: Re: measure ru

Re: measure running time

2021-12-23 Thread bitfox
://github.com/LucaCanali/sparkMeasure Best, Luca From: Gourav Sengupta Sent: Thursday, December 23, 2021 14:23 To: bit...@bitfox.top Cc: user Subject: Re: measure running time Hi, I do not think that such time comparisons make any sense at all in distributed computation. Just saying that an operation

Re: measure running time

2021-12-23 Thread Mich Talebzadeh
bin/pyspark --packages ch.cern.sparkmeasure:spark-measure_2.12:0.17
Best, Luca From: Mich Talebzadeh Sent: Thursday, December 23, 2021 19:59 To: Luca Canali Cc: user Subject: Re: measure running time
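For readers following the sparkMeasure suggestion, here is a minimal sketch of how stage metrics might be collected around a single action from a pyspark session started with the --packages line above. The helper name measure_stage_metrics and the action argument are made up for illustration; StageMetrics, begin(), end() and print_report() are the calls documented in the sparkMeasure README. It assumes the Python wrapper (pip install sparkmeasure) is installed alongside the jar, and, given the Spark 3.1.1 issue mentioned elsewhere in this thread, it may not work with every version combination.

```python
def measure_stage_metrics(spark, action):
    """Run `action` (a zero-argument callable) and print sparkMeasure stage metrics.

    `spark` is an existing SparkSession. This helper is illustrative only and
    is not part of sparkMeasure itself.
    """
    # Imported lazily so the module can be loaded without sparkmeasure installed.
    from sparkmeasure import StageMetrics

    stage_metrics = StageMetrics(spark)
    stage_metrics.begin()
    try:
        result = action()
    finally:
        # Stop collecting even if the action raises.
        stage_metrics.end()
    stage_metrics.print_report()
    return result
```

In the pyspark shell this could be called as measure_stage_metrics(spark, lambda: spark.range(1000).count()). The callable has to trigger a Spark action, otherwise there are no stages to report on.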

RE: measure running time

2021-12-23 Thread Luca Canali
a look at this tool that takes care of automating collecting and aggregating some executor task metrics: https://github.com/LucaCanali/sparkMeasure Best, Luca From: Gourav Sengupta <gourav.sengu...@gmail.com> Sent: Thursday, December 23, 2021 14:23 To: bit...@bitfox.top Cc: u

Re: measure running time

2021-12-23 Thread Mich Talebzadeh
d aggregating some executor task metrics: https://github.com/LucaCanali/sparkMeasure Best, Luca From: Gourav Sengupta Sent: Thursday, December 23, 2021 14:23 To: bit...@bitfox.top Cc: user Subject: Re: measure

RE: measure running time

2021-12-23 Thread Luca Canali
To: bit...@bitfox.top Cc: user Subject: Re: measure running time Hi, I do not think that such time comparisons make any sense at all in distributed computation. Just saying that an operation in RDD and Dataframe can be compared based on their start and stop time may not provide any valid

Re: measure running time

2021-12-23 Thread Gourav Sengupta
Hi, I do not think that such time comparisons make any sense at all in distributed computation. Just saying that an operation in RDD and Dataframe can be compared based on their start and stop time may not provide any valid information. You will have to look into the details of timing and the ste

Re: measure running time

2021-12-23 Thread Mich Talebzadeh
Try this simple thing first:

    import time

    def main():
        start_time = time.time()
        print("\nStarted at"); uf.println(lst)
        # your code
        print("\nFinished at"); uf.println(lst)
        end_time = time.time()
        time_elapsed = (end_time - start_time)
        print(f"""Elapsed time in seconds is {time_ela
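The same start/stop idea can be wrapped in a small reusable helper. The sketch below is not the code from the message above: the name elapsed_timer is made up, it uses time.perf_counter() instead of time.time() because it is a monotonic clock intended for measuring intervals, and the timed body is a plain Python stand-in rather than Spark code.

```python
import time
from contextlib import contextmanager


@contextmanager
def elapsed_timer(label="block"):
    """Measure the wall-clock time spent inside a `with` block."""
    timing = {"label": label, "seconds": None}
    start = time.perf_counter()
    try:
        yield timing
    finally:
        # Recorded even if the body raises, so partial runs are still timed.
        timing["seconds"] = time.perf_counter() - start
        print(f"{label}: {timing['seconds']:.3f} s")


# Stand-in workload; in pyspark the body would be the code under test.
with elapsed_timer("sum of squares") as timing:
    total = sum(i * i for i in range(100_000))
```

After the block finishes, timing["seconds"] holds the elapsed wall-clock time. This measures only what the driver sees from start to finish, which is the limitation raised in other replies in this thread.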

measure running time

2021-12-23 Thread bitfox
Hello community, In pyspark, how can I measure the running time of a command? I just want to compare the running time of the RDD API and the dataframe API, in this blog post of mine: https://bitfoxtop.wordpress.com/2021/12/23/count-email-addresses-using-sparks-rdd-and-dataframe/ I tried spark.time() it doe
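One possible way to set up the comparison asked about here is to time each variant several times and compare medians rather than single runs. This is a minimal sketch under stated assumptions: the names median_runtime and compare_runtimes are hypothetical, and the two workloads are plain Python stand-ins for the RDD and DataFrame versions so that the example runs without a cluster.

```python
import statistics
import time


def median_runtime(action, repeats=5, warmup=1):
    """Return the median wall-clock time (seconds) of calling `action`."""
    # Warm-up calls are run but not timed (caches, JIT, lazy initialisation).
    for _ in range(warmup):
        action()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        action()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare_runtimes(actions, repeats=5, warmup=1):
    """Time each named zero-argument callable and return {name: median seconds}."""
    return {
        name: median_runtime(action, repeats, warmup)
        for name, action in actions.items()
    }


# Stand-in workloads; with Spark these would be the RDD and DataFrame variants.
results = compare_runtimes({
    "generator loop": lambda: sum(i for i in range(50_000)),
    "builtin sum": lambda: sum(range(50_000)),
})
for name, seconds in results.items():
    print(f"{name}: {seconds:.6f} s")
```

With Spark, each callable has to end in an action such as count() or collect(), because transformations are lazy and would otherwise return almost immediately. Even then, wall-clock medians say nothing about where the time goes inside a distributed job.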