Re: Spark 3 is Slower than Spark 2 for TPCDS Q04 query.

2021-12-19 Thread Senthil Kumar
@abhishek. We use spark 3.1*
On Mon, 20 Dec 2021, 09:50 Rao, Abhishek (Nokia - IN/Bangalore), <abhishek@nokia.com> wrote:
> Hi Senthil,
>
> Which version of Spark 3 are we using? We had this kind of observation with Spark 3.0.2 and 3.1.x, but then we figured out that we had configured

RE: Spark 3 is Slower than Spark 2 for TPCDS Q04 query.

2021-12-19 Thread Rao, Abhishek (Nokia - IN/Bangalore)
Hi Senthil, Which version of Spark 3 are we using? We had this kind of observation with Spark 3.0.2 and 3.1.x, but then we figured out that we had configured a large value for spark.network.timeout, and that this value did not take effect in any release prior to 3.0.2. This was fixed as part of https:
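One way to check for the kind of configuration mismatch described above is to diff the effective settings of the two runs before comparing timings. The following is a minimal sketch, not anything posted in the thread: the helper name `diff_confs` and the sample values are hypothetical, and it assumes each run's settings are available as (key, value) pairs.

```python
def diff_confs(conf_a, conf_b):
    """Return {key: (value_a, value_b)} for every key whose value differs.

    A key that is missing on one side is reported with None for that side.
    """
    a, b = dict(conf_a), dict(conf_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


# Hypothetical settings for illustration only; real values would come from the two clusters.
spark2_conf = [("spark.network.timeout", "120s"), ("spark.sql.shuffle.partitions", "200")]
spark3_conf = [("spark.network.timeout", "800s"), ("spark.sql.shuffle.partitions", "200")]

differences = diff_confs(spark2_conf, spark3_conf)
for key, (value_a, value_b) in differences.items():
    print(f"{key}: {value_a} -> {value_b}")
```

In PySpark, the pairs could be collected on each cluster with `spark.sparkContext.getConf().getAll()`, which returns the explicitly set configuration as a list of key-value tuples.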

[R] SparkR on conda-forge

2021-12-19 Thread Maciej
Hi everyone, FYI ‒ thanks to the good folks from conda-forge, we now have these:
* https://github.com/conda-forge/r-sparkr-feedstock
* https://anaconda.org/conda-forge/r-sparkr
--
Best regards, Maciej Szymkiewicz
Web: https://zero323.net
PGP: A30CEF0C31A501EC

Spark 3 is Slower than Spark 2 for TPCDS Q04 query.

2021-12-19 Thread Senthil Kumar
Hi All, We are comparing Spark 2.4.5 and Spark 3 (without enabling Spark 3's additional features) with TPCDS queries and found that Spark 3's performance is reduced by at least 30-40% compared to Spark 2.4.5. E.g., with a data size of 1 TB, Spark 2.4.5 finishes Q4 in 1.5 min, but Spark 3.* takes at-lea