I believe you can run "SELECT version()" in Spark SQL to see the build version.
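If it helps: version() returns two fields, the release version and the git revision the distribution was built from, and the revision is what actually distinguishes an RC build from the final release. A minimal sketch of parsing that output (the sha value below is a hypothetical stand-in, not a real Spark commit):

```python
import re

def parse_spark_version(version_output: str):
    """Split the output of Spark SQL's version() function into
    (release_version, git_revision).

    version() returns a string such as "3.1.1 <40-hex-char sha>";
    comparing the sha against the commit behind each release tag
    identifies the exact build.
    """
    m = re.fullmatch(r"(\S+)\s+([0-9a-f]{40})", version_output.strip())
    if m is None:
        raise ValueError(f"unexpected version() output: {version_output!r}")
    return m.group(1), m.group(2)

# Hypothetical revision value, for illustration only
release, revision = parse_spark_version("3.1.1 " + "ab" * 20)
print(release)  # → 3.1.1
```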
On Sun, Mar 21, 2021 at 4:41 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Thanks for the detailed info.
>
> I was hoping that one could find a simpler answer to the Spark version than
> doing a forensic examination of the base code, so to speak.
>
> The primer for this verification is that on GCP Dataproc, originally built
> on 3.1.1-rc2, there was an issue with running Spark Structured Streaming
> (SSS) which I reported to this forum before.
>
> After a while, and after I reported it to Google, they have now upgraded the
> base to Spark 3.1.1 itself. I am not privy to how they did the upgrade.
>
> In the meantime we installed 3.1.1 on-premise and ran it with the same
> Python code for SSS. It worked fine.
>
> However, when I run the same code on GCP Dataproc upgraded to 3.1.1,
> occasionally I see this error:
>
> 21/03/18 16:53:38 ERROR org.apache.spark.scheduler.AsyncEventQueue:
> Listener EventLoggingListener threw an exception
> java.util.ConcurrentModificationException
>         at java.util.Hashtable$Enumerator.next(Hashtable.java:1387)
>
> This may be for other reasons, or a consequence of upgrading from
> 3.1.1-rc2 to 3.1.1?
>
> view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
> On Sat, 20 Mar 2021 at 22:41, Attila Zsolt Piros <piros.attila.zs...@gmail.com> wrote:
>
>> Hi!
>>
>> I would check out the Spark source and then diff those two RCs (first just
>> take a look at the list of changed files):
>>
>> $ git diff v3.1.1-rc1..v3.1.1-rc2 --stat
>> ...
>>
>> The shell scripts in the release can be checked very easily:
>>
>> $ git diff v3.1.1-rc1..v3.1.1-rc2 --stat | grep ".sh "
>>  bin/docker-image-tool.sh            | 6 +-
>>  dev/create-release/release-build.sh | 2 +-
>>
>> We are lucky, as *docker-image-tool.sh* is part of the released version.
>> Is it from v3.1.1-rc2 or v3.1.1-rc1?
>>
>> Of course this only works if docker-image-tool.sh was not changed back
>> from v3.1.1-rc2 to v3.1.1-rc1.
>> So let's continue with the Python (and later the R) files:
>>
>> $ git diff v3.1.1-rc1..v3.1.1-rc2 --stat | grep ".py "
>>  python/pyspark/sql/avro/functions.py               |   4 +-
>>  python/pyspark/sql/dataframe.py                    |   1 +
>>  python/pyspark/sql/functions.py                    | 285 +++++------
>>  .../pyspark/sql/tests/test_pandas_cogrouped_map.py |  12 +
>>  python/pyspark/sql/tests/test_pandas_map.py        |   8 +
>>  ...
>>
>> You can stop once you have enough proof (what counts as enough is for
>> you to decide).
>> Finally, you can use javap / scalap on the classes from the jars and
>> check some code changes, which is harder to analyze than a simple
>> text file.
>>
>> Best Regards,
>> Attila
>>
>> On Thu, Mar 18, 2021 at 4:09 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>>> Hi
>>>
>>> What would be a signature in the Spark version or binaries that confirms
>>> the release is built on Spark 3.1.1 as opposed to 3.1.1-RC-1 or RC-2?
>>>
>>> Thanks
>>>
>>> Mich
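The docker-image-tool.sh check described above can be automated: hash the file shipped in the installed distribution and compare it against each tag's copy (obtained with e.g. `git show v3.1.1-rc1:bin/docker-image-tool.sh`). A minimal sketch; the file contents below are hypothetical stand-ins for the real script text:

```python
import hashlib

def matching_releases(installed_text: str, candidates: dict) -> list:
    """Return the names of candidate tags whose copy of a file is
    byte-for-byte identical (by sha256) to the installed one."""
    installed = hashlib.sha256(installed_text.encode()).hexdigest()
    return [tag for tag, text in candidates.items()
            if hashlib.sha256(text.encode()).hexdigest() == installed]

# Hypothetical stand-ins: in practice installed_copy would be read from
# $SPARK_HOME/bin/ and the candidates from `git show <tag>:<path>`
rc1_copy = "#!/usr/bin/env bash\n# rc1 variant\n"
rc2_copy = "#!/usr/bin/env bash\n# rc2 variant\n"
installed_copy = rc2_copy

print(matching_releases(installed_copy,
                        {"v3.1.1-rc1": rc1_copy, "v3.1.1-rc2": rc2_copy}))
# → ['v3.1.1-rc2']
```

An empty result would mean the shipped file matches neither tag, i.e. the earlier caveat applies and another file from the diff stat should be checked instead.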