Thanks for the detailed info.

I was hoping for a simpler way to identify the Spark version than a
forensic examination of the code base, so to speak.
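
One possibly simpler check, offered as a sketch rather than a confirmed method: Spark builds normally embed a spark-version-info.properties file in the spark-core jar, recording the version, git revision, branch and build date. As far as I know, release candidates carry the same version string as the final release, so the revision is the field to compare against the commit behind each tag (for example the output of `git rev-parse v3.1.1^{commit}`). The helper name read_spark_build_info and the jar path in the usage note are hypothetical, and whether a vendor rebuild such as Dataproc's keeps the upstream revision is an assumption.

```python
import sys
import zipfile

# Name of the build-info file assumed to sit at the root of the spark-core jar.
BUILD_INFO_ENTRY = "spark-version-info.properties"


def read_spark_build_info(jar_path):
    """Return the key=value pairs found in the jar's build-info file as a dict."""
    info = {}
    with zipfile.ZipFile(jar_path) as jar:
        with jar.open(BUILD_INFO_ENTRY) as fh:
            for raw in fh:
                line = raw.decode("utf-8").strip()
                # Skip blank lines, comments and anything that is not key=value.
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                info[key.strip()] = value.strip()
    return info


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print every recorded field; "revision" is the one to compare with the tags.
    for key, value in sorted(read_spark_build_info(sys.argv[1]).items()):
        print(f"{key} = {value}")
```

It could be run as `python check_build.py $SPARK_HOME/jars/spark-core_2.12-3.1.1.jar` (script name and jar file name are placeholders). The same fields should also appear in the banner printed by `spark-submit --version`.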

The background to this check is that GCP Dataproc clusters were originally
built on 3.1.1-rc2, and there was an issue with running Spark Structured
Streaming (SSS), which I reported to this forum before.

After a while, and after I reported it to Google, they have now upgraded the
base to Spark 3.1.1 itself. I am not privy to how they did the upgrade.

In the meantime we installed 3.1.1 on-premise and ran it with the same
Python code for SSS. It worked fine.

However, when I run the same code on GCP Dataproc upgraded to 3.1.1,
I occasionally see this error:

21/03/18 16:53:38 ERROR org.apache.spark.scheduler.AsyncEventQueue:
Listener EventLoggingListener threw an exception

java.util.ConcurrentModificationException

        at java.util.Hashtable$Enumerator.next(Hashtable.java:1387)

Could this be due to other reasons, or is it a consequence of upgrading from
3.1.1-rc2 to 3.1.1?



   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Sat, 20 Mar 2021 at 22:41, Attila Zsolt Piros <
piros.attila.zs...@gmail.com> wrote:

> Hi!
>
> I would check out the Spark source, then diff those two RCs (first just
> take a look at the list of changed files):
>
> $ git diff v3.1.1-rc1..v3.1.1-rc2 --stat
> ...
>
> The shell scripts in the release can be checked very easily:
>
> $ git diff v3.1.1-rc1..v3.1.1-rc2 --stat | grep ".sh "
>  bin/docker-image-tool.sh                           |   6 +-
>  dev/create-release/release-build.sh                |   2 +-
>
> We are lucky as *docker-image-tool.sh* is part of the released version.
> Is it from v3.1.1-rc2 or v3.1.1-rc1?
>
> Of course this only works if docker-image-tool.sh is not changed from
> the v3.1.1-rc2 back to v3.1.1-rc1.
> So let's continue with the Python (and later with R) files:
>
> $ git diff v3.1.1-rc1..v3.1.1-rc2 --stat | grep ".py "
>  python/pyspark/sql/avro/functions.py               |   4 +-
>  python/pyspark/sql/dataframe.py                    |   1 +
>  python/pyspark/sql/functions.py                    | 285 +++++------
>  .../pyspark/sql/tests/test_pandas_cogrouped_map.py |  12 +
>  python/pyspark/sql/tests/test_pandas_map.py        |   8 +
> ...
>
> After you have enough proof you can stop (how much is enough is for you
> to decide).
> Finally you can use javap / scalap on the classes from the jars and check
> some code changes, which are harder to analyze than a simple text
> file.
>
> Best Regards,
> Attila
>
>
> On Thu, Mar 18, 2021 at 4:09 PM Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> Hi
>>
>> What signature in the Spark version or binaries would confirm that the
>> release is built on Spark 3.1.1 as opposed to 3.1.1-RC-1 or RC-2?
>>
>> Thanks
>>
>> Mich
>>
>
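
The file-by-file comparison described in the quoted reply can be partly automated without a full checkout of the installed tree. The sketch below computes the git blob ID of a local file, which can then be compared with what git reports for the same path at a given tag, for example `git rev-parse v3.1.1-rc2:bin/docker-image-tool.sh` run inside a Spark clone. The function names are hypothetical, and the approach assumes the distribution ships the file byte-for-byte as it is in the repository.

```python
import hashlib


def git_blob_sha1(data: bytes) -> str:
    """Return the SHA-1 that git assigns to a blob with this exact content."""
    # Git hashes the header "blob <size>\0" followed by the raw file bytes.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def file_blob_sha1(path: str) -> str:
    """Return the git blob ID of the file at `path`, read in binary mode."""
    with open(path, "rb") as fh:
        return git_blob_sha1(fh.read())
```

If `file_blob_sha1` on the installed bin/docker-image-tool.sh matches the ID for one tag and not the other, that file at least comes from the matching tag. As noted in the reply, a single file is only conclusive if it differs between the tags being compared.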
