Re: Issue while calling foreach in Pyspark

2021-05-08 Thread Sean Owen
It looks like the executor (JVM) stops immediately. Hard to say why - do you have Java installed, and a compatible version? I agree it could be a py4j version problem, per that SO link.
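
A quick way to sanity-check both suspects from the driver machine (a minimal sketch, assuming a standard pip-installed PySpark; the versions printed are environment-specific):

    import subprocess

    import py4j
    import pyspark

    # A py4j wheel that does not match the one bundled with the Spark
    # distribution is a common cause of the gateway JVM dying at startup.
    print("pyspark:", pyspark.__version__)
    print("py4j:", py4j.__version__)

    # Confirm a JVM is actually reachable; `java -version` writes to stderr.
    result = subprocess.run(["java", "-version"],
                            stderr=subprocess.PIPE, universal_newlines=True)
    print(result.stderr)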

Re: Issue while calling foreach in Pyspark

2021-05-08 Thread rajat kumar
Hi Sean/Mich, Thanks for the response. That was the full log; sending it again for reference. I am just running foreach(lambda), which runs pure Python code.

Exception in read_logs : Py4JJavaError Traceback (most recent call last): File "/opt/spark/python/lib/python3.6/site-packages/filename.py",

Re: Issue while calling foreach in Pyspark

2021-05-07 Thread Mich Talebzadeh
By YARN mode I meant dealing with issues raised cluster-wide. From personal experience, I find it easier to trace these sorts of errors when I run the code in local mode, as they could be related to the set-up, and it is easier to track where things go wrong. This

Re: Issue while calling foreach in Pyspark

2021-05-07 Thread Sean Owen
I don't see any reason to think this is related to YARN. You haven't shown the actual error, @rajat, so I'm not sure there is anything to say.

Re: Issue while calling foreach in Pyspark

2021-05-07 Thread Mich Talebzadeh
I have a suspicion that this may be caused by your cluster, as it appears that you are running this in YARN mode, like below:

spark-submit --master yarn --deploy-mode client xyx.py

What happens if you try running it in local mode?

spark-submit --master local[2] xyx.py

Is this run in a managed
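
For a quick check without touching the submit command, the same local-mode experiment can be made from inside the script (a sketch; Mich's actual suggestion is the --master flag on spark-submit):

    from pyspark.sql import SparkSession

    # Run the driver and executors in one local JVM with 2 threads, which
    # takes YARN and the cluster set-up out of the picture entirely.
    spark = (SparkSession.builder
             .master("local[2]")
             .appName("foreach-debug")
             .getOrCreate())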

Re: Issue while calling foreach in Pyspark

2021-05-07 Thread rajat kumar
Thanks Mich and Sean for the response. Yes, Sean is right: this is a batch job. I have only 10 records in the DataFrame, yet it still gives this exception. Following are the full logs.

File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 584, in foreach

Re: Issue while calling foreach in Pyspark

2021-05-07 Thread Sean Owen
foreach definitely works :) This is not a streaming question. The error says that the JVM worker died for some reason. You'd have to look at its logs to see why.

Re: Issue while calling foreach in Pyspark

2021-05-07 Thread Mich Talebzadeh
Hi, I am not convinced foreach works even in 3.1.1. Try doing the same with foreachBatch:

foreachBatch(sendToSink). \
trigger(processingTime='2 seconds'). \

and see if it works. HTH
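
For reference, a self-contained sketch of the foreachBatch pattern that fragment comes from (Structured Streaming only; the rate source and the body of sendToSink are illustrative, not from the thread):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("foreachBatchDemo").getOrCreate()

    # An illustrative streaming source; Mich's fragment assumes one exists.
    streaming_df = spark.readStream.format("rate").load()

    def sendToSink(batch_df, batch_id):
        # Each micro-batch arrives as an ordinary DataFrame, so any batch
        # write can be used here.
        batch_df.write.format("parquet").mode("append").save("/tmp/rate_out")

    query = (streaming_df.writeStream
             .foreachBatch(sendToSink)
             .trigger(processingTime="2 seconds")
             .start())
    query.awaitTermination()

As Sean notes in the reply above, this only applies to streaming queries; a plain batch DataFrame has no writeStream.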

Issue while calling foreach in Pyspark

2021-05-07 Thread rajat kumar
Hi Team, I am using Spark 2.4.4 with Python. While using the line below:

dataframe.foreach(lambda record : process_logs(record))

My use case is: process_logs will download the file from cloud storage using Python code and then save the processed data. I am getting the following error
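
For context, a minimal runnable version of the pattern described above (process_logs here is a stand-in; the thread does not show its real body, which downloads from cloud storage):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("foreachDemo").getOrCreate()
    dataframe = spark.createDataFrame([("log1.gz",), ("log2.gz",)], ["path"])

    def process_logs(record):
        # Runs on the executors, once per Row; everything it references must
        # be picklable. The real version downloads the record's file from
        # cloud storage and saves the processed output.
        print(record.path)

    # foreach ships the lambda (and process_logs) to the executor Python
    # workers; a Py4JJavaError here usually means the worker's JVM died.
    dataframe.foreach(lambda record: process_logs(record))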