Re: org.apache.spark.shuffle.FetchFailedException in dataproc

2023-03-14 Thread Gary Liu
Wondering how to do this: repartition the input dataset to have fewer partitions? I used df.rdd.getNumPartitions() to check the input data partitions; they have 9 and 17 partitions respectively. Should I decrease them further? I also re…
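For readers landing on this thread: the number of *input* partitions (9 and 17 here) matters less than the number of *shuffle* partitions, and the usual rule of thumb is to keep each shuffle partition in the low hundreds of megabytes. A plain-Python sketch of that sizing heuristic (the byte counts are hypothetical; the real figures come from the shuffle read/write columns in the Spark UI):

```python
# Rough partition-count heuristic: aim for ~128 MB per shuffle partition.
# The sizes below are hypothetical; read the real ones off the Spark UI.
def target_partitions(total_bytes, target_bytes=128 * 1024 * 1024):
    """Return a partition count keeping each partition near target_bytes."""
    return max(1, -(-total_bytes // target_bytes))  # ceiling division

# A 50 GB shuffle wants on the order of 400 partitions, far more than
# the 9 or 17 partitions of the raw input in this thread.
print(target_partitions(50 * 1024**3))  # 400
```

This is why replies in threads like this one usually suggest *increasing* the shuffle partition count rather than decreasing the input partition count.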

Re: org.apache.spark.shuffle.FetchFailedException in dataproc

2023-03-13 Thread Mich Talebzadeh
…metadatafetchfailedexception-when-processing-big-data-se), saying increasing partitions may help. Which one makes more sense? I repartitioned the input data to 20 and 30 partitions, but still no luck. Any suggestions? 23/…

Re: org.apache.spark.shuffle.FetchFailedException in dataproc

2023-03-13 Thread Gary Liu
…still no luck. Any suggestions? 23/03/10 14:32:19 WARN TaskSetManager: Lost task 58.1 in stage 27.0 (TID 3783) (10.1.0.116 executor 33): FetchFailed(BlockManagerId(72, 10.1.15.199, 36791, None), shuffleId=24, mapIndex=77, mapId=3457, reduceId=…

Re: org.apache.spark.shuffle.FetchFailedException in dataproc

2023-03-10 Thread Mich Talebzadeh
…Lost task 58.1 in stage 27.0 (TID 3783) (10.1.0.116 executor 33): FetchFailed(BlockManagerId(72, 10.1.15.199, 36791, None), shuffleId=24, mapIndex=77, mapId=3457, reduceId=58, message= org.apache.spark.shuffle.FetchFailedException at org.apache.spark.errors.SparkCoreErrors$.fetchFa…

org.apache.spark.shuffle.FetchFailedException in dataproc

2023-03-10 Thread Gary Liu
…task 58.1 in stage 27.0 (TID 3783) (10.1.0.116 executor 33): FetchFailed(BlockManagerId(72, 10.1.15.199, 36791, None), shuffleId=24, mapIndex=77, mapId=3457, reduceId=58, message= org.apache.spark.shuffle.FetchFailedException at org.apache.spark.errors.SparkCoreErrors$.fetchF…

Re: org.apache.spark.shuffle.FetchFailedException: Too large frame:

2018-05-03 Thread Ryan Blue
…group by a. rb On Tue, May 1, 2018 at 4:21 AM, Pralabh Kumar wrote: Hi, I am getting the above error in Spark SQL. I have increased the number of partitions (using 5000) but…

Re: org.apache.spark.shuffle.FetchFailedException: Too large frame:

2018-05-02 Thread Pralabh Kumar
…Spark SQL. I have increased the number of partitions (using 5000) but am still getting the same error. My data is most probably skewed. org.apache.spark.shuffle.FetchFailedException: Too large frame: 4247124829 at org.ap…

Re: org.apache.spark.shuffle.FetchFailedException: Too large frame:

2018-05-01 Thread Ryan Blue
…number of partitions but am still getting the same error. My data is most probably skewed. org.apache.spark.shuffle.FetchFailedException: Too large frame: 4247124829 at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(S…

org.apache.spark.shuffle.FetchFailedException: Too large frame:

2018-05-01 Thread Pralabh Kumar
Hi, I am getting the above error in Spark SQL. I have increased the number of partitions (using 5000) but am still getting the same error. My data is most probably skewed. org.apache.spark.shuffle.FetchFailedException: Too large frame: 4247124829 at…
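Context for the error above: shuffle blocks are sent as a single frame, and a frame larger than Int.MaxValue (about 2.1 GB; here 4247124829 bytes) fails the fetch. A single heavily skewed key can produce such a block no matter how many partitions are configured, which is why adding partitions alone did not help. A common workaround is key salting. A plain-Python sketch of the idea (not Spark code; the bucket count is a hypothetical tuning knob):

```python
import random
from collections import Counter

# Key salting sketch: append a random salt to a hot key so its rows spread
# over several shuffle partitions instead of one oversized block.
SALT_BUCKETS = 8  # hypothetical; choose based on the observed skew

def salted_key(key):
    return (key, random.randrange(SALT_BUCKETS))

rows = ["hot_key"] * 10_000 + ["rare_key"] * 10
per_bucket = Counter(salted_key(k) for k in rows)

# The hot key's 10,000 rows now land across up to SALT_BUCKETS buckets;
# a second aggregation pass would combine the partials and drop the salt.
hot_counts = [n for (k, _salt), n in per_bucket.items() if k == "hot_key"]
print(sorted(hot_counts))
```

In Spark SQL the same effect is achieved by adding a salt column to the skewed side of the join or group-by, aggregating on (key, salt), then aggregating again on key.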

Re: Job keeps aborting because of org.apache.spark.shuffle.FetchFailedException: Failed to connect to server/ip:39232

2017-07-29 Thread 周康
…icated to more than one server? Thanks. From: jeff saremi Sent: Friday, July 28, 2017 4:38:08 PM To: Juan Rodríguez Hortalá Cc: user@spark.apache.org Subject: Re: Job keeps aborting because of…

Re: Job keeps aborting because of org.apache.spark.shuffle.FetchFailedException: Failed to connect to server/ip:39232

2017-07-28 Thread jeff saremi
…of org.apache.spark.shuffle.FetchFailedException: Failed to connect to server/ip:39232. Thanks, Juan, for taking the time. Here's more info: - This is running on YARN in master mode - See config params below - This is a corporate environment; in general nodes should not be added or removed…

Re: Job keeps aborting because of org.apache.spark.shuffle.FetchFailedException: Failed to connect to server/ip:39232

2017-07-28 Thread jeff saremi
…spark.network.timeout=1000s From: Juan Rodríguez Hortalá Sent: Friday, July 28, 2017 4:20:40 PM To: jeff saremi Cc: user@spark.apache.org Subject: Re: Job keeps aborting because of org.apache.spark.shuffle.FetchFailedException: Failed to connect to server/ip:39232 Hi Jeff…

Re: Job keeps aborting because of org.apache.spark.shuffle.FetchFailedException: Failed to connect to server/ip:39232

2017-07-28 Thread Juan Rodríguez Hortalá
Hi Jeff, Can you provide more information about how you are running your job? In particular: which cluster manager are you using (YARN, Mesos, Spark Standalone)? Which configuration options are you using to submit the job? In particular, are you using dynamic allocation or the external shuf…
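The settings Juan is asking about correspond to spark-submit flags. A hypothetical sketch of a YARN submission with dynamic allocation and the external shuffle service enabled (the values and the application name are illustrative, not recommendations; dynamic allocation requires the shuffle service to be running on each node):

```shell
# Hypothetical spark-submit illustrating the options asked about above.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.network.timeout=600s \
  your_job.py   # placeholder application
```

With the external shuffle service, shuffle files are served by a per-node daemon rather than by the executor that wrote them, so fetches can survive executor loss, which is directly relevant to the "Failed to connect" failures in this thread.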

Job keeps aborting because of org.apache.spark.shuffle.FetchFailedException: Failed to connect to server/ip:39232

2017-07-28 Thread jeff saremi
We have a not-too-complex and not-too-large Spark job that keeps dying with this error. I have researched it and have not seen any convincing explanation why. I am not using a shuffle service. Which server is the one refusing the connection? If I go to the server that is being report…

Re: PySpark Issue: "org.apache.spark.shuffle.FetchFailedException: Failed to connect to..."

2016-05-04 Thread HLee
I had the same problem. One forum post elsewhere suggested that too much network communication might be using up the available ports. I reduced the number of partitions via repartition(int), and that solved the problem.

Re: PySpark Issue: "org.apache.spark.shuffle.FetchFailedException: Failed to connect to..."

2016-03-20 Thread craigiggy
Also, this is the command I use to submit the Spark application: ** where recommendation_engine-0.1-py2.7.egg is a Python egg of my own library, written for this application, and 'file' and '/home/spark/enigma_analytics/tests/msg-epims0730_small.json' are input arguments for the applica…

Re: PySpark Issue: "org.apache.spark.shuffle.FetchFailedException: Failed to connect to..."

2016-03-19 Thread craigiggy
A slight update, I suppose: for some reason it sometimes connects and continues, and the job completes. But most of the time I still run into this error; the job is killed and the application doesn't finish. I still have no idea why this is happening and could really use some help here.

PySpark Issue: "org.apache.spark.shuffle.FetchFailedException: Failed to connect to..."

2016-03-15 Thread craigiggy
…retry attempts, but I still get the same error. The line that stands out to me is: org.apache.spark.shuffle.FetchFailedException: Failed to connect to spark-mastr-1:xx The following is the error I receive on my most recent attempted run of the application: Traceback (most recent call…

org.apache.spark.shuffle.FetchFailedException: Failed to connect to ..... on worker failure

2015-10-27 Thread kundan kumar
…org.apache.spark.shuffle.FetchFailedException: Failed to connect to … Now, when I restart the same worker (2 workers were running on the machine and I killed just one of them), the execution resumes and the process completes. Please help me understand why, on a worker failure, my…

Re: org.apache.spark.shuffle.FetchFailedException

2015-08-24 Thread kundan kumar
I am running this query on a data size of 4 billion rows and getting an org.apache.spark.shuffle.FetchFailedException error. select adid, position, userid, price from ( select adid, position, userid, price, dense_rank() OVER (PARTITION BY adlocationid…

Re: org.apache.spark.shuffle.FetchFailedException

2015-08-24 Thread Raghavendra Pandey
Did you try increasing the SQL partitions? On Tue, Aug 25, 2015 at 11:06 AM, kundan kumar wrote: I am running this query on a data size of 4 billion rows and getting an org.apache.spark.shuffle.FetchFailedException error. select adid, position, userid, price from ( select ad…
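The "SQL partitions" referred to here is the spark.sql.shuffle.partitions setting: it defaults to 200 and governs how many partitions wide operations (joins, group-bys, and window functions such as the dense_rank in the query above) produce. A minimal config sketch, assuming an existing SparkSession bound to the name spark (the value 2000 is an example, not a recommendation):

```python
# Config sketch: raise the shuffle-partition count for Spark SQL operations.
# Assumes an existing SparkSession named `spark`; 2000 is an example value.
spark.conf.set("spark.sql.shuffle.partitions", "2000")
```

More partitions mean smaller shuffle blocks per reducer, which is why this is a first resort for FetchFailedException on large aggregations.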

Re: org.apache.spark.shuffle.FetchFailedException :: Migration from Spark 1.2 to 1.3

2015-05-19 Thread Imran Rashid
…cloud4-server, 7337), shuffleId=0, mapId=9, reduceId=1, message= org.apache.spark.shuffle.FetchFailedException: java.lang.RuntimeException: Failed to open file: /tmp/spark-fff63849-a318-4e48-bdea-2f563076ad5d/spark-40ba3a41-0f4d-446e-b806-e788e210d394/spark-a3d61f7a-22e9-4b3b-934…

Re: org.apache.spark.shuffle.FetchFailedException :: Migration from Spark 1.2 to 1.3

2015-05-19 Thread Akhil Das
…Spark 1.2 to Spark 1.3 15/05/18 18:22:39 WARN TaskSetManager: Lost task 0.0 in stage 1.6 (TID 84, cloud8-server): FetchFailed(BlockManagerId(1, cloud4-server, 7337), shuffleId=0, mapId=9, reduceId=1, message= org.apache.spark.shuffle.FetchFailedException: java.lang.RuntimeExc…

org.apache.spark.shuffle.FetchFailedException :: Migration from Spark 1.2 to 1.3

2015-05-18 Thread zia_kayani
…message= org.apache.spark.shuffle.FetchFailedException: java.lang.RuntimeException: Failed to open file: /tmp/spark-fff63849-a318-4e48-bdea-2f563076ad5d/spark-40ba3a41-0f4d-446e-b806-e788e210d394/spark-a3d61f7a-22e9-4b3b-9346-ff3b70d0e43d/blockmgr-0e3b2b5d-f677-4e91-b98b-ed913adbd15f/39/shuffle_0_9_0.index