Below is the output:
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1967947
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 2024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1967947
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

I have set the max open files to 2024 with "ulimit -n 2024", but I get the same issue, and I am not sure whether that is a reasonable setting.

Actually I am running a loop; each iteration sorts only 3GB of data. It runs very quickly in the first iteration and slows down in the second. In each iteration I start and destroy the SparkContext, because I want to clean up the temp files created under the tmp folder, which take a lot of space. Everything else is the default setting. My logic:

    For loop:
        val sc = new SparkContext
        sql = sc.loadParquet
        sortByKey
        sc.stop
    End

And I run on an EC2 c3.8xlarge, Amazon Linux AMI 2014.09.2 (HVM).

From: java8964 [mailto:java8...@hotmail.com]
Sent: Friday, March 20, 2015 3:54 PM
To: user@spark.apache.org
Subject: RE: com.esotericsoftware.kryo.KryoException: java.io.IOException: File too large vs FileNotFoundException (Too many open files) on spark 1.2.1

Have you checked the ulimit for the user running Spark on your nodes? Can you run "ulimit -a" as the user who is running Spark on the executor node? Does the result make sense for the data you are trying to process?

Yong

_____
From: szheng.c...@gmail.com
To: user@spark.apache.org
Subject: com.esotericsoftware.kryo.KryoException: java.io.IOException: File too large vs FileNotFoundException (Too many open files) on spark 1.2.1
Date: Fri, 20 Mar 2015 15:28:26 -0400

Hi All,

I am trying to run a simple sort-by job on Spark 1.2.1.
It always gives me one of the two errors below.

1. First:

15/03/20 17:48:29 WARN TaskSetManager: Lost task 2.0 in stage 1.0 (TID 35, ip-10-169-217-47.ec2.internal): java.io.FileNotFoundException: /tmp/spark-e40bb112-3a08-4f62-9eaa-cd094fcfa624/spark-58f72d53-8afc-41c2-ad6b-e96b479b51f5/spark-fde6da79-0b51-4087-8234-2c07ac6d7586/spark-dd7d6682-19dd-4c66-8aa5-d8a4abe88ca2/16/temp_shuffle_756b59df-ef3a-4680-b3ac-437b53267826 (Too many open files)

2. Then I switch to:

    conf.set("spark.shuffle.consolidateFiles", "true")
        .set("spark.shuffle.manager", "SORT")

and I get this error instead:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 1.0 failed 4 times, most recent failure: Lost task 5.3 in stage 1.0 (TID 36, ip-10-169-217-47.ec2.internal): com.esotericsoftware.kryo.KryoException: java.io.IOException: File too large
    at com.esotericsoftware.kryo.io.Output.flush(Output.java:157)

I roughly know that the first issue happens because the Spark shuffle creates too many local temp files (and I don't know the solution, because my workaround seems to cause the other issue), but I am not sure what the second error means. Does anyone know the solution for either case?

Regards,

Shuai
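For anyone hitting the same "Too many open files" error, here is a minimal sketch of the check Yong suggests, run as the user that owns the executor process. The PID 12345 is a placeholder, not a value from this thread:

```shell
# Soft and hard limits on open file descriptors for the current
# shell (a Spark executor launched from this shell inherits them):
ulimit -Sn
ulimit -Hn

# Count the descriptors a running process actually holds open;
# replace 12345 with the executor JVM's PID (e.g. from jps):
ls /proc/12345/fd 2>/dev/null | wc -l
```

If the descriptor count approaches the soft limit during the shuffle, raising `ulimit -n` (or lowering the number of shuffle files) is the direction to look in.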