OutOfMemoryError

2015-10-04 Thread Ramkumar V
Hi, When i submit java spark job in cluster mode, i'm getting following exception. *LOG TRACE :* INFO yarn.ExecutorRunnable: Setting up executor with commands: List({{JAVA_HOME}}/bin/java, -server, -XX:OnOutOfMemoryError='kill %p', -Xms1024m, -Xmx1024m, -Djava.io.tmpdir={{PWD}}/tmp, '-Dspark.ui

OutOfMemoryError

2021-06-30 Thread javaguy Java
Hi, I'm getting Java OOM errors even though I'm setting my driver memory to 24g and I'm executing against local[*] I was wondering if anyone can give me any insight. The server this job is running on has more than enough memory as does the spark driver. The final result does write 3 csv files t

OutOfMemoryError

2017-05-02 Thread TwUxTLi51Nus
t receive any answers until now. Thanks! [0] https://gist.github.com/TwUxTLi51Nus/4accdb291494be9201abfad72541ce74 [1] http://stackoverflow.com/questions/43637913/apache-spark-outofmemoryerror-heapspace PS: As a workaround, I have been writing and reading temporary parquet files on

OutOfMemoryError

2017-06-23 Thread Tw UxTLi51Nus
y answers until now. Thanks! [0] https://gist.github.com/TwUxTLi51Nus/4accdb291494be9201abfad72541ce74 [1] http://stackoverflow.com/questions/43637913/apache-spark-outofmemoryerror-heapspace PS: As a workaround, I have been using "checkpoint" after every few iterations. --

PCA OutOfMemoryError

2016-01-12 Thread Bharath Ravi Kumar
We're running PCA (selecting 100 principal components) on a dataset that has ~29K columns and is 70G in size stored in ~600 parts on HDFS. The matrix in question is mostly sparse with tens of columns populate in most rows, but a few rows with thousands of columns populated. We're running spark on m

Re: OutOfMemoryError

2015-10-05 Thread Jean-Baptiste Onofré
Hi Ramkumar, did you try to increase Xmx of the workers ? Regards JB On 10/05/2015 08:56 AM, Ramkumar V wrote: Hi, When i submit java spark job in cluster mode, i'm getting following exception. *LOG TRACE :* INFO yarn.ExecutorRunnable: Setting up executor with commands: List({{JAVA_HOME}}/b

Re: OutOfMemoryError

2015-10-05 Thread Ramkumar V
No. I didn't try to increase xmx. *Thanks*, On Mon, Oct 5, 2015 at 1:36 PM, Jean-Baptiste Onofré wrote: > Hi Ramkumar, > > did you try to increase Xmx of the workers ? > > Regards > JB > > On 10/05/2015 08:56 AM, Ramkumar V wrote: > >> Hi, >> >> When i

Re: OutOfMemoryError

2015-10-09 Thread Ramkumar V
How to increase the Xmx of the workers ? *Thanks*, On Mon, Oct 5, 2015 at 3:48 PM, Ramkumar V wrote: > No. I didn't try to increase xmx. > > *Thanks*, > > > > On Mon, Oct 5, 2015 at 1:36 PM, Jean-Baptiste Onofr

Re: OutOfMemoryError

2015-10-09 Thread Ted Yu
You can add it in in conf/spark-defaults.conf # spark.executor.extraJavaOptions -XX:+PrintGCDetails FYI On Fri, Oct 9, 2015 at 3:07 AM, Ramkumar V wrote: > How to increase the Xmx of the workers ? > > *Thanks*, > > > > On Mon, Oct 5, 2015 at 3:48 PM,
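
A minimal sketch of what Ted describes, expressed in application code rather than conf/spark-defaults.conf (the 4g value and the GC flag are illustrative, not from the thread):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("raise-executor-heap")
      .set("spark.executor.memory", "4g")                            // controls each executor's -Xmx
      .set("spark.executor.extraJavaOptions", "-XX:+PrintGCDetails") // extra executor JVM flags, e.g. GC logging
    val sc = new SparkContext(conf)

The same two properties can equally go into conf/spark-defaults.conf or be passed as --conf flags to spark-submit.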

Re: OutOfMemoryError

2021-07-01 Thread Sean Owen
You need to set driver memory before the driver starts, on the CLI or however you run your app, not in the app itself. By the time the driver starts to run your app, its heap is already set. On Thu, Jul 1, 2021 at 12:10 AM javaguy Java wrote: > Hi, > > I'm getting Java OOM errors even though I'm
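
A minimal sketch of Sean's point (the class name is illustrative; assembly.jar comes from the thread): setting spark.driver.memory inside the application has no effect in local/client mode, because the driver JVM's heap is already fixed by the time that code runs.

    import org.apache.spark.sql.SparkSession

    object MyApp {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("oom-demo")
          // .config("spark.driver.memory", "24g")  // too late here -- the driver JVM is already running
          .getOrCreate()
        // Pass the heap size before the JVM starts instead:
        //   spark-submit --master local[*] --driver-memory 24g --class MyApp assembly.jar
        spark.stop()
      }
    }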

Re: OutOfMemoryError

2021-07-06 Thread javaguy Java
Hi Sean, thx for the tip. I'm just running my app via spark-submit on CLI ie >spark-submit --class X --master local[*] assembly.jar so I'll now add to CLI args ie: spark-submit --class X --master local[*] --driver-memory 8g assembly.jar etc. Unless I have this wrong? Thx On Thu, Jul 1, 2021 at

Re: OutOfMemoryError

2021-07-06 Thread Mich Talebzadeh
Personally rather than Parameters here: val spark = SparkSession .builder .master("local[*]") .appName("OOM") .config("spark.driver.host", "localhost") .config("spark.driver.maxResultSize", "0") .config("spark.sql.caseSensitive", "false") .config("spark.sql.adaptive.enabled", "true

SparkSql OutOfMemoryError

2014-10-28 Thread Zhanfeng Huo
Hi,friends: I use spark(spark 1.1) sql operate data in hive-0.12, and the job fails when data is large. So how to tune it ? spark-defaults.conf: spark.shuffle.consolidateFiles true spark.shuffle.manager SORT spark.akka.threads 4 spark.sql.inMemoryColumnarStorage.compressed true

broadcast: OutOfMemoryError

2014-12-11 Thread ll
ache-spark-user-list.1001560.n3.nabble.com/broadcast-OutOfMemoryError-tp20633.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additiona

Re: PCA OutOfMemoryError

2016-01-12 Thread Bharath Ravi Kumar
Any suggestion/opinion? On 12-Jan-2016 2:06 pm, "Bharath Ravi Kumar" wrote: > We're running PCA (selecting 100 principal components) on a dataset that > has ~29K columns and is 70G in size stored in ~600 parts on HDFS. The > matrix in question is mostly sparse with tens of columns populate in mos

Re: PCA OutOfMemoryError

2016-01-13 Thread Alex Gittens
The PCA.fit function calls the RowMatrix PCA routine, which attempts to construct the covariance matrix locally on the driver, and then computes the SVD of that to get the PCs. I'm not sure what's causing the memory error: RowMatrix.scala:124 is only using 3.5 GB of memory (n*(n+1)/2 with n=29604 a
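
Alex's 3.5 GB figure can be reproduced with a quick back-of-the-envelope calculation (upper-triangular covariance matrix of doubles):

    val n = 29604L
    val entries = n * (n + 1) / 2   // ≈ 4.4e8 upper-triangular entries
    val bytes   = entries * 8       // 8 bytes per double ≈ 3.5 GB held on the driver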

Re: PCA OutOfMemoryError

2016-01-17 Thread Bharath Ravi Kumar
Hello Alex, Thanks for the response. There isn't much other data on the driver, so the issue is probably inherent to this particular PCA implementation. I'll try the alternative approach that you suggested instead. Thanks again. -Bharath On Wed, Jan 13, 2016 at 11:24 PM, Alex Gittens wrote: >

OutOfMemoryError OOM ByteArrayOutputStream.hugeCapacity

2015-10-12 Thread Alexander Pivovarov
I have one job which fails if I enable KryoSerializer I use spark 1.5.0 on emr-4.1.0 Settings: spark.serializer org.apache.spark.serializer.KryoSerializer spark.kryoserializer.buffer.max 1024m spark.executor.memory47924M spark.yarn.executor.memoryOverhead 5324 The j

Re: SparkSql OutOfMemoryError

2014-10-28 Thread Yanbo Liang
Try to increase the driver memory. 2014-10-28 17:33 GMT+08:00 Zhanfeng Huo : > Hi,friends: > > I use spark(spark 1.1) sql operate data in hive-0.12, and the job fails > when data is large. So how to tune it ? > > spark-defaults.conf: > > spark.shuffle.consolidateFiles true > spark.shuffle

Re: broadcast: OutOfMemoryError

2014-12-11 Thread Sameer Farooqui
ray. what is the best way to handle this? > > should i split the array into smaller arrays before broadcasting, and then > combining them locally at each node? > > thanks! > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.
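
A sketch of the chunked-broadcast idea raised in the question (not necessarily Sameer's recommendation, which is cut off here): split the array, broadcast each piece, and let executors reassemble it on demand.

    import org.apache.spark.SparkContext
    import org.apache.spark.broadcast.Broadcast

    def broadcastInChunks(sc: SparkContext, big: Array[Double], numChunks: Int): Seq[Broadcast[Array[Double]]] = {
      val chunkSize = math.max(1, (big.length + numChunks - 1) / numChunks)
      big.grouped(chunkSize).toSeq.map(sc.broadcast(_))   // one smaller broadcast per chunk
    }
    // on an executor: val whole = chunks.flatMap(_.value)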

OutOfMemoryError - When saving Word2Vec

2016-06-12 Thread sharad82
org.apache.spark.ml.util.MLWritable$class.save(ReadWrite.scala:131) at org.apache.spark.ml.feature.Word2VecModel.save(Word2Vec.scala:172) -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/OutOfMemoryError-When-saving-Word2Vec-tp27142.html Sent from the Apache Spark User List mailing

OutOfMemoryError while running job...

2016-12-06 Thread Kevin Burton
I am trying to run a Spark job which reads from ElasticSearch and should write it's output back to a separate ElasticSearch index. Unfortunately I keep getting `java.lang.OutOfMemoryError: Java heap space` exceptions. I've tried running it with: --conf spark.memory.offHeap.enabled=true --conf spark

[Structured Streaming] HDFSBackedStateStoreProvider OutOfMemoryError

2018-03-30 Thread ahmed alobaidi
Hi All, I'm working on simple structured streaming query that uses flatMapGroupsWithState to maintain relatively a large size state. After running the application for few minutes on my local machine, it starts to slow down and then crashes with OutOfMemoryError. Tracking the code led

OutofMemoryError: Java heap space

2015-02-09 Thread Yifan LI
Hi, I just found the following errors during computation(graphx), anyone has ideas on this? thanks so much! (I think the memory is sufficient, spark.executor.memory 30GB ) 15/02/09 00:37:12 ERROR Executor: Exception in task 162.0 in stage 719.0 (TID 7653) java.lang.OutOfMemoryError: Java hea

Spark program throws OutOfMemoryError

2014-04-15 Thread Qin Wei
park.akka.logLifecycleEvents=true -Djava.library.path= -Xms512m -Xmx512m org.apache.spark.deploy.worker.Worker spark://192.168.2.184:7077 Is there anybody who can help me? Thanks very much!! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-program-thows-OutOfMemoryError-tp4268.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

OutOfMemoryError during reduce tasks

2015-03-19 Thread Balazs Meszaros
ut data. There is always an OutOfMemoryError at the end of the reduce tasks [2] when I'm using a 1g input while 100m of data don't make a problem. Spark is v1.2.1 (but with v1.3 I'm having the same problem) and it runs on a VM with Ubuntu 14.04, 8G RAM and 4VCPU. (If something el

Spark - Timeout Issues - OutOfMemoryError

2015-04-28 Thread ๏̯͡๏
I have a SparkApp that runs completes in 45 mins for 5 files (5*750MB size) and it takes 16 executors to do so. I wanted to run it against 10 files of each input type (10*3 files as there are three inputs that are transformed). [Input1 = 10*750 MB, Input2=10*2.5GB, Input3 = 10*1.5G], Hence i used

OutofMemoryError when generating output

2014-08-26 Thread SK
olExecutor.java:615) at java.lang.Thread.run(Thread.java:744) -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/OutofMemoryError-when-generating-output-tp12847.html Sent from the Apach

Re: Re: SparkSql OutOfMemoryError

2014-10-28 Thread Zhanfeng Huo
It works, thanks very much Zhanfeng Huo From: Yanbo Liang Date: 2014-10-28 18:50 To: Zhanfeng Huo CC: user Subject: Re: SparkSql OutOfMemoryError Try to increase the driver memory. 2014-10-28 17:33 GMT+08:00 Zhanfeng Huo : Hi,friends: I use spark(spark 1.1) sql operate data in hive-0.12

[Beginner Debug]: Executor OutOfMemoryError

2024-02-22 Thread Shawn Ligocki
Hi I'm new to Spark and I'm running into a lot of OOM issues while trying to scale up my first Spark application. I am running into these issues with only 1% of the final expected data size. Can anyone help me understand how to properly configure Spark to use limited memory or how to debug which pa

Re: OutOfMemoryError - When saving Word2Vec

2016-06-12 Thread vaquar khan
Hi Sharad. The array size you (or the serializer) tries to allocate is just too big for the JVM. You can also split your input further by increasing parallelism. Following is good explanintion https://plumbr.eu/outofmemoryerror/requested-array-size-exceeds-vm-limit regards, Vaquar khan On

Re: OutOfMemoryError - When saving Word2Vec

2016-06-13 Thread sharad82
Is this the right forum to post Spark related issues ? I have tried this forum along with StackOverflow but not seeing any response. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/OutOfMemoryError-When-saving-Word2Vec-tp27142p27151.html Sent from the

Re: OutOfMemoryError - When saving Word2Vec

2016-06-13 Thread Yuhao Yang
> View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/OutOfMemoryError-When-saving-Word2Vec-tp27142p27151.html > Sent from the Apache Spark User List mailing list archive at Nabble.com. > > --

Re: Spark - Timeout Issues - OutOfMemoryError

2015-05-03 Thread Dean Wampler
How big is the data you're returning to the driver with collectAsMap? You are probably running out of memory trying to copy too much data back to it. If you're trying to force a map-side join, Spark can do that for you in some cases within the regular DataFrame/RDD context. See http://spark.apache
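
One way to get the map-side join Dean refers to, using the DataFrame API as it looks in later Spark versions (a sketch; the column names are made up, and the tiny toDF inputs only stand in for real data):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast

    val spark = SparkSession.builder.appName("broadcast-join-sketch").getOrCreate()
    import spark.implicits._

    val largeDF = Seq((1L, "view"), (2L, "click")).toDF("itemId", "event")    // stands in for the big fact side
    val smallDF = Seq((1L, "book"), (2L, "toy")).toDF("itemId", "category")   // small enough to ship to every executor
    val joined  = largeDF.join(broadcast(smallDF), Seq("itemId"))             // joins without shuffling largeDF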

Re: Spark - Timeout Issues - OutOfMemoryError

2015-05-03 Thread ๏̯͡๏
Hello Dean & Others, Thanks for your suggestions. I have two data sets and all i want to do is a simple equi join. I have 10G limit and as my dataset_1 exceeded that it was throwing OOM error. Hence i switched back to use .join() API instead of map-side broadcast join. I am repartitioning the data

Re: Spark - Timeout Issues - OutOfMemoryError

2015-05-03 Thread Dean Wampler
IMHO, you are trying waaay to hard to optimize work on what is really a small data set. 25G, even 250G, is not that much data, especially if you've spent a month trying to get something to work that should be simple. All these errors are from optimization attempts. Kryo is great, but if it's not w

Re: Spark - Timeout Issues - OutOfMemoryError

2015-05-03 Thread ๏̯͡๏
Hello Deam, If I don;t use Kryo serializer i got Serialization error and hence am using it. If I don';t use partitionBy/reparition then the simply join never completed even after 7 hours and infact as next step i need to run it against 250G as that is my full dataset size. Someone here suggested to

Re: Spark - Timeout Issues - OutOfMemoryError

2015-05-03 Thread Dean Wampler
I don't know the full context of what you're doing, but serialization errors usually mean you're attempting to serialize something that can't be serialized, like the SparkContext. Kryo won't help there. The arguments to spark-submit you posted previously look good: 2) --num-executors 96 --driver

Re: Spark - Timeout Issues - OutOfMemoryError

2015-05-04 Thread ๏̯͡๏
Hello Dean & Others, Thanks for the response. I tried with 100,200, 400, 600 and 1200 repartitions with 100,200,400 and 800 executors. Each time all the tasks of join complete in less than a minute except one and that one tasks runs forever. I have a huge cluster at my disposal. The data for each

Re: Spark - Timeout Issues - OutOfMemoryError

2015-05-04 Thread Saisai Shao
IMHO If your data or your algorithm is prone to data skew, I think you have to fix this from application level, Spark itself cannot overcome this problem (if one key has large amount of values), you may change your algorithm to choose another shuffle key, somethings like this to avoid shuffle on sk

Re: Spark - Timeout Issues - OutOfMemoryError

2015-05-04 Thread ๏̯͡๏
Hello Shao, Can you talk more about shuffle key or point me to APIs that allow me to change shuffle key. I will try with different keys and see the performance. What is the shuffle key by default ? On Mon, May 4, 2015 at 2:37 PM, Saisai Shao wrote: > IMHO If your data or your algorithm is prone

Re: Spark - Timeout Issues - OutOfMemoryError

2015-05-04 Thread Saisai Shao
Shuffle key is depending on your implementation, I'm not sure if you are familiar with MapReduce, the mapper output is a key-value pair, where the key is the shuffle key for shuffling, Spark is also the same. 2015-05-04 17:31 GMT+08:00 ÐΞ€ρ@Ҝ (๏̯͡๏) : > Hello Shao, > Can you talk more about shuff

Re: Spark - Timeout Issues - OutOfMemoryError

2015-05-04 Thread ๏̯͡๏
One dataset (RDD Pair) val lstgItem = listings.map { lstg => (lstg.getItemId().toLong, lstg) } Second Dataset (RDDPair) val viEvents = viEventsRaw.map { vi => (vi.get(14).asInstanceOf[Long], vi) } As i want to join based on item Id that is used as first element in the tuple in both cases and i
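
A standard way to attack this kind of single-hot-key skew (not from the thread itself; Saisai suggests changing the shuffle key in a later reply) is to salt the key, join on (key, salt), and strip the salt again afterwards:

    import org.apache.spark.rdd.RDD
    import scala.reflect.ClassTag
    import scala.util.Random

    // Randomly salt the big, skewed side; replicate the other side across every salt bucket.
    def saltedJoin[L: ClassTag, R: ClassTag](skewed: RDD[(Long, L)], other: RDD[(Long, R)], salts: Int): RDD[(Long, (L, R))] = {
      val saltedLeft  = skewed.map    { case (k, v) => ((k, Random.nextInt(salts)), v) }
      val saltedRight = other.flatMap { case (k, v) => (0 until salts).map(s => ((k, s), v)) }
      saltedLeft.join(saltedRight).map { case ((k, _), pair) => (k, pair) }
    }
    // e.g. saltedJoin(viEvents, lstgItem, 20) instead of lstgItem.join(viEvents); the cost is
    // that the smaller side is shuffled `salts` times.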

Re: Spark - Timeout Issues - OutOfMemoryError

2015-05-04 Thread ๏̯͡๏
Four tasks are now failing with IndexIDAttemptStatus ▾Locality LevelExecutor ID / HostLaunch TimeDurationGC TimeShuffle Read Size / RecordsShuffle Spill (Memory)Shuffle Spill (Disk) Errors 0 3771 0 FAILED PROCESS_LOCAL 114 / host1 2015/05/04 01:27:44 / ExecutorLostFailure (executor 114 lost)

Re: Spark - Timeout Issues - OutOfMemoryError

2015-05-04 Thread Saisai Shao
>From the symptoms you mentioned that one task's shuffle write is much larger than all the other task, it is quite similar to normal data skew behavior, I just give some advice based on your descriptions, I think you need to detect whether data is actually skewed or not. The shuffle will put data

Re: Spark - Timeout Issues - OutOfMemoryError

2015-05-04 Thread ๏̯͡๏
I ran it against one file instead of 10 files and i see one task is still running after 33 mins its shuffle read size is 780MB/50 mil records. I did a count of records for each itemId from dataset-2 [One FILE] (Second Dataset (RDDPair) val viEvents = viEventsRaw.map { vi => (vi.get(14 ).asInstance

Re: Spark - Timeout Issues - OutOfMemoryError

2015-05-04 Thread ๏̯͡๏
I tried this val viEventsWithListings: RDD[(Long, (DetailInputRecord, VISummary, Long))] = lstgItem.join(viEvents, new org.apache.spark.RangePartitioner(partitions = 1200, rdd = viEvents)).map { It fired two jobs and still i have 1 task that never completes. IndexIDAttemptStatusLocality LevelExe

Re: Spark - Timeout Issues - OutOfMemoryError

2015-05-04 Thread ๏̯͡๏
Data Set 1 : viEvents : Is the event activity data of 1 day. I took 10 files out of it and 10 records *Item ID Count* 201335783004 3419 191568402102 1793 111654479898 1362 181503913062 1310 261798565828 1028 111654493548 994 231516683056 862 131497785968

Re: OutofMemoryError: Java heap space

2015-02-10 Thread Akhil Das
You could try increasing the driver memory. Also, can you be more specific about the data volume? Thanks Best Regards On Mon, Feb 9, 2015 at 3:30 PM, Yifan LI wrote: > Hi, > > I just found the following errors during computation(graphx), anyone has > ideas on this? thanks so much! > > (I think

Re: OutofMemoryError: Java heap space

2015-02-10 Thread Yifan LI
Hi Akhil, Excuse me, I am trying a random-walk algorithm over a not that large graph(~1GB raw dataset, including ~5million vertices and ~60million edges) on a cluster which has 20 machines. And, the property of each vertex in graph is a hash map, of which size will increase dramatically during

Re: OutofMemoryError: Java heap space

2015-02-10 Thread Akhil Das
Did you have a chance to look at this doc http://spark.apache.org/docs/1.2.0/tuning.html Thanks Best Regards On Tue, Feb 10, 2015 at 4:13 PM, Yifan LI wrote: > Hi Akhil, > > Excuse me, I am trying a random-walk algorithm over a not that large > graph(~1GB raw dataset, including ~5million vertic

Re: OutofMemoryError: Java heap space

2015-02-10 Thread Yifan LI
Yes, I have read it, and am trying to find some way to do that… Thanks :) Best, Yifan LI > On 10 Feb 2015, at 12:06, Akhil Das wrote: > > Did you have a chance to look at this doc > http://spark.apache.org/docs/1.2.0/tuning.html > > > Than

Re: OutofMemoryError: Java heap space

2015-02-10 Thread Kelvin Chu
Since the stacktrace shows kryo is being used, maybe, you could also try increasing spark.kryoserializer.buffer.max.mb. Hope this help. Kelvin On Tue, Feb 10, 2015 at 1:26 AM, Akhil Das wrote: > You could try increasing the driver memory. Also, can you be more specific > about the data volume?

Re: OutofMemoryError: Java heap space

2015-02-12 Thread Yifan LI
Thanks, Kelvin :) The error seems to disappear after I decreased both spark.storage.memoryFraction and spark.shuffle.memoryFraction to 0.2 And, some increase on driver memory. Best, Yifan LI > On 10 Feb 2015, at 18:58, Kelvin Chu <2dot7kel...@gmail.com> wrote: > > Since the stacktrace
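
The settings Yifan describes, gathered in one place (Spark 1.x property names; the Kryo buffer value is illustrative, and driver memory itself still has to be raised at submit time, e.g. spark-submit --driver-memory 8g):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.storage.memoryFraction", "0.2")         // lowered from the 0.6 default
      .set("spark.shuffle.memoryFraction", "0.2")
      .set("spark.kryoserializer.buffer.max.mb", "256")   // Kelvin's suggestion; size it to fit the largest serialized object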

OutOfMemoryError when loading input file

2014-03-01 Thread Yonathan Perez
Hello, I'm trying to run a simple test program that loads a large file (~12.4GB) into memory of a single many-core machine. The machine I'm using has more than enough memory (1TB RAM) and 64 cores (of which I want to use 16 for worker threads). Even though I set both the executor memory (spark.exe

Re: Spark program throws OutOfMemoryError

2014-04-16 Thread Andre Bois-Crettez
Seem you have not enough memory on the spark driver. Hints below : On 2014-04-15 12:10, Qin Wei wrote: val resourcesRDD = jsonRDD.map(arg => arg.get("rid").toString.toLong).distinct // the program crashes at this line of code val bcResources = sc.broadcast(resourcesRDD.collect.to

Re: Spark program throws OutOfMemoryError

2014-04-17 Thread yypvsxf19870706
cartesian(resourceScoresRDD).filter(arg => > arg._1._1 > arg._2._1).map(arg => (arg._1._1, arg._2._1, 0.8)) > > simRDD.saveAsTextFile("/home/deployer/sim") > } > > I ran the program through "java -jar myjar.jar", it crashed quickly,

Re: Spark - Timeout Issues - OutOfMemoryError

2015-04-30 Thread Akhil Das
You could try increasing your heap space explicitly. like export _JAVA_OPTIONS="-Xmx10g", its not the correct approach but try. Thanks Best Regards On Tue, Apr 28, 2015 at 10:35 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote: > I have a SparkApp that runs completes in 45 mins for 5 files (5*750MB > size) and it takes

Re: Spark - Timeout Issues - OutOfMemoryError

2015-04-30 Thread ๏̯͡๏
Did not work. Same problem. On Thu, Apr 30, 2015 at 1:28 PM, Akhil Das wrote: > You could try increasing your heap space explicitly. like export > _JAVA_OPTIONS="-Xmx10g", its not the correct approach but try. > > Thanks > Best Regards > > On Tue, Apr 28, 2015 at 10:35 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) > wro

Re: Spark - Timeout Issues - OutOfMemoryError

2015-04-30 Thread ๏̯͡๏
My Spark Job is failing and i see == 15/04/30 09:59:49 ERROR yarn.ApplicationMaster: User class threw exception: Job aborted due to stage failure: Exception while getting task result: org.apache.spark.SparkException: Error sending message [message = GetLocations(taskr

Re: Spark - Timeout Issues - OutOfMemoryError

2015-04-30 Thread ๏̯͡๏
Full Exception *15/04/30 09:59:49 INFO scheduler.DAGScheduler: Stage 1 (collectAsMap at VISummaryDataProvider.scala:37) failed in 884.087 s* *15/04/30 09:59:49 INFO scheduler.DAGScheduler: Job 0 failed: collectAsMap at VISummaryDataProvider.scala:37, took 1093.418249 s* 15/04/30 09:59:49 ERROR yarn

Re: Spark - Timeout Issues - OutOfMemoryError

2015-05-02 Thread Akhil Das
You could try repartitioning your listings RDD, also doing a collectAsMap would basically bring all your data to driver, in that case you might want to set the storage level as Memory and disk not sure that will do any help on the driver though. Thanks Best Regards On Thu, Apr 30, 2015 at 11:10 P

Re: OutofMemoryError when generating output

2014-08-26 Thread Burak Yavuz
ine.split("\t") (fields(11), fields(6)) // extract (month, user_id) }.distinct().countByKey() instead Best, Burak - Original Message - From: "SK" To: u...@spark.incubator.apache.org Sent: Tuesday, August 26, 2014 12:38:00 PM Subject: OutofMemoryError when generating

Re: OutofMemoryError when generating output

2014-08-28 Thread SK
object to an object that I can output to console and to a file? thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/OutofMemoryError-when-generating-output-tp12847p13056.html Sent from the Apache Spark User List mailing list archive

Re: OutofMemoryError when generating output

2014-08-28 Thread Burak Yavuz
ot;SK" To: u...@spark.incubator.apache.org Sent: Thursday, August 28, 2014 12:45:22 PM Subject: Re: OutofMemoryError when generating output Hi, Thanks for the response. I tried to use countByKey. But I am not able to write the output to console or to a file. Neither collect() nor saveAsTextF

Re: OutOfMemoryError with basic kmeans

2014-09-17 Thread st553
001560.n3.nabble.com/OutOfMemoryError-with-basic-kmeans-tp1651p14472.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mai

Re: [Beginner Debug]: Executor OutOfMemoryError

2024-02-23 Thread Mich Talebzadeh
Seems like you are having memory issues. Examine your settings. 1. It appears that your driver memory setting is too high. It should be a fraction of total memy provided by YARN 2. Use the Spark UI to monitor the job's memory consumption. Check the Storage tab to see how memory is be

What happens to this RDD? OutOfMemoryError

2015-09-04 Thread Kevin Mandich
Hi All, I'm using PySpark to create a corpus of labeled data points. I create an RDD called corpus, and then join to this RDD each newly-created feature RDD as I go. My code repeats something like this for each feature: feature = raw_data_rdd.map(...).reduceByKey(...).map(...) # create feature RD

OutOfMemoryError When Reading Many json Files

2015-10-13 Thread SLiZn Liu
Hey Spark Users, I kept getting java.lang.OutOfMemoryError: Java heap space as I read a massive amount of json files, iteratively via read.json(). Even the result RDD is rather small, I still get the OOM Error. The brief structure of my program reads as following, in psuedo-code: file_path_list.m
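
A sketch of one way to avoid a driver-side loop over read.json() (Spark 1.x API of the period; the path and glob are assumptions about the layout described):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("read-many-json"))
    val sqlContext = new SQLContext(sc)
    // A single call with a glob plans one job over all files instead of accumulating
    // many small DataFrames on the driver.
    val df = sqlContext.read.json("hdfs:///data/json/*.json")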

Getting OutOfMemoryError and Worker.run caught exception

2014-12-17 Thread A.K.M. Ashrafuzzaman
Hi guys, Getting the following errors, 2014-12-17 09:05:02,391 [SocialInteractionDAL.scala:Executor task launch worker-110:20] - --- Inserting into mongo - 2014-12-17 09:05:06,768 [ Logging.scala:Executor task launch worker-110:96] - Exception in task 1.0 in stage 1

Re: OutOfMemoryError when loading input file

2014-03-03 Thread Yonathan Perez
context: http://apache-spark-user-list.1001560.n3.nabble.com/OutOfMemoryError-when-loading-input-file-tp2213p2246.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Streaming job having Cassandra query : OutOfMemoryError

2014-04-15 Thread sonyjv
Hi All, I am desperately looking for some help. My cluster is 6 nodes having dual core and 8GB ram each. Spark version running on the cluster is spark-0.9.0-incubating-bin-cdh4. I am getting OutOfMemoryError when running a Spark Streaming job (non-streaming version works fine) which queries

Re: Re: Spark program throws OutOfMemoryError

2014-04-17 Thread Qin Wei
eAsTextFile("/home/deployer/sim")} I ran the program through "java -jar myjar.jar", it crashed quickly, but it succeed when the size of the data file is small. Thanks for your help! qinwei  From: Andre Bois-Crettez [via Apache Spark User List]Date: 2014-04-16 17:50To:  Qin WeiSubj

Re: OutOfMemoryError When Reading Many json Files

2015-10-13 Thread Deenar Toraskar
Hi Why dont you check if you can just process the large file standalone and then do the outer loop next. sqlContext.read.json(jsonFile) .select($"some", $"fields") .withColumn( "new_col", some_transformations($"col")) .rdd.map( x: Row => (k, v) ) .combineByKey() Deenar On 14 October 2015 at 05:

Re: OutOfMemoryError When Reading Many json Files

2015-10-14 Thread SLiZn Liu
Yes it went wrong when processing a large file only. I removed transformations on DF, and it worked just fine. But doing a simple filter operation on the DF became the last straw that breaks the camel’s back. That’s confusing. ​ On Wed, Oct 14, 2015 at 2:11 PM Deenar Toraskar wrote: > Hi > > Why

[Worker Crashing] OutOfMemoryError: GC overhead limit exceeded

2017-03-23 Thread Behroz Sikander
Hello, Spark version: 1.6.2 Hadoop: 2.6.0 Cluster: All VMS are deployed on AWS. 1 Master (t2.large) 1 Secondary Master (t2.large) 5 Workers (m4.xlarge) Zookeeper (t2.large) Recently, 2 of our workers went down with out of memory exception. > java.lang.OutOfMemoryError: GC overhead limit exceeded

[Worker Crashing] OutOfMemoryError: GC overhead limit exceeded

2017-03-24 Thread bsikander
ry ? How can we avoid that in future. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Worker-Crashing-OutOfMemoryError-GC-overhead-limit-execeeded-tp28535.html Sent from the Apache Spark User List mailing list archiv

Re: Getting OutOfMemoryError and Worker.run caught exception

2014-12-17 Thread Akhil Das
You can go through this doc for tuning http://spark.apache.org/docs/latest/tuning.html Looks like you are creating a lot of objects and the JVM is spending more time clearing these. If you can paste the code snippet, then it will be easy to understand whats happening. Thanks Best Regards On Thu,

Re: [Worker Crashing] OutOfMemoryError: GC overhead limit exceeded

2017-03-24 Thread Yong Zhang
I am not 100% sure, but normally "dispatcher-event-loop" OOM means the driver OOM. Are you sure your workers OOM? Yong From: bsikander Sent: Friday, March 24, 2017 5:48 AM To: user@spark.apache.org Subject: [Worker Crashing] OutOfMemoryError: G

Re: [Worker Crashing] OutOfMemoryError: GC overhead limit exceeded

2017-03-24 Thread Behroz Sikander
re you sure your workers OOM? > > > Yong > > > -- > *From:* bsikander > *Sent:* Friday, March 24, 2017 5:48 AM > *To:* user@spark.apache.org > *Subject:* [Worker Crashing] OutOfMemoryError: GC overhead limit execeeded > > Spark version: 1.6.2

Re: [Worker Crashing] OutOfMemoryError: GC overhead limit exceeded

2017-03-24 Thread Yong Zhang
: [Worker Crashing] OutOfMemoryError: GC overhead limit execeeded Thank you for the response. Yes, I am sure because the driver was working fine. Only 2 workers went down with OOM. Regards, Behroz On Fri, Mar 24, 2017 at 2:12 PM, Yong Zhang mailto:java8...@hotmail.com>> wrote: I am not 100% sur

Re: [Worker Crashing] OutOfMemoryError: GC overhead limit exceeded

2017-03-24 Thread Behroz Sikander
t; *Sent:* Friday, March 24, 2017 9:15 AM > *To:* Yong Zhang > *Cc:* user@spark.apache.org > *Subject:* Re: [Worker Crashing] OutOfMemoryError: GC overhead limit > execeeded > > Thank you for the response. > > Yes, I am sure because the driver was working fine. Only 2 workers

Re: [Worker Crashing] OutOfMemoryError: GC overhead limit exceeded

2017-03-24 Thread Yong Zhang
apache.org Subject: Re: [Worker Crashing] OutOfMemoryError: GC overhead limit execeeded Yea we also didn't find anything related to this online. Are you aware of any memory leaks in worker in 1.6.2 spark which might be causing this ? Do you know of any documentation which explains all the tasks t

OutOfMemoryError with random forest and small training dataset

2015-02-11 Thread poiuytrez
ActorSystem [sparkDriver] java.lang.OutOfMemoryError: Java heap space That's very weird. Any idea of what's wrong with my configuration? PS : I am running Spark 1.2 -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/OutOfMemoryError-with-ramdom-forest

OutOfMemoryError when using DataFrame created by Spark SQL

2015-03-25 Thread SLiZn Liu
workers set by -Xms4096M -Xmx4096M, which I presume sufficient for this trivial query. Additionally, after restarted the spark-shell and re-run the limit 5 query , the df object is returned and can be printed by df.show(), but other APIs fails on OutOfMemoryError, namely, df.count(), df.select("

OutOfMemoryError when using DataFrame created by Spark SQL

2015-03-25 Thread Todd Leo
workers set by -Xms4096M -Xmx4096M, which I presume sufficient for this trivial query. Additionally, after restarted the spark-shell and re-run the limit 5 query , the df object is returned and can be printed by df.show(), but other APIs fails on OutOfMemoryError, namely, df.count(), df.select("

Lost tasks due to OutOfMemoryError (GC overhead limit exceeded)

2016-01-12 Thread Barak Yaish
strationRequired","true"); sparkConf.set("spark.kryoserializer.buffer.max.mb","512"); sparkConf.set("spark.default.parallelism","300"); sparkConf.set("spark.rpc.askTimeout","500"); I'm trying to load data from hdfs and running some sql

How to unit test HiveContext without OutOfMemoryError (using sbt)

2015-08-25 Thread Mike Trienis
Hello, I am using sbt and created a unit test where I create a `HiveContext` and execute some query and then return. Each time I run the unit test the JVM will increase it's memory usage until I get the error: Internal error when running tests: java.lang.OutOfMemoryError: PermGen space Exception

Spark-xml - OutOfMemoryError: Requested array size exceeds VM limit

2016-11-15 Thread Arun Patel
I am trying to read an XML file which is 1GB is size. I am getting an error 'java.lang.OutOfMemoryError: Requested array size exceeds VM limit' after reading 7 partitions in local mode. In Yarn mode, it throws 'java.lang.OutOfMemoryError: Java heap space' error after reading 3 partitions. Any su
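
A hedged sketch of the spark-xml read in question, assuming the spark-xml package is on the classpath (the rowTag value and path are assumptions; rowTag must name the repeating record element of the real file so it can be split into rows rather than materialised as one huge array):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("xml-read-sketch").getOrCreate()
    val df = spark.read
      .format("com.databricks.spark.xml")
      .option("rowTag", "record")           // assumed element name
      .load("hdfs:///data/big-file.xml")    // assumed path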

Driver OutOfMemoryError in MapOutputTracker$.serializeMapStatuses for 40 TB shuffle.

2018-09-07 Thread Harel Gliksman
Hi, We are running a Spark (2.3.1) job on an EMR cluster with 500 r3.2xlarge (60 GB, 8 vcores, 160 GB SSD ). Driver memory is set to 25GB. It processes ~40 TB of data using aggregateByKey in which we specify numPartitions = 300,000. Map side tasks succeed, but reduce side tasks all fail. We noti

Re: OutOfMemoryError with random forest and small training dataset

2015-02-11 Thread poiuytrez
true spark.eventLog.dir gs://-spark/spark-eventlog-base/spark-m spark.executor.memory 83971m spark.yarn.executor.memoryOverhead 83971m I am using spark-submit. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/OutOfMemoryError-with-ramdom-forest-and

Re: OutOfMemoryError with random forest and small training dataset

2015-02-11 Thread poiuytrez
spark.eventLog.dir gs://databerries-spark/spark-eventlog-base/spark-m spark.executor.memory 83971m spark.yarn.executor.memoryOverhead 83971m I am using spark-submit. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/OutOfMemoryError-with-ramdom-forest-and-small

Re: OutOfMemoryError with random forest and small training dataset

2015-02-12 Thread didmar
Ok, I would suggest adding SPARK_DRIVER_MEMORY in spark-env.sh, with a larger amount of memory than the default 512m -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/OutOfMemoryError-with-ramdom-forest-and-small-training-dataset-tp21598p21618.html Sent from

Re: OutOfMemoryError with random forest and small training dataset

2015-02-12 Thread poiuytrez
) but the memory is not correctly allocated as we can see on the webui executor page). I am going to file an issue in the bug tracker. Thank you for your help. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/OutOfMemoryError-with-ramdom-forest-and-small

Re: OutOfMemoryError with random forest and small training dataset

2015-02-12 Thread Sean Owen
help. > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/OutOfMemoryError-with-ramdom-forest-and-small-training-dataset-tp21598p21620.html > Sent from the Apache Spark User List mailing list archive at Nabble.com. > > ---

Re: OutOfMemoryError when using DataFrame created by Spark SQL

2015-03-25 Thread Ted Yu
ume sufficient for this trivial query. > > Additionally, after restarted the spark-shell and re-run the limit 5 query , > the df object is returned and can be printed by df.show(), but other APIs > fails on OutOfMemoryError, namely, df.count(), df.select("some_field"

Unable to Hive program from Spark Programming Guide (OutOfMemoryError)

2015-03-25 Thread ๏̯͡๏
http://spark.apache.org/docs/1.3.0/sql-programming-guide.html#hive-tables I modified the Hive query but run into same error. ( http://spark.apache.org/docs/1.3.0/sql-programming-guide.html#hive-tables) val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc) sqlContext.sql("CREAT

Re: OutOfMemoryError when using DataFrame created by Spark SQL

2015-03-25 Thread Michael Armbrust
t; > the master heap memory is set by -Xms512m -Xmx512m, while workers set by > -Xms4096M > -Xmx4096M, which I presume sufficient for this trivial query. > > Additionally, after restarted the spark-shell and re-run the limit 5 query > , the df object is returned and can be print

Re: Lost tasks due to OutOfMemoryError (GC overhead limit exceeded)

2016-01-12 Thread Muthu Jayakumar
or.class.getName()); > sparkConf.set("spark.kryo.registrationRequired","true"); > sparkConf.set("spark.kryoserializer.buffer.max.mb","512"); > sparkConf.set("spark.default.parallelism","300"); > sparkConf.set("spark.rpc.askTimeout",&quo

Re: How to unit test HiveContext without OutOfMemoryError (using sbt)

2015-08-25 Thread Yana Kadiyska
The PermGen space error is controlled with MaxPermSize parameter. I run with this in my pom, I think copied pretty literally from Spark's own tests... I don't know what the sbt equivalent is but you should be able to pass it...possibly via SBT_OPTS? org.scalatest sca
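
An sbt equivalent of what Yana shows for Maven (a sketch; the sizes are illustrative, and -XX:MaxPermSize only applies to pre-Java-8 JVMs, which is where PermGen exists):

    // build.sbt (sbt 0.13-era syntax)
    fork in Test := true                                             // run tests in a forked JVM so the options apply
    javaOptions in Test ++= Seq("-Xmx2g", "-XX:MaxPermSize=512m")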
