Just for reference, in my case this problem is caused by this bug:
https://issues.apache.org/jira/browse/SPARK-12617
On Monday, 21 December 2015, 14:32, Antony Mayi wrote:
I noticed it might be related to longer GC pauses (1-2 sec) - the crash
usually occurs after such a pause. Could that be causing the python-java
gateway to time out?
FYI, after further troubleshooting I am logging this as
https://issues.apache.org/jira/browse/SPARK-12511
On Tuesday, 22 December 2015, 18:16, Antony Mayi wrote:
I narrowed it down to the problem described, for example, here:
https://bugs.openjdk.java.net/browse/JDK-6293787
It is the mass of io.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry instances
On Tue, Dec 22, 2015 at 2:59 AM, Antony Mayi wrote:
I have a streaming app (pyspark 1.5.2 on yarn) that's crashing due to a driver
(JVM part, not Python) OOM - no matter how big a heap is assigned, it eventually
runs out. When checking the heap, it is all taken by "byte" items of
io.netty.buffer.PoolThreadCache. The number of
io.netty.buffer.PoolThreadCache
I noticed it might be related to longer GC pauses (1-2 sec) - the crash usually
occurs after such a pause. Could that be causing the python-java gateway to time
out?
On Sunday, 20 December 2015, 23:05, Antony Mayi wrote:
Hi,
can anyone please help me troubleshoot this problem: I have a streaming pyspark
application (spark 1.5.2 on yarn-client) which keeps crashing after a few hours.
It doesn't seem to be running out of memory on either the driver or the executors.
driver error:
py4j.protocol.Py4JJavaError: An error occurred while calling
Hi,
using spark 1.5.2 on yarn (client mode) and was trying to use dynamic resource
allocation, but it seems that once it is enabled by the first app, any following
application is managed that way, even when explicitly disabling it.
Example:
1) yarn configured with org.apache.spark.network.yarn.YarnShuffleService
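If the per-application override really is being ignored, one sanity check is to
force it off programmatically instead of via the config file. A minimal sketch
(the app name and executor count are made up; the config keys are the standard
Spark 1.5 ones):

    from pyspark import SparkConf, SparkContext

    # Explicitly disable dynamic allocation for this app and pin a static
    # executor count, overriding whatever spark-defaults.conf says.
    conf = (SparkConf()
            .setAppName("static-allocation-test")    # hypothetical app name
            .set("spark.dynamicAllocation.enabled", "false")
            .set("spark.executor.instances", "4"))   # illustrative fixed count
    sc = SparkContext(conf=conf)
    print(sc.getConf().get("spark.dynamicAllocation.enabled"))  # expect: false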
Hi,
I have two streams coming from two different kafka topics. The two topics
contain time-related events but are quite asymmetric in volume. I would
obviously need to process them in sync to get the time-related events together,
but with the same processing rate; if the heavier stream starts backlogging
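For what it's worth, one knob that can stop the heavier topic from backlogging
a batch is the per-partition Kafka rate cap. A sketch only, assuming the direct
Kafka stream API is available in the version in use; broker address, topic
names and the rate value are placeholders:

    from pyspark import SparkConf, SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    # Cap how many records per second each Kafka partition may contribute to
    # a batch, so the high-volume topic cannot starve the low-volume one.
    conf = (SparkConf()
            .setAppName("two-topic-sync")
            .set("spark.streaming.kafka.maxRatePerPartition", "1000"))
    sc = SparkContext(conf=conf)
    ssc = StreamingContext(sc, 10)  # 10 second batches

    kafka_params = {"metadata.broker.list": "broker1:9092"}
    heavy = KafkaUtils.createDirectStream(ssc, ["heavy_topic"], kafka_params)
    light = KafkaUtils.createDirectStream(ssc, ["light_topic"], kafka_params)
    both = heavy.union(light)  # process the two time-related streams together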
Hi,
is it expected that I can't reference a column inside an IF statement like this:
sctx.sql("SELECT name, IF(ts>0, price, 0) FROM table").collect()
I get an error:
org.apache.spark.sql.AnalysisException: unresolved operator 'Project [name#0,if
((CAST(ts#1, DoubleType) > CAST(0, DoubleType))) price#2
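A workaround that may sidestep the unresolved IF() here is to spell the
conditional out as CASE WHEN, which the analyzer handles; same hypothetical
table and columns as above:

    # Equivalent query expressed with CASE WHEN instead of IF()
    sctx.sql(
        "SELECT name, CASE WHEN ts > 0 THEN price ELSE 0 END FROM table"
    ).collect()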
Hi,
this has already been briefly discussed here in the past but there seem to be
more questions...
I am running a bigger ALS task with input data of ~40GB (~3 billion ratings).
The data is partitioned into 512 partitions and I am also using default
parallelism set to 512. The ALS runs with rank
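For concreteness, the setup described reads roughly like this in pyspark; the
input path is a placeholder and, since the rank value is cut off in the
archive, 100 below is purely illustrative:

    from pyspark.mllib.recommendation import ALS, Rating

    # 512 input partitions, matching the default parallelism mentioned above
    ratings = (sc.textFile("hdfs:///ratings.csv", minPartitions=512)
                 .map(lambda line: line.split(","))
                 .map(lambda f: Rating(int(f[0]), int(f[1]), float(f[2]))))
    model = ALS.trainImplicit(ratings, rank=100, iterations=15)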
9, 2015 at 5:10 AM, Antony Mayi wrote:
now with spark.shuffle.io.preferDirectBufs reverted (to true), again getting GC
overhead limit exceeded:
=== spark stdout ===
15/02/19 12:08:08 WARN scheduler.TaskSetManager: Lost task 7.0 in stage 18.0
(TID 5329, 192.168.1.93): java.lang.OutOfMemoryError: GC overhead limit exceeded
two RDD vectors representing the decomposed matrix.
You can save these to disk and reuse them.
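A sketch of that suggestion, assuming the Python model exposes
userFeatures()/productFeatures() (newer versions do; on older ones the JVM
model may have to be reached through the gateway) and with hypothetical paths:

    # Persist the two factor RDDs that make up the trained model
    model.userFeatures().saveAsPickleFile("hdfs:///als/userFeatures")
    model.productFeatures().saveAsPickleFile("hdfs:///als/productFeatures")

    # Later: reload and score a (user, item) pair with a plain dot product,
    # since rebuilding a MatrixFactorizationModel from raw RDDs is not exposed.
    users = sc.pickleFile("hdfs:///als/userFeatures")    # RDD of (id, features)
    items = sc.pickleFile("hdfs:///als/productFeatures")
    u_vec = users.lookup(42)[0]   # feature vector for user 42 (illustrative id)
    i_vec = items.lookup(7)[0]    # feature vector for item 7
    score = sum(a * b for a, b in zip(u_vec, i_vec))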
On Thu, Feb 19, 2015 at 2:19 AM Antony Mayi wrote:
Hi,
when getting the model out of ALS.train it would be beneficial to store it (to
disk) so the model can be reused later for any following
container_1424204221358_0013_01_08 transitioned from RUNNING to
EXITED_WITH_FAILURE
2015-02-19 12:08:14,455 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Cleaning up container container_1424204221358_0013_01_08
Antony.
On Thursday, 19 February 2015, 11:54, Antony Mayi
ol rather
than split up your memory, but at some point it becomes
counter-productive. 32GB is a fine executor size.
So you have ~8GB available per task, which seems like plenty. Something
else is at work here. Is this error from your code's stages or ALS?
On Thu, Feb 19, 2015 at 10:07 AM, Anton
Hi,
when getting the model out of ALS.train, it would be beneficial to store it (to
disk) so the model can be reused later for any following predictions. I am
using pyspark and I had no luck pickling it, either using the standard pickle
module or even dill.
does anyone have a solution for this? (note i
1TB.
It still feels like this shouldn't be running out of memory, not by a
long shot. But I'm just pointing out potential differences between
what you are expecting and what you are configuring.
On Thu, Feb 19, 2015 at 9:56 AM, Antony Mayi wrote:
Hi,
I have 4 very powerful boxes (256GB RAM, 32 cores each). I am running spark
1.2.0 in yarn-client mode with the following layout:
spark.executor.cores=4
spark.executor.memory=28G
spark.yarn.executor.memoryOverhead=4096
I am submitting a bigger ALS trainImplicit task (rank=100, iters=15) on a dataset
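For reference, the YARN footprint those settings imply, as simple arithmetic
(assuming YARN can hand out essentially all of the 256GB per box):

    executor_gb = 28 + 4          # spark.executor.memory + memoryOverhead (4096MB)
    per_box = 256 // executor_gb  # -> 8 executors per box
    cores_in_use = per_box * 4    # spark.executor.cores=4 -> all 32 cores busy
    print("%dGB per executor, %d executors/box, %d cores in use"
          % (executor_gb, per_box, cores_in_use))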
files
in the workers that are not needed.
TD
On Mon, Feb 16, 2015 at 12:27 AM, Antony Mayi wrote:
spark.cleaner.ttl is not the right way - it seems to be really designed for
streaming. Although it keeps the disk usage under control, it also causes the
loss of rdds and broadcasts that are required
spark.cleaner.ttl ?
On Sunday, 15 February 2015, 18:23, Antony Mayi wrote:
Hi,
I am running a bigger ALS on spark 1.2.0 on yarn (cdh 5.3.0) - the ALS is using
about 3 billion ratings and I am doing several trainImplicit() runs in a loop
within one spark session. I have a four node cluster with 3TB of disk space on
each. Before starting the job there is less than 8% of the disk
Hi,
is there a way to use a custom python module that is available to all executors
under PYTHONPATH (without needing to upload it using sc.addPyFile())? It is a
bit weird that this module is on all nodes yet the spark tasks can't use it
(references to its objects are serialized and sent to all executors
Hi,
when running a big mapreduce operation with pyspark (in this particular case
using a lot of sets and operations on sets in the map tasks, so likely
allocating and freeing loads of pages) I eventually get the kernel error
'python: page allocation failure: order:10, mode:0x2000d0' plus a very verbose
org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
On Wednesday, 28 January 2015, 0:01, Guru Medasani wrote:
Can you attach the logs where this is failing?
From: Sven Krasser
Date: Tuesday, January 27, 2015 at 4:50 PM
To: Guru Medasani
Cc: Sandy Ryza, Antony Mayi,
Hi,
I am using spark.yarn.executor.memoryOverhead=8192, yet I am getting executors
crashing with this error.
does that mean I genuinely don't have enough RAM, or is this a matter of config
tuning?
other config options used:
spark.storage.memoryFraction=0.3
SPARK_EXECUTOR_MEMORY=14G
running spark 1.2.0 as yarn
form across the cluster or for the machine where the
config file resides.)
On Mon Jan 26 2015 at 8:07:51 AM Antony Mayi wrote:
Hi,
is it possible to mix hosts with (significantly) different specs within a
cluster (without wasting the extra resources)? For example, having 10 nodes with
36GB RAM/10CPUs and now trying to add 3 hosts with 128GB/10CPUs - is there a way
to utilize the extra memory by spark executors (as my underst
her. Again have a look at
http://spark.apache.org/docs/latest/running-on-yarn.html
... --executor-memory 22g --conf
"spark.yarn.executor.memoryOverhead=2g" ... should do it, off the top
of my head. That should reserve 24g from YARN.
On Sat, Jan 17, 2015 at 5:29 AM, Antony Mayi wrote:
anything more I can do?
thanks, Antony.
On Monday, 12 January 2015, 8:21, Antony Mayi wrote:
this seems to have sorted it, awesome, thanks for the great help. Antony.
On Sunday, 11 January 2015, 13:02, Sean Owen wrote:
I would expect the size of the user/item feature RDDs to
Hi,
I believe this is some kind of timeout problem but I can't figure out how to
increase it.
I am running spark 1.2.0 on yarn (all from cdh 5.3.0). I submit a python task
which first loads a big RDD from hbase - I can see in the screen output all
executors fire up, then there is no more logging output for ne
Hi,
running spark 1.1.0 in yarn-client mode (cdh 5.2.1) on a XEN based cloud and
randomly getting my executors failing on errors like below. I suspect it is
some cloud networking issue (XEN driver bug?) but am wondering if there is any
spark/yarn workaround that I could use to mitigate it?
Thanks, Antony
ou can try
increasing it to a couple GB.
On Sun, Jan 11, 2015 at 9:43 AM, Antony Mayi wrote:
> the question really is whether it is expected that the memory requirements
> grow rapidly with the rank... as I would expect memory to be rather an O(1)
> problem, with dependency only on the size of the input
input and parameters?
thanks, Antony.
On Saturday, 10 January 2015, 10:47, Antony Mayi wrote:
the actual case looks like this:
* spark 1.1.0 on yarn (cdh 5.2.1)
* ~8-10 executors, 36GB phys RAM per host
* input RDD is roughly 3GB containing ~150-200M items (and this RDD is made
container.
thanks for any ideas, Antony.
On Saturday, 10 January 2015, 10:11, Antony Mayi wrote:
the memory requirements seem to be rapidly growing when using a higher rank...
I am unable to get over 20 without running out of memory. Is this expected?
thanks, Antony.
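A back-of-envelope sketch of why rank shows up so strongly: the learned factor
matrices alone grow linearly with rank, and ALS also materializes per-iteration
intermediate blocks on top of that. The user/item counts below are placeholders,
not numbers from this thread:

    n_users, n_items, rank = 10000000, 1000000, 20     # illustrative counts only
    factors_gb = (n_users + n_items) * rank * 8 / 1e9  # 8 bytes per double
    print("%.1f GB of raw factors, before JVM overhead" % factors_gb)  # ~1.8 GB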
ht be able to give more useful directions about how
to fix that.
On Wed, Jan 7, 2015 at 1:46 PM, Antony Mayi wrote:
this is the official cloudera compiled stack cdh 5.3.0 - nothing has been done
by me and I presume they are pretty good at building it, so I still suspect it
now gets the classpath res
the cluster. It could
also be that you're pairing Spark compiled for Hadoop 1.x with a 2.x cluster.
On Wed, Jan 7, 2015 at 9:38 AM, Antony Mayi wrote:
Hi,
I am using newAPIHadoopRDD to load an RDD from hbase (using pyspark running as
yarn-client) - pretty much the standard case demonstrated in the
hbase_inputformat.py from the examples... the thing is, when trying the very
same code on spark 1.2 I am getting the error below, which based on similar
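For reference, the working 1.1-style call from hbase_inputformat.py looks
roughly like this; the quorum and table name are placeholders, and the
converter classes ship with the Spark examples jar:

    conf = {"hbase.zookeeper.quorum": "localhost",   # placeholder
            "hbase.mapreduce.inputtable": "test"}    # placeholder table
    rdd = sc.newAPIHadoopRDD(
        "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
        "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
        "org.apache.hadoop.hbase.client.Result",
        keyConverter="org.apache.spark.examples.pythonconverters."
                     "ImmutableBytesWritableToStringConverter",
        valueConverter="org.apache.spark.examples.pythonconverters."
                       "HBaseResultToStringConverter",
        conf=conf)
    print(rdd.take(1))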
ok, I see now what's happening - the pkg.mod.test is serialized by reference
and there is nothing actually trying to import pkg.mod on the executors, so the
reference is broken.
so how can I get pkg.mod imported on the executors?
thanks, Antony.
On Friday, 2 January 2015, 13:49, A
Hi,
I am running spark 1.1.0 on yarn. I have a custom set of modules installed
under the same location on each executor node and am wondering how I can pass
the executors the PYTHONPATH so that they can use the modules.
I've tried this:
spark-env.sh: export PYTHONPATH=/tmp/test/
spark-defaults.conf: spark.
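One thing that may be worth trying is setting the executor environment through
spark.executorEnv.* rather than spark-env.sh. A sketch, assuming the modules
really do live at the same path on every node (whether YARN propagates the
variable can depend on the version):

    from pyspark import SparkConf, SparkContext

    # Pass PYTHONPATH into the executor environment explicitly
    conf = (SparkConf()
            .setAppName("executor-pythonpath-test")              # hypothetical
            .set("spark.executorEnv.PYTHONPATH", "/tmp/test/"))  # path from above
    sc = SparkContext(conf=conf)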
 key1      column=f1:asd, timestamp=1419463092904, value=456
 testkey   column=f1:testqual, timestamp=1419487275905, value=testval
2 row(s) in 0.0270 sec
Dec 24, 2014 at 4:11 PM, Antony Mayi wrote:
I just ran it by hand from the pyspark shell. Here are the steps:
pyspark --jars
/usr/lib/spark/lib/spark-examples-1.2.0-cdh5.3.0-hadoop2.5.0-cdh5.3.0.jar
>>> conf = {"hbase.zookeeper.quorum": "localhost"
rtant in the logs ?
Looks like the container launcher was waiting for the script to return some result:
- at
org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:715)
- at org.apache.hadoop.util.Shell.runCommand(Shell.java:524)
On Wed, Dec 24, 2014 at 3:1
it ?
Thanks
On Wed, Dec 24, 2014 at 4:49 AM, Antony Mayi wrote:
Hi,
I have been using this without any issues with spark 1.1.0, but after upgrading
to 1.2.0, saving an RDD from pyspark using saveAsNewAPIHadoopDataset into HBase
just hangs - even when testing with the example from the stock
hbase_outputformat.py.
anyone having the same issue? (and able to solve it?)
using
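For reference, the write path being tested follows the stock
hbase_outputformat.py, roughly as below; the quorum and table name are
placeholders, and the converter classes come from the Spark examples jar:

    conf = {"hbase.zookeeper.quorum": "localhost",   # placeholder
            "hbase.mapred.outputtable": "test",      # placeholder table
            "mapreduce.outputformat.class":
                "org.apache.hadoop.hbase.mapreduce.TableOutputFormat",
            "mapreduce.job.output.key.class":
                "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
            "mapreduce.job.output.value.class":
                "org.apache.hadoop.io.Writable"}

    # One illustrative row: (rowkey, [row, family, qualifier, value])
    rows = sc.parallelize([("row1", ["row1", "f1", "q1", "value1"])])
    rows.saveAsNewAPIHadoopDataset(
        conf=conf,
        keyConverter="org.apache.spark.examples.pythonconverters."
                     "StringToImmutableBytesWritableConverter",
        valueConverter="org.apache.spark.examples.pythonconverters."
                       "StringListToPutConverter")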
lls back to toString?
thanks, Antony.
On Monday, 22 December 2014, 20:09, Ted Yu wrote:
Which HBase version are you using ?
Can you show the full stack trace ?
Cheers
On Mon, Dec 22, 2014 at 11:02 AM, Antony Mayi wrote:
Hi,
can anyone please give me some help with how to write a custom converter of
hbase data to (for example) tuples of ((family, qualifier, value), ) for
pyspark?
I was trying something like this (here trying to get tuples of
("family:qualifier:value", )):
class HBaseResultToTupleConverter extends Converter[A
Hi,
using pyspark 1.1.0 on YARN 2.5.0. All operations run nicely in parallel - I
can see multiple python processes spawned on each nodemanager, but for some
reason when running cartesian there is only a single python process running on
each node. The task is indicating thousands of partitions, so