Just for reference, in my case this problem is caused by this bug:
https://issues.apache.org/jira/browse/SPARK-12617
On Monday, 21 December 2015, 14:32, Antony Mayi wrote:
I noticed it might be related to longer GC pauses (1-2 sec) - the crash
usually occurs after such a pause. Could that be causing the python-java
gateway to time out?
FYI, after further troubleshooting I am logging this as
https://issues.apache.org/jira/browse/SPARK-12511
On Tuesday, 22 December 2015, 18:16, Antony Mayi wrote:
I narrowed it down to the problem described, for example, here:
https://bugs.openjdk.java.net/browse/JDK-6293787
It is the mass of io.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry instances
On Tue, Dec 22, 2015 at 2:59 AM, Antony Mayi wrote:
I have a streaming app (pyspark 1.5.2 on yarn) that's crashing due to a driver
(JVM part, not Python) OOM - no matter how big a heap is assigned, it eventually
runs out. When checking the heap, it is all taken by "byte" items of
io.netty.buffer.PoolThreadCache. The number of
io.netty.buffer.PoolThreadCache
I noticed it might be related to longer GC pauses (1-2 sec) - the crash usually
occurs after such a pause. Could that be causing the python-java gateway to time
out?
On Sunday, 20 December 2015, 23:05, Antony Mayi wrote:
Hi,
can anyone please help me troubleshoot this problem: I have a streaming pyspark
application (spark 1.5.2 on yarn-client) which keeps crashing after a few hours.
It doesn't seem to be running out of memory on either the driver or the executors.
driver error:
py4j.protocol.Py4JJavaError: An error occurred while calling
Hi,
using spark 1.5.2 on yarn (client mode) and was trying to use dynamic resource
allocation, but it seems that once it is enabled by the first app, any following
application is managed that way, even when explicitly disabling it.
Example:
1) yarn configured with org.apache.spark.network.yarn.YarnShuffleService
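If the per-application override really is being ignored, one sanity check is to
force it off programmatically instead of via the config file. A minimal sketch
(the app name and executor count are made up; the config keys are the standard
Spark 1.5 ones):

    from pyspark import SparkConf, SparkContext

    # Explicitly disable dynamic allocation for this app and pin a static
    # executor count, overriding whatever spark-defaults.conf says.
    conf = (SparkConf()
            .setAppName("static-allocation-test")    # hypothetical app name
            .set("spark.dynamicAllocation.enabled", "false")
            .set("spark.executor.instances", "4"))   # illustrative fixed count
    sc = SparkContext(conf=conf)
    print(sc.getConf().get("spark.dynamicAllocation.enabled"))  # expect: false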
Hi,
I have two streams coming from two different kafka topics. The two topics
contain time-related events but are quite asymmetric in volume. I would
obviously need to process them in sync to get the time-related events together,
but with the same processing rate; if the heavier stream starts backlogging
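For what it's worth, one knob that can stop the heavier topic from backlogging
a batch is the per-partition Kafka rate cap. A sketch only, assuming the direct
Kafka stream API is available in the version in use; broker address, topic
names and the rate value are placeholders:

    from pyspark import SparkConf, SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    # Cap how many records per second each Kafka partition may contribute to
    # a batch, so the high-volume topic cannot starve the low-volume one.
    conf = (SparkConf()
            .setAppName("two-topic-sync")
            .set("spark.streaming.kafka.maxRatePerPartition", "1000"))
    sc = SparkContext(conf=conf)
    ssc = StreamingContext(sc, 10)  # 10 second batches

    kafka_params = {"metadata.broker.list": "broker1:9092"}
    heavy = KafkaUtils.createDirectStream(ssc, ["heavy_topic"], kafka_params)
    light = KafkaUtils.createDirectStream(ssc, ["light_topic"], kafka_params)
    both = heavy.union(light)  # process the two time-related streams together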
Hi,
is it expected that I can't reference a column inside an IF statement like this:
sctx.sql("SELECT name, IF(ts>0, price, 0) FROM table").collect()
I get an error:
org.apache.spark.sql.AnalysisException: unresolved operator 'Project [name#0,if
((CAST(ts#1, DoubleType) > CAST(0, DoubleType))) price#2
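A workaround that may sidestep the unresolved IF() here is to spell the
conditional out as CASE WHEN, which the analyzer handles; same hypothetical
table and columns as above:

    # Equivalent query expressed with CASE WHEN instead of IF()
    sctx.sql(
        "SELECT name, CASE WHEN ts > 0 THEN price ELSE 0 END FROM table"
    ).collect()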
Hi,
this has already been briefly discussed here in the past but there seem to be
more questions...
I am running a bigger ALS task with input data of ~40GB (~3 billion ratings).
The data is partitioned into 512 partitions and I am also using default
parallelism set to 512. The ALS runs with rank
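For concreteness, the setup described reads roughly like this in pyspark; the
input path is a placeholder and, since the rank value is cut off in the
archive, 100 below is purely illustrative:

    from pyspark.mllib.recommendation import ALS, Rating

    # 512 input partitions, matching the default parallelism mentioned above
    ratings = (sc.textFile("hdfs:///ratings.csv", minPartitions=512)
                 .map(lambda line: line.split(","))
                 .map(lambda f: Rating(int(f[0]), int(f[1]), float(f[2]))))
    model = ALS.trainImplicit(ratings, rank=100, iterations=15)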
9, 2015 at 5:10 AM, Antony Mayi wrote:
now with spark.shuffle.io.preferDirectBufs reverted (to true), again getting GC
overhead limit exceeded:
=== spark stdout ===
15/02/19 12:08:08 WARN scheduler.TaskSetManager: Lost task 7.0 in stage 18.0
(TID 5329, 192.168.1.93): java.lang.OutOfMemoryError: GC overhead limit exceeded
two RDD vectors representing the decomposed matrix.
You can save these to disk and reuse them.
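A sketch of that suggestion, assuming the Python model exposes
userFeatures()/productFeatures() (newer versions do; on older ones the JVM
model may have to be reached through the gateway) and with hypothetical paths:

    # Persist the two factor RDDs that make up the trained model
    model.userFeatures().saveAsPickleFile("hdfs:///als/userFeatures")
    model.productFeatures().saveAsPickleFile("hdfs:///als/productFeatures")

    # Later: reload and score a (user, item) pair with a plain dot product,
    # since rebuilding a MatrixFactorizationModel from raw RDDs is not exposed.
    users = sc.pickleFile("hdfs:///als/userFeatures")    # RDD of (id, features)
    items = sc.pickleFile("hdfs:///als/productFeatures")
    u_vec = users.lookup(42)[0]   # feature vector for user 42 (illustrative id)
    i_vec = items.lookup(7)[0]    # feature vector for item 7
    score = sum(a * b for a, b in zip(u_vec, i_vec))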
On Thu, Feb 19, 2015 at 2:19 AM Antony Mayi wrote:
Hi,
when getting the model out of ALS.train it would be beneficial to store it (to
disk) so the model can be reused later for any following
container_1424204221358_0013_01_08 transitioned from RUNNING to
EXITED_WITH_FAILURE
2015-02-19 12:08:14,455 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Cleaning up container container_1424204221358_0013_01_08
Antony.
On Thursday, 19 February 2015, 11:54, Antony Mayi
ol rather
than split up your memory, but at some point it becomes
counter-productive. 32GB is a fine executor size.
So you have ~8GB available per task, which seems like plenty. Something
else is at work here. Is this error from your code's stages or ALS?
On Thu, Feb 19, 2015 at 10:07 AM, Anton
Hi,
when getting the model out of ALS.train, it would be beneficial to store it (to
disk) so the model can be reused later for any following predictions. I am
using pyspark and I had no luck pickling it, either using the standard pickle
module or even dill.
does anyone have a solution for this? (note i
1TB.
It still feels like this shouldn't be running out of memory, not by a
long shot. But I'm just pointing out potential differences between
what you are expecting and what you are configuring.
On Thu, Feb 19, 2015 at 9:56 AM, Antony Mayi wrote:
Hi,
I have 4 very powerful boxes (256GB RAM, 32 cores each). I am running spark
1.2.0 in yarn-client mode with the following layout:
spark.executor.cores=4
spark.executor.memory=28G
spark.yarn.executor.memoryOverhead=4096
I am submitting a bigger ALS trainImplicit task (rank=100, iters=15) on a dataset
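For reference, the YARN footprint those settings imply, as simple arithmetic
(assuming YARN can hand out essentially all of the 256GB per box):

    executor_gb = 28 + 4          # spark.executor.memory + memoryOverhead (4096MB)
    per_box = 256 // executor_gb  # -> 8 executors per box
    cores_in_use = per_box * 4    # spark.executor.cores=4 -> all 32 cores busy
    print("%dGB per executor, %d executors/box, %d cores in use"
          % (executor_gb, per_box, cores_in_use))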
files
in the workers that are not needed.
TD
On Mon, Feb 16, 2015 at 12:27 AM, Antony Mayi wrote:
spark.cleaner.ttl is not the right way - it seems to be really designed for
streaming. Although it keeps the disk usage under control, it also causes the
loss of rdds and broadcasts that are required
spark.cleaner.ttl ?
On Sunday, 15 February 2015, 18:23, Antony Mayi wrote:
Hi,
I am running a bigger ALS on spark 1.2.0 on yarn (cdh 5.3.0) - the ALS is using
about 3 billion ratings and I am doing several trainImplicit() runs in a loop
within one spark session. I have a four node cluster with 3TB of disk space on
each. Before starting the job there is less than 8% of the disk
Hi,
is there a way to use a custom python module that is available to all executors
under PYTHONPATH (without needing to upload it using sc.addPyFile())? It is a
bit weird that this module is on all nodes yet the spark tasks can't use it
(references to its objects are serialized and sent to all executors
Hi,
when running a big mapreduce operation with pyspark (in this particular case
using a lot of sets and operations on sets in the map tasks, so likely
allocating and freeing loads of pages) I eventually get the kernel error
'python: page allocation failure: order:10, mode:0x2000d0' plus a very verbose
org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
On Wednesday, 28 January 2015, 0:01, Guru Medasani wrote:
Can you attach the logs where this is failing?
From: Sven Krasser
Date: Tuesday, January 27, 2015 at 4:50 PM
To: Guru Medasani
Cc: Sandy Ryza, Antony Mayi,
Hi,
I am using spark.yarn.executor.memoryOverhead=8192, yet I am getting executors
crashing with this error.
does that mean I genuinely don't have enough RAM, or is this a matter of config
tuning?
other config options used:
spark.storage.memoryFraction=0.3
SPARK_EXECUTOR_MEMORY=14G
running spark 1.2.0 as yarn
form across the cluster or for the machine where the
config file resides.)
On Mon Jan 26 2015 at 8:07:51 AM Antony Mayi wrote:
Hi,
is it possible to mix hosts with (significantly) different specs within a
cluster (without wasting the extra resources)? For example, having 10 nodes with
36GB RAM/10CPUs and now trying to add 3 hosts with 128GB/10CPUs - is there a way
to utilize the extra memory by spark executors (as my underst
her. Again have a look at
http://spark.apache.org/docs/latest/running-on-yarn.html
... --executor-memory 22g --conf
"spark.yarn.executor.memoryOverhead=2g" ... should do it, off the top
of my head. That should reserve 24g from YARN.
On Sat, Jan 17, 2015 at 5:29 AM, Antony Mayi wrote:
anything more I can do?
thanks, Antony.
On Monday, 12 January 2015, 8:21, Antony Mayi wrote:
this seems to have sorted it, awesome, thanks for the great help. Antony.
On Sunday, 11 January 2015, 13:02, Sean Owen wrote:
I would expect the size of the user/item feature RDDs to
Hi,
I believe this is some kind of timeout problem but I can't figure out how to
increase it.
I am running spark 1.2.0 on yarn (all from cdh 5.3.0). I submit a python task
which first loads a big RDD from hbase - I can see in the screen output all
executors fire up, then there is no more logging output for ne
Hi,
running spark 1.1.0 in yarn-client mode (cdh 5.2.1) on a XEN based cloud and
randomly getting my executors failing on errors like below. I suspect it is
some cloud networking issue (XEN driver bug?) but am wondering if there is any
spark/yarn workaround that I could use to mitigate it?
Thanks, Antony
ou can try
increasing it to a couple GB.
On Sun, Jan 11, 2015 at 9:43 AM, Antony Mayi wrote:
> the question really is whether it is expected that the memory requirements
> grow rapidly with the rank... as I would expect memory to be rather an O(1)
> problem, with dependency only on the size of the input
input and parameters?
thanks, Antony.
On Saturday, 10 January 2015, 10:47, Antony Mayi wrote:
the actual case looks like this:
* spark 1.1.0 on yarn (cdh 5.2.1)
* ~8-10 executors, 36GB phys RAM per host
* input RDD is roughly 3GB containing ~150-200M items (and this RDD is made
container.
thanks for any ideas, Antony.
On Saturday, 10 January 2015, 10:11, Antony Mayi wrote:
the memory requirements seem to be rapidly growing when using a higher rank...
I am unable to get over 20 without running out of memory. Is this expected?
thanks, Antony.
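A back-of-envelope sketch of why rank shows up so strongly: the learned factor
matrices alone grow linearly with rank, and ALS also materializes per-iteration
intermediate blocks on top of that. The user/item counts below are placeholders,
not numbers from this thread:

    n_users, n_items, rank = 10000000, 1000000, 20     # illustrative counts only
    factors_gb = (n_users + n_items) * rank * 8 / 1e9  # 8 bytes per double
    print("%.1f GB of raw factors, before JVM overhead" % factors_gb)  # ~1.8 GB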
ht be able to give more useful directions about how
to fix that.
On Wed, Jan 7, 2015 at 1:46 PM, Antony Mayi wrote:
this is the official cloudera compiled stack cdh 5.3.0 - nothing has been done
by me and I presume they are pretty good at building it, so I still suspect it
now gets the classpath res
the cluster. It could
also be that you're pairing Spark compiled for Hadoop 1.x with a 2.x cluster.
On Wed, Jan 7, 2015 at 9:38 AM, Antony Mayi wrote:
Hi,
I am using newAPIHadoopRDD to load an RDD from hbase (using pyspark running as
yarn-client) - pretty much the standard case demonstrated in the
hbase_inputformat.py from the examples... the thing is, when trying the very
same code on spark 1.2 I am getting the error below, which based on similar
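For reference, the working 1.1-style call from hbase_inputformat.py looks
roughly like this; the quorum and table name are placeholders, and the
converter classes ship with the Spark examples jar:

    conf = {"hbase.zookeeper.quorum": "localhost",   # placeholder
            "hbase.mapreduce.inputtable": "test"}    # placeholder table
    rdd = sc.newAPIHadoopRDD(
        "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
        "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
        "org.apache.hadoop.hbase.client.Result",
        keyConverter="org.apache.spark.examples.pythonconverters."
                     "ImmutableBytesWritableToStringConverter",
        valueConverter="org.apache.spark.examples.pythonconverters."
                       "HBaseResultToStringConverter",
        conf=conf)
    print(rdd.take(1))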
ok, I see now what's happening - the pkg.mod.test is serialized by reference
and there is nothing actually trying to import pkg.mod on the executors, so the
reference is broken.
so how can I get pkg.mod imported on the executors?
thanks, Antony.
On Friday, 2 January 2015, 13:49, A
Hi,
I am running spark 1.1.0 on yarn. I have a custom set of modules installed
under the same location on each executor node and am wondering how I can pass
the executors the PYTHONPATH so that they can use the modules.
I've tried this:
spark-env.sh: export PYTHONPATH=/tmp/test/
spark-defaults.conf: spark.
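One thing that may be worth trying is setting the executor environment through
spark.executorEnv.* rather than spark-env.sh. A sketch, assuming the modules
really do live at the same path on every node (whether YARN propagates the
variable can depend on the version):

    from pyspark import SparkConf, SparkContext

    # Pass PYTHONPATH into the executor environment explicitly
    conf = (SparkConf()
            .setAppName("executor-pythonpath-test")              # hypothetical
            .set("spark.executorEnv.PYTHONPATH", "/tmp/test/"))  # path from above
    sc = SparkContext(conf=conf)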
 key1      column=f1:asd, timestamp=1419463092904, value=456
 testkey   column=f1:testqual, timestamp=1419487275905, value=testval
2 row(s) in 0.0270 sec
Dec 24, 2014 at 4:11 PM, Antony Mayi wrote:
I just ran it by hand from the pyspark shell. Here are the steps:
pyspark --jars
/usr/lib/spark/lib/spark-examples-1.2.0-cdh5.3.0-hadoop2.5.0-cdh5.3.0.jar
>>> conf = {"hbase.zookeeper.quorum": "localhost"
rtant in the logs ?
Looks like the container launcher was waiting for the script to return some result:
- at
org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:715)
- at org.apache.hadoop.util.Shell.runCommand(Shell.java:524)
On Wed, Dec 24, 2014 at 3:1
it ?
Thanks
On Wed, Dec 24, 2014 at 4:49 AM, Antony Mayi wrote:
Hi,
I have been using this without any issues with spark 1.1.0, but after upgrading
to 1.2.0, saving an RDD from pyspark using saveAsNewAPIHadoopDataset into HBase
just hangs - even when testing with the example from the stock
hbase_outputformat.py.
anyone having the same issue? (and able to solve it?)
using
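For reference, the write path being tested follows the stock
hbase_outputformat.py, roughly as below; the quorum and table name are
placeholders, and the converter classes come from the Spark examples jar:

    conf = {"hbase.zookeeper.quorum": "localhost",   # placeholder
            "hbase.mapred.outputtable": "test",      # placeholder table
            "mapreduce.outputformat.class":
                "org.apache.hadoop.hbase.mapreduce.TableOutputFormat",
            "mapreduce.job.output.key.class":
                "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
            "mapreduce.job.output.value.class":
                "org.apache.hadoop.io.Writable"}

    # One illustrative row: (rowkey, [row, family, qualifier, value])
    rows = sc.parallelize([("row1", ["row1", "f1", "q1", "value1"])])
    rows.saveAsNewAPIHadoopDataset(
        conf=conf,
        keyConverter="org.apache.spark.examples.pythonconverters."
                     "StringToImmutableBytesWritableConverter",
        valueConverter="org.apache.spark.examples.pythonconverters."
                       "StringListToPutConverter")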
lls back to toString?
thanks, Antony.
On Monday, 22 December 2014, 20:09, Ted Yu wrote:
Which HBase version are you using ?
Can you show the full stack trace ?
Cheers
On Mon, Dec 22, 2014 at 11:02 AM, Antony Mayi wrote:
Hi,
can anyone please give me some help with how to write a custom converter of
hbase data to (for example) tuples of ((family, qualifier, value), ) for
pyspark?
I was trying something like this (here trying to get tuples of
("family:qualifier:value", )):
class HBaseResultToTupleConverter extends Converter[A
Hi,
using pyspark 1.1.0 on YARN 2.5.0. All operations run nicely in parallel - I
can see multiple python processes spawned on each nodemanager, but for some
reason when running cartesian there is only a single python process running on
each node. The task is indicating thousands of partitions, so