Re: python : Out of memory: Kill process

2015-03-30 Thread Eduardo Cusa
Hi, I changed my process flow. Now I am processing one file per hour instead of processing everything at the end of the day. This decreased the memory consumption. Regards, Eduardo
On Thu, Mar 26, 2015 at 3:16 PM, Davies Liu wrote: > Could you narrow down to a step which causes the OOM, something like:
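
A minimal sketch of the per-hour flow Eduardo describes (the S3 path layout, the hour format, and the process() step are assumptions, not from the thread):

    for hour in range(24):
        path = 's3n://' + AWS_BUCKET + '/' + date + '/%02d/*.log.gz' % hour
        hourly = sqlContext.jsonFile(path)   # load only one hour of logs at a time
        process(hourly)                      # hypothetical per-hour processing step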

Re: python : Out of memory: Kill process

2015-03-26 Thread Davies Liu
Could you narrow down to a step which causes the OOM? Something like:

    log2 = self.sqlContext.jsonFile(path)
    log2.count()
    ...
    out.count()
    ...

On Thu, Mar 26, 2015 at 10:34 AM, Eduardo Cusa wrote: > the last try was without log2.cache() and I am still getting out of memory > > I am using the following conf,
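
Spelled out, the idea is to force materialization after each transformation so the failing step shows up in isolation (a sketch; the intermediate steps are inferred from the code later in the thread, and `query` is hypothetical since the original is truncated):

    log2 = self.sqlContext.jsonFile(path)
    log2.count()                       # OOM here -> the problem is reading/parsing the JSON
    log2.registerTempTable('log_test')
    out = self.sqlContext.sql(query)   # 'query' is hypothetical
    out.count()                        # OOM here -> the problem is the SQL step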

Re: python : Out of memory: Kill process

2015-03-26 Thread Eduardo Cusa
The last try was without log2.cache() and I am still getting out of memory. I am using the following conf, maybe it helps:

    conf = (SparkConf()
            .setAppName("LoadS3")
            .set("spark.executor.memory", "13g")
            .set("spark.driver.memory", "13g")
            .set("spark.driver.maxResultS
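
The message is cut off in the archive; a minimal sketch of a configuration along these lines, assuming the truncated setting is spark.driver.maxResultSize (the value shown is illustrative, not from the original message):

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("LoadS3")
            .set("spark.executor.memory", "13g")
            .set("spark.driver.memory", "13g")          # generally must be set before the driver JVM starts,
                                                        # e.g. via spark-submit --driver-memory, or it has no effect
            .set("spark.driver.maxResultSize", "2g"))   # illustrative value
    sc = SparkContext(conf=conf)

Note that a 13 GB driver heap on a 15 GB master leaves little headroom for the OS, which is consistent with the kernel OOM killer stepping in.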

Re: python : Out of memory: Kill process

2015-03-26 Thread Davies Liu
Could you try removing the line `log2.cache()`?
On Thu, Mar 26, 2015 at 10:02 AM, Eduardo Cusa wrote: > I am running on EC2: > > 1 Master: 4 CPU, 15 GB RAM (2 GB swap) > > 2 Slaves: 4 CPU, 15 GB RAM > > > the uncompressed dataset size is 15 GB > > > > > On Thu, Mar 26, 2015 at 10:41 AM, Eduardo C

Re: python : Out of memory: Kill process

2015-03-26 Thread Eduardo Cusa
I am running on EC2:

    1 Master: 4 CPU, 15 GB RAM (2 GB swap)
    2 Slaves: 4 CPU, 15 GB RAM

The uncompressed dataset size is 15 GB.
On Thu, Mar 26, 2015 at 10:41 AM, Eduardo Cusa < eduardo.c...@usmediaconsulting.com> wrote: > Hi Davies, I upgraded to 1.3.0 and am still getting Out of Memory. > > I ran

Re: python : Out of memory: Kill process

2015-03-26 Thread Eduardo Cusa
Hi Davies, I upgraded to 1.3.0 and am still getting Out of Memory. I ran the same code as before; do I need to make any changes?
On Wed, Mar 25, 2015 at 4:00 PM, Davies Liu wrote: > With batchSize = 1, I think it will become even worse. > > I'd suggest going with 1.3 and having a taste of the new Dat

Re: python : Out of memory: Kill process

2015-03-25 Thread Davies Liu
With batchSize = 1, I think it will become even worse. I'd suggest going with 1.3 and having a taste of the new DataFrame API.
On Wed, Mar 25, 2015 at 11:49 AM, Eduardo Cusa wrote: > Hi Davies, I'm running 1.1.0. > > Now I'm following this thread, which recommends using the batchSize parameter = 1 > > > http:/
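
For context, a minimal sketch of the 1.3 DataFrame API Davies refers to, applied to this workload (the column names are hypothetical; in 1.3, jsonFile returns a DataFrame rather than a SchemaRDD):

    df = sqlContext.jsonFile(path)    # a DataFrame in Spark 1.3
    df.printSchema()                  # inspect the inferred schema
    out = df.filter(df.provider == provider).select("provider", "ts")  # hypothetical columns
    out.count()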

Re: python : Out of memory: Kill process

2015-03-25 Thread Eduardo Cusa
Hi Davies, I'm running 1.1.0. Now I'm following this thread, which recommends using the batchSize parameter = 1:

http://apache-spark-user-list.1001560.n3.nabble.com/pySpark-memory-usage-td3022.html

If this does not work I will install 1.2.1 or 1.3. Regards
On Wed, Mar 25, 2015 at 3:39 PM, Davies Li
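
A minimal sketch of that workaround (PySpark's SparkContext constructor accepts a batchSize argument in the 1.x line; the app name is illustrative):

    from pyspark import SparkContext

    # batchSize=1 serializes one Python object at a time instead of batching,
    # trading throughput for lower peak memory during serialization
    sc = SparkContext(appName="LoadS3", batchSize=1)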

Re: python : Out of memory: Kill process

2015-03-25 Thread Davies Liu
What's the version of Spark you are running? There is a bug in the SQL Python API [1]; it's fixed in 1.2.1 and 1.3.

[1] https://issues.apache.org/jira/browse/SPARK-6055

On Wed, Mar 25, 2015 at 10:33 AM, Eduardo Cusa wrote: > Hi Guys, I'm running the following function with spark-submit and the OS is

python : Out of memory: Kill process

2015-03-25 Thread Eduardo Cusa
Hi Guys, I'm running the following function with spark-submit and the OS is killing my process:

    def getRdd(self, date, provider):
        path = 's3n://' + AWS_BUCKET + '/' + date + '/*.log.gz'
        log2 = self.sqlContext.jsonFile(path)
        log2.registerTempTable('log_test')
        log2.cache()
        out = self.sqlConte
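
The archive truncates the function here; a hypothetical completion of the same pattern, purely to show the shape of the code being discussed (the SQL query and the return are assumptions, not the original):

    def getRdd(self, date, provider):
        path = 's3n://' + AWS_BUCKET + '/' + date + '/*.log.gz'
        log2 = self.sqlContext.jsonFile(path)   # reads and schema-infers all gzipped JSON logs for the day
        log2.registerTempTable('log_test')
        log2.cache()                             # caching the full 15 GB dataset is likely what exhausts memory
        out = self.sqlContext.sql(
            "SELECT * FROM log_test WHERE provider = '%s'" % provider)  # hypothetical query
        return out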