Hi, I changed my process flow.
Now I am processing one file per hour, instead of processing everything at the
end of the day.
This decreased the memory consumption.
Regards
Eduardo
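For reference, a minimal sketch of that per-hour flow; the hourly S3 layout and the process_hour() helper are assumptions for illustration, not from the thread:

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="LoadS3")
    sqlContext = SQLContext(sc)

    for hour in range(24):
        # assumed layout: one folder per hour under the day's prefix
        path = 's3n://' + AWS_BUCKET + '/' + date + '/%02d/*.log.gz' % hour
        log = sqlContext.jsonFile(path)
        process_hour(log)   # hypothetical per-hour processing step
        # only one hour of logs is resident in memory per iteration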
On Thu, Mar 26, 2015 at 3:16 PM, Davies Liu wrote:
Could you narrow it down to the step that causes the OOM, something like:

    log2 = self.sqlContext.jsonFile(path)
    log2.count()   # force evaluation: does loading the JSON alone OOM?
    ...
    out.count()    # or does a later step OOM?
    ...
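A fleshed-out version of this narrowing approach, assuming the same steps as the getRdd function quoted at the bottom of the thread (the real SQL query is cut off there, so the one below is a placeholder):

    log2 = self.sqlContext.jsonFile(path)
    log2.count()                      # step 1: does loading the JSON OOM?
    log2.registerTempTable('log_test')
    out = self.sqlContext.sql("SELECT * FROM log_test")  # placeholder query
    out.count()                       # step 2: or does the query OOM?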
On Thu, Mar 26, 2015 at 10:34 AM, Eduardo Cusa wrote:
The last try was without log2.cache() and I'm still getting out of memory.
I'm using the following conf, maybe it helps:
    conf = (SparkConf()
            .setAppName("LoadS3")
            .set("spark.executor.memory", "13g")
            .set("spark.driver.memory", "13g")
            .set("spark.driver.maxResultS
Could you try to remove the line `log2.cache()`?
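If the cache is still wanted, a middle ground (a sketch, not something suggested in the thread) is to persist with a storage level that can spill to disk instead of holding everything deserialized in memory:

    from pyspark import StorageLevel

    log2.persist(StorageLevel.MEMORY_AND_DISK)  # spills partitions to disk under memory pressure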
On Thu, Mar 26, 2015 at 10:02 AM, Eduardo Cusa wrote:
I'm running on EC2:
1 Master: 4 CPU, 15 GB RAM (2 GB swap)
2 Slaves: 4 CPU, 15 GB RAM
The uncompressed dataset size is 15 GB.
On Thu, Mar 26, 2015 at 10:41 AM, Eduardo Cusa <eduardo.c...@usmediaconsulting.com> wrote:
Hi Davies, I upgraded to 1.3.0 and I'm still getting Out of Memory.
I ran the same code as before; do I need to make any changes?
On Wed, Mar 25, 2015 at 4:00 PM, Davies Liu wrote:
With batchSize = 1, I think it will become even worse.
I'd suggest going with 1.3 and getting a taste of the new DataFrame API.
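A quick taste of what that looks like in 1.3, where jsonFile returns a DataFrame; the "provider" column is an assumption based on the getRdd signature at the bottom of the thread:

    df = sqlContext.jsonFile(path)
    df.printSchema()                       # inferred JSON schema
    df.groupBy("provider").count().show()  # simple aggregation via the DataFrame API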
On Wed, Mar 25, 2015 at 11:49 AM, Eduardo Cusa wrote:
Hi Davies, I'm running 1.1.0.
Now I'm following this thread, which recommends using the batchSize parameter = 1:
http://apache-spark-user-list.1001560.n3.nabble.com/pySpark-memory-usage-td3022.html
If this does not work I will install 1.2.1 or 1.3.
Regards
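For reference, the batchSize = 1 setting mentioned above is passed when constructing the SparkContext; a minimal sketch (the appName is reused from the conf earlier in the thread):

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="LoadS3", batchSize=1)  # serialize one record per batch
    sqlContext = SQLContext(sc)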
On Wed, Mar 25, 2015 at 3:39 PM, Davies Liu wrote:
What's the version of Spark you are running?
There is a bug in the SQL Python API [1]; it's fixed in 1.2.1 and 1.3:
[1] https://issues.apache.org/jira/browse/SPARK-6055
On Wed, Mar 25, 2015 at 10:33 AM, Eduardo Cusa wrote:
Hi guys, I'm running the following function with spark-submit and the OS is
killing my process:
    def getRdd(self, date, provider):
        path = 's3n://' + AWS_BUCKET + '/' + date + '/*.log.gz'
        log2 = self.sqlContext.jsonFile(path)
        log2.registerTempTable('log_test')
        log2.cache()
        out = self.sqlConte
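The function is cut off in the archive after out = self.sqlConte; a hypothetical completion, assuming the elided part ran a SQL query over the registered table (the query below is invented for illustration):

    def getRdd(self, date, provider):
        path = 's3n://' + AWS_BUCKET + '/' + date + '/*.log.gz'
        log2 = self.sqlContext.jsonFile(path)    # load one day of gzipped JSON logs
        log2.registerTempTable('log_test')
        log2.cache()                             # the step Davies suggested removing
        # invented completion -- the real query is not in the archive:
        out = self.sqlContext.sql(
            "SELECT * FROM log_test WHERE provider = '%s'" % provider)
        return out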