> From: "Andrew Ash"
> To: "Burak Yavuz"
> Cc: "Макар Красноперов", "user" <user@spark.apache.org>
> Sent: Wednesday, September 17, 2014 11:04:02 AM
> Subject: Re: Spark and disk usage.
>
> Thanks for the info!
> [...]

[...] except in Spark Streaming, and some MLlib algorithms.
If you can help with the guide, I think it would be a nice feature to have!
Burak
- Original Message -
From: "Andrew Ash"
To: "Burak Yavuz"
Cc: "Макар Красноперов", "user" <user@spark.apache.org>
Sent: Wednesday, September 17, 2014 11:04:02 AM
Subject: Re: Spark and disk usage.
Thanks for the info!
Are there performance impacts with writing to HDFS instead of local disk?
I'm assuming that's why ALS checkpoints every third iteration instead of
every iteration.
Also I can imagine that checkpointing should be done every N shuffles
instead of every N operations (counting m[...]).

[...] setting the directory will not be enough.
Best,
Burak
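A minimal sketch of what "setting the directory will not be enough" implies, assuming the standard RDD checkpointing API: you must also mark each RDD with `checkpoint()` and then run an action. The paths and the `sc` context below are illustrative, and this only runs inside a live Spark deployment (e.g. spark-shell):

```scala
// Illustrative only: assumes a running SparkContext `sc` and an HDFS cluster.
// Setting the checkpoint directory alone does nothing by itself; you must
// also mark the RDD, and a subsequent action triggers the actual write.
sc.setCheckpointDir("hdfs:///tmp/spark-checkpoints")   // hypothetical path

val pairs = sc.textFile("hdfs:///data/input.txt")      // hypothetical input
  .map(line => (line.length, line))
  .groupByKey()                                        // a shuffle stage

pairs.checkpoint()   // mark for checkpointing (lazy, nothing written yet)
pairs.count()        // the action triggers the checkpoint write to HDFS
```

Note that `checkpoint()` must be called before the first action on the RDD; calling it after the RDD has already been materialized has no effect on that computation.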
- Original Message -
From: "Andrew Ash"
To: "Burak Yavuz"
Cc: "Макар Красноперов", "user" <user@spark.apache.org>
Sent: Wednesday, September 17, 2014 10:19:42 AM
Subject: Re: Spark and disk usage.
Hi Burak,
Most discussion [...]
> [...] Spark writing to disk, you can specify a checkpoint
> directory in
> HDFS, where Spark will write the current status instead and will clean up
> files from disk.
>
> Best,
> Burak
>
> - Original Message -
> From: "Макар Красноперов"
> To: user@spark.apache.org
> [...]
[...] Spark will write the current status instead and will clean up files
from disk.
Best,
Burak
- Original Message -
From: "Макар Красноперов"
To: user@spark.apache.org
Sent: Wednesday, September 17, 2014 7:37:49 AM
Subject: Spark and disk usage.
Hello everyone.
The problem is that Spark writes data to disk very heavily, even when the
application has a lot of free memory (about 3.8 GB).
I've noticed that a folder with a name like
"spark-local-20140917165839-f58c" contains a lot of other folders with
files like "shuffle_446_0_1". The total size of [...]
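For context on the directory named in the question: the `spark-local-*` folders are Spark's scratch space for shuffle output, and their location is controlled by `spark.local.dir`. A sketch of the relevant Spark 1.x settings (the paths are examples, not recommendations, and shuffle files are written there regardless of free heap memory):

```
# spark-defaults.conf (illustrative values)
# Shuffle files such as shuffle_446_0_1 land under these directories;
# spreading them across several physical disks reduces I/O pressure.
spark.local.dir                  /mnt/disk1/spark,/mnt/disk2/spark

# (Spark 1.x) merge shuffle outputs into fewer, larger files
spark.shuffle.consolidateFiles   true
```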