Re: Spark and disk usage.

2014-09-21 Thread Andrew Ash
> From: "Andrew Ash"
> To: "Burak Yavuz"
> Cc: "Макар Красноперов", "user" <user@spark.apache.org>
> Sent: Wednesday, September 17, 2014 11:04:02 AM
> Subject: Re: Spark and disk usage.
>
> Thanks for the info!
>
> Ar…

Re: Spark and disk usage.

2014-09-17 Thread Burak Yavuz
…age, except in Spark Streaming and some MLlib algorithms. If you can help with the guide, I think it would be a nice feature to have!

Burak

- Original Message -
From: "Andrew Ash"
To: "Burak Yavuz"
Cc: "Макар Красноперов", "user"
Sent: Wednesday…

Re: Spark and disk usage.

2014-09-17 Thread Andrew Ash
Thanks for the info!

Are there performance impacts with writing to HDFS instead of local disk? I'm assuming that's why ALS checkpoints every third iteration instead of every iteration. Also I can imagine that checkpointing should be done every N shuffles instead of every N operations (counting m…

Re: Spark and disk usage.

2014-09-17 Thread Burak Yavuz
…etting the directory will not be enough.

Best,
Burak

- Original Message -
From: "Andrew Ash"
To: "Burak Yavuz"
Cc: "Макар Красноперов", "user"
Sent: Wednesday, September 17, 2014 10:19:42 AM
Subject: Re: Spark and disk usage.

Hi Burak,

Most discussion…

Re: Spark and disk usage.

2014-09-17 Thread Andrew Ash
> …ark writing to disk, you can specify a checkpoint directory in HDFS, where Spark will write the current status instead and will clean up files from disk.
>
> Best,
> Burak
>
> - Original Message -
> From: "Макар Красноперов"
> To: user@spark.ap…

Re: Spark and disk usage.

2014-09-17 Thread Burak Yavuz
…will write the current status instead and will clean up files from disk.

Best,
Burak

- Original Message -
From: "Макар Красноперов"
To: user@spark.apache.org
Sent: Wednesday, September 17, 2014 7:37:49 AM
Subject: Spark and disk usage.

Hello everyone. The problem is that spa…
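The advice in this reply (set a checkpoint directory in HDFS so Spark can truncate lineage and clean up intermediate files) can be sketched as below. This is a minimal illustration, not code from the thread: the HDFS path, iteration count, and checkpoint interval are all placeholder assumptions; the every-third-iteration interval mirrors what the thread says ALS does.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CheckpointSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("checkpoint-sketch"))

    // Checkpoint to HDFS instead of local disk; path is a placeholder.
    sc.setCheckpointDir("hdfs:///tmp/spark-checkpoints")

    var rdd = sc.parallelize(1 to 1000)
    for (i <- 1 to 9) {
      rdd = rdd.map(_ + 1)
      // Checkpointing every third iteration (as ALS reportedly does)
      // truncates the lineage, so earlier shuffle files can be discarded.
      if (i % 3 == 0) {
        rdd.checkpoint()
        rdd.count() // force an action so the checkpoint is actually written
      }
    }
    sc.stop()
  }
}
```

Note that `checkpoint()` is lazy: the data is only written to the checkpoint directory when an action materializes the RDD, which is why the sketch calls `count()` after each checkpoint.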

Spark and disk usage.

2014-09-17 Thread Макар Красноперов
Hello everyone. The problem is that Spark writes data to disk very heavily, even though the application has a lot of free memory (about 3.8 GB). I've noticed that a folder with a name like "spark-local-20140917165839-f58c" contains many other folders with files like "shuffle_446_0_1". The total size of…
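For context on the "spark-local-*" directories described above: they hold shuffle map output, and their location is controlled by the `spark.local.dir` setting. A minimal, illustrative config fragment (the paths and values below are assumptions for illustration, not taken from the thread) might look like:

```
# conf/spark-defaults.conf (illustrative values)
spark.local.dir      /mnt/fast-disk/spark-tmp   # where shuffle_* files are spilled
spark.cleaner.ttl    3600                       # Spark 1.x: periodically clean old shuffle/metadata state
```

As the replies in this thread note, relocating the directory alone does not stop the writes; checkpointing to HDFS is what allows Spark to clean up the intermediate files.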