Missing output partition file in S3

2015-01-22 Thread Nicolas Mai
Hi, My team is using Spark 1.0.1, and the project we're working on needs to compute exact numbers, which are then saved to S3 to be reused later in other Spark jobs to compute other numbers. The problem we noticed yesterday: one of the output partition files in S3 was missing :/ (some …

Questions about Spark speculation

2014-09-16 Thread Nicolas Mai
Hi guys, My current project is using Spark 0.9.1, and after increasing the level of parallelism and the number of partitions in our RDDs, stages and tasks seem to complete much faster. However, it also seems that our cluster becomes more unstable after some time: - stalled stages still showing under active …
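Speculative execution re-launches slow (straggler) tasks on other executors, which can interact badly with an already unstable cluster. A minimal sketch of turning it on and tuning it, assuming the standard `spark.speculation.*` settings of that era (the exact values here are illustrative, not recommendations):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: enable speculative execution so that tasks running much
// slower than their peers are re-launched elsewhere.
val conf = new SparkConf()
  .setAppName("speculation-example")
  .set("spark.speculation", "true")            // off by default
  .set("spark.speculation.interval", "100")    // how often (ms) to check for stragglers
  .set("spark.speculation.quantile", "0.75")   // fraction of tasks that must finish before checking
  .set("spark.speculation.multiplier", "1.5")  // how much slower than the median counts as slow

val sc = new SparkContext(conf)
```

Raising the quantile and multiplier makes speculation less aggressive, which may help if duplicate task launches are contributing to the instability described above.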

Executor address issue: CANNOT FIND ADDRESS (Spark 0.9.1)

2014-09-08 Thread Nicolas Mai
Hi, One of the executors in my Spark cluster shows a CANNOT FIND ADDRESS address for one of the stages, which failed. After that stage, I got cascading failures for all my stages :/ (stages that seem complete but still appear as active stages in the dashboard; incomplete or failed stages that are …

Getting the number of slaves

2014-07-24 Thread Nicolas Mai
Hi, Is there a way to get the number of slaves/workers during runtime? I searched online but didn't find anything :/ The application I'm working on will run on different clusters corresponding to different deployment stages (beta - prod). It would be great to get the number of slaves currently in …

Re: Getting the number of slaves

2014-07-24 Thread Nicolas Mai
Thanks, this is what I needed :) I should have searched more... Something I noticed though: after the SparkContext is initialized, I had to wait a few seconds until sc.getExecutorStorageStatus.length returned the correct number of workers in my cluster (otherwise it returns 1, for the …
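The delay comes from executors registering with the driver asynchronously: sc.getExecutorStorageStatus includes the driver itself, so a freshly created context reports 1 until workers check in. A sketch of polling for the expected executor count (waitForExecutors is a hypothetical helper, and the timeout is illustrative; this only runs against an actual Spark cluster):

```scala
import org.apache.spark.SparkContext

// Sketch: block until the expected number of executors have registered,
// or until the timeout expires, before relying on the executor count.
def waitForExecutors(sc: SparkContext, expected: Int, timeoutMs: Long = 30000L): Int = {
  val deadline = System.currentTimeMillis + timeoutMs
  // getExecutorStorageStatus includes the driver, so subtract 1.
  var count = sc.getExecutorStorageStatus.length - 1
  while (count < expected && System.currentTimeMillis < deadline) {
    Thread.sleep(500)  // poll every half second
    count = sc.getExecutorStorageStatus.length - 1
  }
  count
}
```

Polling with a timeout avoids hanging forever when the cluster comes up with fewer workers than expected; the caller can compare the returned count against the expected one and decide how to proceed.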