Re: View executor logs on YARN mode

2015-01-14 Thread Brett Meyer
You can view the logs for the particular containers on the YARN UI if you go to the page for a specific node, and then from the Tools menu on the left, select Local Logs. There should be a userlogs directory which will contain the specific application IDs for each job that you run. Inside the
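For reference, when YARN log aggregation is enabled (yarn.log-aggregation-enable), the same container logs can also be pulled from the command line with the yarn CLI; the application ID below is a placeholder:

    yarn logs -applicationId application_1419000000000_0001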

Re: Spark executors resources. Blocking?

2015-01-14 Thread Brett Meyer
I can't speak to Mesos solutions, but for YARN you can define queues in which to run your jobs, and you can customize the amount of resources the queue consumes. When deploying your Spark job, you can specify the --queue queue_name option to schedule the job to a particular queue. Here are some
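A minimal sketch of such a submission (the queue name and script path are placeholders):

    spark-submit --master yarn --queue my_queue my_script.py

The queues themselves and their resource shares are defined in the YARN scheduler configuration (e.g. capacity-scheduler.xml for the Capacity Scheduler).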

Location of logs in local mode

2015-01-06 Thread Brett Meyer
I'm submitting a script using spark-submit in local mode for testing, and I'm having trouble figuring out where the logs are stored. The documentation indicates that they should be in the work folder in the directory in which Spark lives on my system, but I see no such folder there. I've set the
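In local mode everything runs in a single JVM, so by default the driver and executor logging goes to the console rather than to a work directory. One way to capture it to a file, assuming the stock log4j setup, is to copy conf/log4j.properties.template to conf/log4j.properties and point the root logger at a file appender; the log path below is a placeholder:

    log4j.rootCategory=INFO, file
    log4j.appender.file=org.apache.log4j.FileAppender
    log4j.appender.file.File=/tmp/spark-local.log
    log4j.appender.file.layout=org.apache.log4j.PatternLayout
    log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n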

Location of logs in local mode

2014-12-30 Thread Brett Meyer
I'm submitting a script using spark-submit in local mode for testing, and I'm having trouble figuring out where the logs are stored. The documentation indicates that they should be in the work folder in the directory in which Spark lives on my system, but I see no such folder there. I've set the

How to pass options to KeyConverter using PySpark

2014-12-29 Thread Brett Meyer
I'm running PySpark on YARN, and I'm reading in SequenceFiles for which I have a custom KeyConverter class. My KeyConverter needs to have some configuration options passed to it, but I am unable to find a way to get the options to that class without modifying the Spark source. Is there a
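For context, the PySpark read itself only takes the converter's class name, which is why there is no obvious hook for options; a minimal sketch of the read, with placeholder path and class name:

    from pyspark import SparkContext

    sc = SparkContext(appName="seqfile-example")
    # keyConverter names the converter class; it takes no per-instance options
    rdd = sc.sequenceFile("hdfs:///data/input",
                          keyConverter="com.example.MyKeyConverter")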

Many retries for Python job

2014-11-21 Thread Brett Meyer
I'm running a Python script with spark-submit on top of YARN on an EMR cluster with 30 nodes. The script reads in approximately 3.9 TB of data from S3, and then does some transformations and filtering, followed by some aggregate counts. During Stage 2 of the job, everything appears to complete

Re: Many retries for Python job

2014-11-21 Thread Brett Meyer
seem to be missing in many cases and result in FetchFailure errors. I should probably also mention that I have the spark.storage.memoryFraction set to 0.2. From: Sandy Ryza sandy.r...@cloudera.com Date: Friday, November 21, 2014 at 1:41 PM To: Brett Meyer brett.me...@crowdstrike.com Cc: user
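For reference, that setting (a Spark 1.x option, superseded by unified memory management in later releases) can be passed at submit time; the script name below is a placeholder:

    spark-submit --conf spark.storage.memoryFraction=0.2 my_script.py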

Failed jobs showing as SUCCEEDED on web UI

2014-11-11 Thread Brett Meyer
I'm running a Python script using spark-submit on YARN in an EMR cluster, and if I have a job that fails due to ExecutorLostFailure or if I kill the job, it still shows up on the web UI with a FinalStatus of SUCCEEDED. Is this due to PySpark, or is there potentially some other issue with the job
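One generic safeguard, sketched below, is to make sure the driver script exits non-zero when the job throws, so the failure is at least visible in the driver's exit status; this does not address any Spark-side status-reporting bug:

    import sys
    import traceback

    def main():
        pass  # job logic goes here

    if __name__ == "__main__":
        try:
            main()
        except Exception:
            traceback.print_exc()
            sys.exit(1)  # non-zero exit so the failure is visible to the caller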