control the number of reducers for groupby in data frame

2015-08-04 Thread Fang, Mike
Hi, Does anyone know how I can control the number of reducers for an operation such as groupBy on a data frame? I can set spark.sql.shuffle.partitions in SQL, but I am not sure how to do the same with the df.groupBy("XX") API. Thanks, Mike
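
One possible approach, sketched here under assumptions rather than taken from a reply in the thread: the spark.sql.shuffle.partitions setting named above can also be set programmatically through SQLContext.setConf before the grouping runs, on the assumption that DataFrame aggregations read the same setting as SQL queries. The helper name group_with_reducers and its parameters are hypothetical; sql_context is assumed to be an existing SQLContext and df an existing DataFrame.

```python
def group_with_reducers(sql_context, df, column, num_reducers):
    """Group df by column after setting the shuffle partition count.

    Hypothetical helper: assumes sql_context is a pyspark.sql.SQLContext
    and df is a DataFrame created from it.
    """
    # Same setting the question mentions; the value is passed as a string.
    sql_context.setConf("spark.sql.shuffle.partitions", str(num_reducers))
    # The aggregation is lazy, so the setting is already in place when an
    # action (collect, show, write, ...) later triggers the shuffle.
    return df.groupBy(column).count()
```

For example, group_with_reducers(sqlContext, df, "XX", 50) would request 50 shuffle partitions for the grouping. Note that the setting is context-wide, so it also affects later shuffles until it is changed again.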

Re: understanding on the "waiting batches" and "scheduling delay" in Streaming UI

2015-06-22 Thread Fang, Mike
Hi Das, Thanks for your reply. Somehow I missed it. I am using Spark 1.3. The data source is Kafka. Yeah, not sure why the delay is 0. I'll run against 1.4 and give a screenshot. Thanks, Mike From: Akhil Das <ak...@sigmoidanalytics.com> Date: Thursday, June 18, 2015 at 6:05 PM To: M

questions on the "waiting batches" and "scheduling delay" in Streaming UI

2015-06-16 Thread Fang, Mike
Hi, I have a Spark Streaming program that has been running for ~25 hrs. When I check the Streaming UI tab, "Waiting batches" is 144, but "scheduling delay" is 0. I am a bit confused. If "waiting batches" is 144, does that mean many batches are waiting in the queue to be processed? If this is