Given that the process, and in particular the setup of connections, is bound to the number of partitions (in x.foreachPartition { x => ??? }), I think it would be worth trying to reduce them. Increasing 'spark.streaming.blockInterval' will do the trick (you can read the tuning details here: http://www.virdata.com/tuning-spark/#Partitions).
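To illustrate, a minimal sketch of the idea (the Connection class, createConnection, and the 1000 ms value are placeholders, not from this thread):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Each receiver cuts its stream into one block every blockInterval ms, and
// each block becomes one partition of the batch RDD. With a 2 s batch and
// the default 200 ms block interval, that is 2000 / 200 = 10 partitions per
// receiver per batch, i.e. 10 connection setups per receiver per batch.
// Raising the block interval to 1000 ms cuts that to 2.
val conf = new SparkConf()
  .setAppName("streaming-app")
  .set("spark.streaming.blockInterval", "1000") // plain milliseconds in Spark 1.x

val ssc = new StreamingContext(conf, Seconds(2))

// Placeholder for whatever expensive resource gets built per partition.
class Connection { def send(msg: String): Unit = (); def close(): Unit = () }
def createConnection(): Connection = new Connection

def sendPartition(records: Iterator[String]): Unit = {
  val connection = createConnection() // paid once per partition
  try records.foreach(connection.send)
  finally connection.close()
}
// stream.foreachRDD(rdd => rdd.foreachPartition(sendPartition))
```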
-kr, Gerard.

On Thu, Jan 22, 2015 at 4:28 PM, Gerard Maas <gerard.m...@gmail.com> wrote:

> So the system has gone from 7 msgs in 4.961 secs (median) to 106 msgs in
> 4.761 secs.
> I think there's evidence that setup costs are quite high in this case and
> increasing the batch interval is helping.
>
> On Thu, Jan 22, 2015 at 4:12 PM, Sudipta Banerjee
> <asudipta.baner...@gmail.com> wrote:
>
>> Hi Ashic Mahtab,
>>
>> Are Cassandra and ZooKeeper installed as part of the YARN architecture,
>> or in a separate layer alongside Apache Spark?
>>
>> Thanks and regards,
>> Sudipta
>>
>> On Thu, Jan 22, 2015 at 8:13 PM, Ashic Mahtab <as...@live.com> wrote:
>>
>>> Hi Guys,
>>> So I changed the interval to 15 seconds. There are obviously a lot more
>>> messages per batch, but (I think) it looks a lot healthier. Can you see
>>> any major warning signs? I think that with 2-second intervals, the
>>> setup / teardown per partition was what was causing the delays.
>>>
>>> Streaming
>>>
>>> - *Started at:* Thu Jan 22 13:23:12 GMT 2015
>>> - *Time since start:* 1 hour 17 minutes 16 seconds
>>> - *Network receivers:* 2
>>> - *Batch interval:* 15 seconds
>>> - *Processed batches:* 309
>>> - *Waiting batches:* 0
>>>
>>> Statistics over last 100 processed batches
>>>
>>> Receiver Statistics (records in last batch as of 2015/01/22 14:40:29;
>>> rates in records/sec)
>>>
>>> Receiver      | Status | Location           | Last batch | Min | Median | Max | Last Error
>>> RmqReceiver-0 | ACTIVE | VDCAPP53.foo.local | 2.6 K      | 29  | 106    | 295 | -
>>> RmqReceiver-1 | ACTIVE | VDCAPP50.bar.local | 2.6 K      | 29  | 107    | 291 | -
>>>
>>> Batch Processing Statistics
>>>
>>> Metric           | Last batch | Minimum    | 25th pctile | Median     | 75th pctile | Maximum
>>> Processing Time  | 4 s 812 ms | 4 s 698 ms | 4 s 738 ms  | 4 s 761 ms | 4 s 788 ms  | 5 s 802 ms
>>> Scheduling Delay | 2 ms       | 0 ms       | 3 ms        | 3 ms       | 4 ms        | 9 ms
>>> Total Delay      | 4 s 814 ms | 4 s 701 ms | 4 s 739 ms  | 4 s 764 ms | 4 s 792 ms  | 5 s 809 ms
>>>
>>> Regards,
>>> Ashic.
>>>
>>> ------------------------------
>>> From: as...@live.com
>>> To: gerard.m...@gmail.com
>>> CC: user@spark.apache.org
>>> Subject: RE: Are these numbers abnormal for spark streaming?
>>> Date: Thu, 22 Jan 2015 12:32:05 +0000
>>>
>>> Hi Gerard,
>>> Thanks for the response.
>>>
>>> The messages get deserialised from msgpack format, and one of the
>>> strings is deserialised to JSON. Certain fields are checked to decide
>>> whether further processing is required. If so, the message goes through
>>> a series of in-memory filters to check whether more processing is
>>> required, and only then does the "heavy" work start. That consists of a
>>> few db queries, and potential updates to the db plus a message on a
>>> message queue. The majority of messages don't need processing. At peak,
>>> roughly three messages every other second need processing.
>>>
>>> One possible thing that might be happening is the session initialisation
>>> and prepared statement initialisation for each partition. I can resort
>>> to some tricks, but I think I'll try increasing the batch interval to
>>> 15 seconds. I'll report back with findings.
>>>
>>> Thanks,
>>> Ashic.
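One common shape of the "tricks" Ashic alludes to is holding the session and the prepared statement in a per-JVM lazy singleton, so each executor pays the setup cost once rather than once per partition. A sketch, assuming the DataStax Java driver (the contact point, table, and query are made up):

```scala
import com.datastax.driver.core.{Cluster, PreparedStatement, Session}

// Lazily initialised once per executor JVM, not once per partition.
object CassandraResources {
  lazy val session: Session =
    Cluster.builder().addContactPoint("127.0.0.1").build().connect()

  // Hypothetical table and query, prepared a single time per JVM.
  lazy val insertEvent: PreparedStatement =
    session.prepare("INSERT INTO ks.events (id, payload) VALUES (?, ?)")
}

// Usage inside the partition loop: no setup cost after the first batch
// on each executor.
// rdd.foreachPartition { records =>
//   records.foreach { case (id, payload) =>
//     CassandraResources.session.execute(
//       CassandraResources.insertEvent.bind(id, payload))
//   }
// }
```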
>>> ------------------------------
>>> From: gerard.m...@gmail.com
>>> Date: Thu, 22 Jan 2015 12:30:08 +0100
>>> Subject: Re: Are these numbers abnormal for spark streaming?
>>> To: tathagata.das1...@gmail.com
>>> CC: as...@live.com; t...@databricks.com; user@spark.apache.org
>>>
>>> and post the code (if possible).
>>> In a nutshell, your processing time > batch interval, resulting in an
>>> ever-increasing delay that will end up in a crash.
>>> 3 secs to process 14 messages looks like a lot. Curious what the job
>>> logic is.
>>>
>>> -kr, Gerard.
>>>
>>> On Thu, Jan 22, 2015 at 12:15 PM, Tathagata Das
>>> <tathagata.das1...@gmail.com> wrote:
>>>
>>> This is not normal. It's a huge scheduling delay!! Can you tell me more
>>> about the application? Cluster setup, number of receivers, what the
>>> computation is, etc.
>>>
>>> On Thu, Jan 22, 2015 at 3:11 AM, Ashic Mahtab <as...@live.com> wrote:
>>>
>>> Hate to do this... but... erm... bump? Would really appreciate input
>>> from others using Streaming, or at least some docs that would tell me
>>> whether these numbers are expected or not.
>>>
>>> ------------------------------
>>> From: as...@live.com
>>> To: user@spark.apache.org
>>> Subject: Are these numbers abnormal for spark streaming?
>>> Date: Wed, 21 Jan 2015 11:26:31 +0000
>>>
>>> Hi Guys,
>>> I've got Spark Streaming set up for a low-data-rate system (using
>>> Spark's features for analysis rather than for high throughput). Messages
>>> are coming in throughout the day, at around 1-20 per second (a
>>> finger-in-the-air estimate; not analysed yet). In the Spark Streaming UI
>>> for the application, I'm seeing the following after 17 hours.
>>>
>>> Streaming
>>>
>>> - *Started at:* Tue Jan 20 16:58:43 GMT 2015
>>> - *Time since start:* 18 hours 24 minutes 34 seconds
>>> - *Network receivers:* 2
>>> - *Batch interval:* 2 seconds
>>> - *Processed batches:* 16482
>>> - *Waiting batches:* 1
>>>
>>> Statistics over last 100 processed batches
>>>
>>> Receiver Statistics (records in last batch as of 2015/01/21 11:23:18;
>>> rates in records/sec)
>>>
>>> Receiver      | Status | Location | Last batch | Min | Median | Max | Last Error
>>> RmqReceiver-0 | ACTIVE | FOOOO    | 14         | 4   | 7      | 27  | -
>>> RmqReceiver-1 | ACTIVE | BAAAAR   | 12         | 4   | 7      | 26  | -
>>>
>>> Batch Processing Statistics
>>>
>>> Metric           | Last batch     | Minimum         | 25th pctile     | Median          | 75th pctile     | Maximum
>>> Processing Time  | 3 s 994 ms     | 157 ms          | 4 s 16 ms       | 4 s 961 ms      | 5 s 3 ms        | 5 s 171 ms
>>> Scheduling Delay | 9 h 15 min 4 s | 9 h 10 min 54 s | 9 h 11 min 56 s | 9 h 12 min 57 s | 9 h 14 min 5 s  | 9 h 15 min 4 s
>>> Total Delay      | 9 h 15 min 8 s | 9 h 10 min 58 s | 9 h 12 min 0 s  | 9 h 13 min 2 s  | 9 h 14 min 10 s | 9 h 15 min 8 s
>>>
>>> Are these "normal"? I was wondering what the scheduling delay and total
>>> delay terms mean, and whether it's normal for them to be 9 hours.
>>>
>>> I've got a standalone Spark master and 4 Spark nodes. The streaming app
>>> has been given 4 cores, and it's using 1 core per worker node. The
>>> streaming app is submitted from a 5th machine, and that machine has
>>> nothing but the driver running. The worker nodes are running alongside
>>> Cassandra (and reading from and writing to it).
>>>
>>> Any insights would be appreciated.
>>>
>>> Regards,
>>> Ashic.
>>
>> --
>> Sudipta Banerjee
>> Consultant, Business Analytics and Cloud Based Architecture
>> Call me +919019578099
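As a back-of-envelope check of Gerard's diagnosis (processing time > batch interval, so the queue can only grow), a sketch using the numbers reported in the original post; the 4 s figure is a rough approximation of the per-batch processing time:

```scala
// Each 2 s batch takes roughly 4 s to process (the 25th percentile and up
// all sit around 4-5 s), so every batch adds about 2 s of backlog.
val batchIntervalSec = 2.0
val processingSec    = 4.0     // rough per-batch processing time from the stats
val batchesProcessed = 16482

val backlogHours = batchesProcessed * (processingSec - batchIntervalSec) / 3600
// ~9.2 hours, right in line with the reported 9 h 15 min scheduling delay.
```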