Re: Problem in Spark Streaming

2014-06-11 Thread vinay Bajaj
http://stackoverflow.com/questions/895444/java-garbage-collection-log-messages
http://stackoverflow.com/questions/16794783/how-to-read-a-verbosegc-output
I think these will help in understanding the logs.
On Wed, Jun 11, 2014 at 12:53 PM, nilmish wrote:
> I used these commands to show the GC

Re: Problem in Spark Streaming

2014-06-11 Thread nilmish
I used these JVM options to show the GC timings: -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps. Following is the output I got on standard output: 4.092: [GC 4.092: [ParNew: 274752K->27199K(309056K), 0.0421460 secs] 274752K->27199K(995776K), 0.0422720 secs] [Times: user=0.33 sys=0.11,
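For anyone who wants to line GC pauses up against latency spikes, here is a minimal Python sketch that pulls the timestamp, heap sizes, and pause time out of one log line like the one above. The function name `parse_gc_line` and the `GcEvent` type are hypothetical helpers, and the regular expressions assume the ParNew-style layout shown in this message; other collectors or JVM versions may print a different format.

```python
import re
from typing import NamedTuple, Optional


class GcEvent(NamedTuple):
    timestamp_s: float      # seconds since JVM start
    heap_before_kb: int     # whole-heap occupancy before the collection
    heap_after_kb: int      # whole-heap occupancy after the collection
    heap_capacity_kb: int   # whole-heap capacity
    pause_s: float          # total pause reported for the collection


# Leading "4.092: [" timestamp printed by -XX:+PrintGCTimeStamps.
_TIMESTAMP = re.compile(r"^(\d+\.\d+): \[")
# Every "before->after(capacity), N secs]" group; the last one is the whole heap.
_SIZES = re.compile(r"(\d+)K->(\d+)K\((\d+)K\), (\d+\.\d+) secs\]")


def parse_gc_line(line: str) -> Optional[GcEvent]:
    """Return the whole-heap figures of one GC log line, or None if it does not match."""
    line = line.strip()
    ts = _TIMESTAMP.match(line)
    sizes = _SIZES.findall(line)
    if ts is None or not sizes:
        return None
    before, after, capacity, pause = sizes[-1]
    return GcEvent(float(ts.group(1)), int(before), int(after), int(capacity), float(pause))


# The line from this message, cut at the point where the pause time ends.
sample = ("4.092: [GC 4.092: [ParNew: 274752K->27199K(309056K), 0.0421460 secs] "
          "274752K->27199K(995776K), 0.0422720 secs]")
event = parse_gc_line(sample)
print(event)
```

Collecting `timestamp_s` and `pause_s` over a whole log gives a series that can be compared against the times at which the streaming delay jumps.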

Re: Problem in Spark Streaming

2014-06-10 Thread Ashish Rangole
Have you considered the impact of garbage collection and whether it coincides with your latency spikes? You can enable GC logging by changing the Spark configuration for your job.
> Hi, as I searched the keyword "Total delay" in the console log, the delay keeps increasing. I am not sure what does this "total del

Re: Problem in Spark Streaming

2014-06-10 Thread Yingjun Wu
Hi, when I search for the keyword "Total delay" in the console log, the delay keeps increasing. I am not sure what this "total delay" means. For example, if I perform a windowed wordcount with windowSize=1ms and slidingStep=2000ms, is the delay measured from the 10th second? A sample

Re: Problem in Spark Streaming

2014-06-10 Thread Boduo Li
Oh, I mean the average data rate/node. But in case I want to know the input activities to each node (I use a custom receiver instead of Kafka), I usually search these records in logs to get a sense: "BlockManagerInfo: Added input ... on [hostname:port] (size: xxx KB)" I also see some spikes in la
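As a rough illustration of the log search described above, the following Python sketch sums the size of received input blocks per host. The record layout is an assumption based only on the pattern quoted in this message ("BlockManagerInfo: Added input ... on [hostname:port] (size: xxx KB)"); the sample lines, host names, and the helper `input_kb_per_host` are made up for the example.

```python
import re
from collections import defaultdict

# Assumed record shape: "... BlockManagerInfo: Added input<id> ... on host:port (size: N UNIT"
_ADDED_INPUT = re.compile(
    r"BlockManagerInfo: Added (input\S*) .*?on (\S+) \(size: (\d+(?:\.\d+)?) (B|KB|MB|GB)"
)
_UNIT_TO_KB = {"B": 1.0 / 1024, "KB": 1.0, "MB": 1024.0, "GB": 1024.0 * 1024.0}


def input_kb_per_host(lines):
    """Sum the size (in KB) of input blocks added on each host:port."""
    totals = defaultdict(float)
    for line in lines:
        match = _ADDED_INPUT.search(line)
        if match:
            _block_id, host, size, unit = match.groups()
            totals[host] += float(size) * _UNIT_TO_KB[unit]
    return dict(totals)


# Hypothetical log lines; real block ids, hosts, and trailing fields will differ.
sample_lines = [
    "INFO BlockManagerInfo: Added input-0-1 in memory on node1:40001 (size: 512.0 KB, free: 1.0 GB)",
    "INFO BlockManagerInfo: Added input-0-2 in memory on node2:40002 (size: 2.0 MB, free: 1.0 GB)",
    "INFO BlockManagerInfo: Added input-0-3 in memory on node1:40001 (size: 256.0 KB, free: 1.0 GB)",
    "INFO BlockManagerInfo: Added rdd_5_0 in memory on node1:40001 (size: 100.0 KB, free: 1.0 GB)",
]
per_host = input_kb_per_host(sample_lines)
print(per_host)
```

Dividing each total by the length of the log interval gives an approximate input rate per node; non-input blocks such as cached RDD partitions are ignored.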

Re: Problem in Spark Streaming

2014-06-10 Thread nilmish
How can I measure the data rate per node? I am feeding the data through the Kafka API. I only know the total inflow data rate, which remains almost constant. How can I figure out how much data is distributed to each node in my cluster? Latency does not keep increasing indefinitely. It goes up for so

Re: Problem in Spark Streaming

2014-06-10 Thread Boduo Li
Hi Nilmish, What's the data rate/node when you see the high latency? (It seems the latency keeps increasing.) Do you still see it if you lower the data rate or the frequency of the windowed query? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Problem-in-S

Re: Problem in Spark Streaming

2014-06-10 Thread nilmish
You can measure the latency from the logs. Search for the phrase "Total delay" in the logs. It denotes the total end-to-end delay for a particular query.
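A minimal Python sketch of that log search is below. The exact wording of the log record is an assumption ("Total delay: N s for time T ms ..."); the sample lines and the helper `total_delays` are hypothetical, so the pattern should be adjusted to whatever your driver log actually prints.

```python
import re

# Assumed record shape: "Total delay: 1.500 s for time 1402394402000 ms (...)"
_TOTAL_DELAY = re.compile(r"Total delay: (\d+(?:\.\d+)?) s for time (\d+) ms")


def total_delays(lines):
    """Return (batch_time_ms, total_delay_s) pairs in the order they appear."""
    delays = []
    for line in lines:
        match = _TOTAL_DELAY.search(line)
        if match:
            delays.append((int(match.group(2)), float(match.group(1))))
    return delays


# Hypothetical log lines for illustration only.
sample_lines = [
    "INFO JobScheduler: Total delay: 0.250 s for time 1402394400000 ms (execution: 0.200 s)",
    "INFO JobScheduler: Total delay: 1.500 s for time 1402394402000 ms (execution: 1.400 s)",
    "INFO SomethingElse: unrelated message",
    "INFO JobScheduler: Total delay: 0.750 s for time 1402394404000 ms (execution: 0.700 s)",
]
delays = total_delays(sample_lines)
worst_time_ms, worst_delay_s = max(delays, key=lambda pair: pair[1])
print(delays)
print("largest delay:", worst_delay_s, "s at batch time", worst_time_ms)
```

Plotting or printing the pairs in order makes it easy to see whether the delay grows steadily or only spikes for a while and then recovers.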

Re: Problem in Spark Streaming

2014-06-10 Thread Yingjun Wu
Hi Nilmish, I am facing the same problem. How do you measure the latency? Regards, Yingjun