I actually tried the File source (reading CSV files as a stream and processing them), and it does seem to generate valid numbers in the metrics log files. I may be wrong, but this looks like an issue specific to how the Rate source's metrics are written to the metrics log files.

--------------------------------------------------
Dhruv Kumar
PhD Candidate
Department of Computer Science and Engineering
University of Minnesota
www.dhruvkumar.me
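To compare the File source run and the Rate source run by something more precise than eyeballing, one option is to count how many samples in each `t,value` metrics CSV are zero, NaN, or non-zero. The following is a minimal standard-library sketch; the function names are hypothetical helpers, not part of Spark, and it assumes the gauge files have the `t,value` header shown in this thread.

```python
import csv
import io
import math
from typing import Dict, TextIO


def summarize_metric_csv(stream: TextIO) -> Dict[str, float]:
    """Classify each sample of a CsvSink gauge file with a "t,value" header.

    Returns counts of zero, NaN and non-zero samples plus the mean of the
    non-zero samples (NaN if there are none).
    """
    zero = nan = 0
    nonzero = []
    for row in csv.DictReader(stream):
        value = float(row["value"])  # float() also parses the literal "NaN"
        if math.isnan(value):
            nan += 1
        elif value == 0.0:
            zero += 1
        else:
            nonzero.append(value)
    mean = sum(nonzero) / len(nonzero) if nonzero else math.nan
    return {"zero": zero, "nan": nan, "nonzero": len(nonzero), "mean_nonzero": mean}


def summarize_metric_text(text: str) -> Dict[str, float]:
    """Convenience wrapper for CSV content already held in a string."""
    return summarize_metric_csv(io.StringIO(text))
```

For a file on disk, open it with `open(path, newline="")` and pass the handle to `summarize_metric_csv`; running it on the File source and Rate source `inputRate-total.csv` files would turn the difference described above into concrete counts.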
> On Jun 22, 2018, at 00:35, Dhruv Kumar <gargdhru...@gmail.com> wrote:
>
> Thanks a lot for your mail, Jungtaek. I added a StreamingQueryListener to my
> code (updated code
> <https://gist.github.com/kudhru/e1ce6b3f399c546be5eeb1f590087992>) and was
> able to see valid inputRowsPerSecond and processedRowsPerSecond numbers, but
> it also shows zeros intermittently. Here is the sample output:
> <https://gist.github.com/kudhru/db2bced789c528464620ae1767597127>
> Could you explain why this is the case?
>
> Unfortunately, the CSV files still show only zeros, apart from a few non-zero
> values. Do you know why this may be happening? (I changed metrics.properties
> to report every second instead of every 10 seconds.)
> Here is the output of the metrics log file
> (run_latest.driver.spark.streaming.aggregates.inputRate-total.csv):
>
> t,value
> 1529645042,0.0
> 1529645043,0.0
> 1529645044,0.0
> 1529645045,NaN
> 1529645046,88967.97153024911
> 1529645047,100200.4008016032
> 1529645048,122100.12210012211
> 1529645049,0.0
> 1529645050,0.0
> 1529645051,0.0
> 1529645052,0.0
> 1529645053,0.0
> 1529645054,0.0
> 1529645055,0.0
> 1529645056,0.0
> 1529645057,0.0
> 1529645058,0.0
> 1529645059,0.0
> 1529645060,0.0
> 1529645061,0.0
> 1529645062,0.0
> 1529645063,0.0
> 1529645064,0.0
> 1529645065,0.0
> 1529645066,0.0
> 1529645067,0.0
> 1529645068,0.0
> 1529645069,0.0
> 1529645070,0.0
> 1529645071,0.0
> 1529645072,93808.63039399624
> 1529645073,0.0
> 1529645074,0.0
> 1529645075,0.0
> 1529645076,0.0
> 1529645077,0.0
> 1529645078,0.0
>
> --------------------------------------------------
> Dhruv Kumar
> PhD Candidate
> Department of Computer Science and Engineering
> University of Minnesota
> www.dhruvkumar.me
>
>> On Jun 21, 2018, at 23:07, Jungtaek Lim <kabh...@gmail.com> wrote:
>>
>> I'm using 2.4.0-SNAPSHOT (not sure which commit exactly), and it properly
>> reports the input rate.
>>
>> $ tail -F /tmp/spark-trial-metric/local-1529640063554.driver.spark.streaming.counts.inputRate-total.csv
>> t,value
>> 1529640073,0.0
>> 1529640083,0.9411272613196695
>> 1529640093,0.9430996541967934
>> 1529640103,1.0606060606060606
>> 1529640113,0.9997000899730081
>>
>> Could you add a StreamingQueryListener and check the values of sources ->
>> numInputRows, inputRowsPerSecond, and processedRowsPerSecond? They should
>> show valid numbers.
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>> On Fri, Jun 22, 2018 at 11:49 AM, Dhruv Kumar <gargdhru...@gmail.com> wrote:
>> Hi,
>>
>> I was trying to measure performance metrics for Spark Structured Streaming,
>> but I am unable to see any data in the metrics log files. My input source is
>> the Rate source
>> <https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#creating-streaming-dataframes-and-streaming-datasets>,
>> which generates data at the specified number of rows per second. Here are
>> links to my code
>> <https://gist.github.com/kudhru/e1ce6b3f399c546be5eeb1f590087992> and my
>> metrics.properties file
>> <https://gist.github.com/kudhru/5d8a8f4d53c766e9efad4de2ae9b82d6>.
>>
>> When I run the above code using spark-submit, I see that the metrics logs
>> (for example, run_1.driver.spark.streaming.aggregates.inputRate-total.csv)
>> are created under the specified directory, but most of the values are 0.
>> Below is a portion of the inputRate-total.csv file:
>> 1529634585,0.0
>> 1529634595,0.0
>> 1529634605,0.0
>> 1529634615,0.0
>> 1529634625,0.0
>> 1529634635,0.0
>> 1529634645,0.0
>> 1529634655,0.0
>> 1529634665,0.0
>> 1529634675,0.0
>> 1529634685,0.0
>> 1529634695,0.0
>> 1529634705,0.0
>> 1529634715,0.0
>>
>> Any idea why this might be happening? Happy to share more information if
>> that helps.
>>
>> Thanks
>> --------------------------------------------------
>> Dhruv Kumar
>> PhD Candidate
>> Department of Computer Science and Engineering
>> University of Minnesota
>> www.dhruvkumar.me
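For reference, as far as I understand Spark's progress reporting (the internal ProgressReporter), inputRowsPerSecond divides a batch's numInputRows by the seconds elapsed since the previous trigger started, and processedRowsPerSecond divides it by the current trigger's own duration. The very first trigger has no predecessor, which could explain a NaN sample like the one logged at 1529645045. The sketch below only mirrors that arithmetic in plain Python; it is not Spark code, and the `Trigger` record, the helper names, and the timestamps used with it are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Trigger:
    """One micro-batch trigger; all values are hypothetical inputs."""
    start_ms: int        # trigger start time, epoch milliseconds
    end_ms: int          # trigger end time, epoch milliseconds
    num_input_rows: int  # rows read by the batch


def _per_second(rows: int, seconds: float) -> float:
    # Our own guard: return NaN instead of raising on a zero, negative or NaN interval.
    if math.isnan(seconds) or seconds <= 0:
        return math.nan
    return rows / seconds


def progress_rates(triggers: List[Trigger]) -> List[Tuple[float, float]]:
    """Return (inputRowsPerSecond, processedRowsPerSecond) for each trigger."""
    rates = []
    last_start: Optional[int] = None
    for trig in triggers:
        # Input rate: rows over the time since the previous trigger started.
        # The first trigger has no predecessor, so its input rate is NaN.
        input_sec = math.nan if last_start is None else (trig.start_ms - last_start) / 1000
        # Processing rate: rows over this trigger's own duration.
        processing_sec = (trig.end_ms - trig.start_ms) / 1000
        rates.append((_per_second(trig.num_input_rows, input_sec),
                      _per_second(trig.num_input_rows, processing_sec)))
        last_start = trig.start_ms
    return rates
```

With a hypothetical 100,000-row batch starting 0.998 s after the previous trigger, the input rate comes out near 100,200 rows/s, the same order of magnitude as the non-zero samples in the run_latest file; the thread does not show the actual batch sizes, so this only illustrates the arithmetic, not the cause of the zeros.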