I have a requirement for which I plan to use Spark Streaming.
I need to calculate the access count for certain webpages. I receive
the webpage access information through log files.
By access count I mean "how many times has the page been accessed *till now*".
I have the log files for the past 2 years, and every day we keep receiving
almost 6 GB of access logs (on an hourly basis).
Since we receive these logs on an hourly basis, I feel that I should use
Spark Streaming.
But the problem is that the access counts have to be cumulative, i.e. even
the older access counts (from the past 2 years) for a webpage should be
included in the final value.
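
To make the requirement concrete, here is a minimal plain-Python sketch of
the semantics I am after. It is not Spark code; the page names, the numbers,
and the helper apply_batch are all made up for illustration. The running
totals start from the counts of the old logs, and every new hourly batch is
added on top of them.

```python
from collections import Counter


def apply_batch(running_totals, batch_pages):
    """Return new totals: the previous totals plus one batch of accesses."""
    updated = Counter(running_totals)  # copy, so the input is not mutated
    updated.update(batch_pages)        # add 1 for each access in the batch
    return updated


# Hypothetical seed: counts computed once from the 2 years of old logs.
historical = Counter({"/home": 1000, "/about": 250})

# Hypothetical hourly batches: one list entry per page access.
hour_1 = ["/home", "/home", "/contact"]
hour_2 = ["/about", "/home"]

totals = apply_batch(historical, hour_1)
totals = apply_batch(totals, hour_2)
print(dict(totals))
```

What I am looking for is the streaming equivalent of this: state that is
seeded from the historical counts and then carried forward from one batch
to the next.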

How can I achieve this with streaming, given that streaming picks up only
new files? I don't want to use a DB to store the access counts, since that
would considerably slow down the processing.

