You can implement a custom partitioner.

-----Original Message-----
From: skippi [mailto:skip...@gmx.de]
Sent: Sunday, May 10, 2015 10:19 AM
To: user@spark.apache.org
Subject: spark streaming and computation
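To make that concrete, here is a minimal, Spark-free sketch of the two pieces the question below asks for: assigning records to time slots, and replacing an existing per-day CSV by merging in new slot counts and atomically swapping the file. All names here (`minute_slot`, `bucket_counts`, `merge_into_csv`, the `(timestamp, count)` record layout) are hypothetical illustrations, not Spark APIs; inside Spark Streaming you would apply the same keying per batch (e.g. via `map`/`reduceByKey` on the DStream) and run the merge step in `foreachRDD`.

```python
import csv
import os
import tempfile
from collections import defaultdict
from datetime import datetime, timezone

def minute_slot(epoch_seconds):
    """Map an epoch timestamp (UTC) to its minute bucket, e.g. '2015-05-10 10:19'."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime("%Y-%m-%d %H:%M")

def bucket_counts(records, slot_fn):
    """Sum hypothetical (timestamp, count) records into time-slot buckets."""
    buckets = defaultdict(int)
    for ts, count in records:
        buckets[slot_fn(ts)] += count
    return dict(buckets)

def merge_into_csv(path, new_counts):
    """Merge slot->count pairs into an existing CSV (one 'slot,count' row each),
    then atomically replace the old file so readers never see a partial write."""
    merged = defaultdict(int, new_counts)
    if os.path.exists(path):
        with open(path, newline="") as f:
            for slot, count in csv.reader(f):
                merged[slot] += int(count)
    # Write the combined statistics to a temp file in the same directory,
    # then rename over the original (atomic on POSIX filesystems).
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w", newline="") as f:
        writer = csv.writer(f)
        for slot in sorted(merged):
            writer.writerow([slot, merged[slot]])
    os.replace(tmp, path)
```

The same idea supports recomputation of larger data sets: rebuild the buckets from the raw log for the affected time window and overwrite the CSV wholesale instead of merging.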
Assume a web server access log is to be analyzed, and the target of the computation is a set of CSV files per time window, e.g. one per day containing minute-level statistics and one per month containing hour-level statistics. Incoming statistics are computed as discretized streams using a Spark streaming context. Basically I have to create the CSV files, combine them with the discretized stream, and then replace the old CSV with the combined one.

To realize such a computation, some kind of timestamp-based partitioning is required that assigns the contents of the discretized stream to time slots. But there seems to be no such processing built in. Can you give me a hint how to solve this? I am missing examples explaining how to compute on the basis of existing time-based data. How do I replace existing files? How should this be designed to allow recomputation of larger data sets?

regards,
markus

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-streaming-and-computation-tp22835.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org