You can implement a custom partitioner.

-----Original Message-----
From: skippi [mailto:skip...@gmx.de]
Sent: Sunday, May 10, 2015 10:19 AM
To: user@spark.apache.org
Subject: spark streaming and computation
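To make that concrete, here is a minimal, Spark-free sketch of the two pieces the question below asks for: assigning records to time slots, and replacing an existing per-day CSV by merging in new slot counts and atomically swapping the file. All names here (`minute_slot`, `bucket_counts`, `merge_into_csv`, the `(timestamp, count)` record layout) are hypothetical illustrations, not Spark APIs; inside Spark Streaming you would apply the same keying per batch (e.g. via `map`/`reduceByKey` on the DStream) and run the merge step in `foreachRDD`.

```python
import csv
import os
import tempfile
from collections import defaultdict
from datetime import datetime, timezone

def minute_slot(epoch_seconds):
    """Map an epoch timestamp (UTC) to its minute bucket, e.g. '2015-05-10 10:19'."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime("%Y-%m-%d %H:%M")

def bucket_counts(records, slot_fn):
    """Sum hypothetical (timestamp, count) records into time-slot buckets."""
    buckets = defaultdict(int)
    for ts, count in records:
        buckets[slot_fn(ts)] += count
    return dict(buckets)

def merge_into_csv(path, new_counts):
    """Merge slot->count pairs into an existing CSV (one 'slot,count' row each),
    then atomically replace the old file so readers never see a partial write."""
    merged = defaultdict(int, new_counts)
    if os.path.exists(path):
        with open(path, newline="") as f:
            for slot, count in csv.reader(f):
                merged[slot] += int(count)
    # Write the combined statistics to a temp file in the same directory,
    # then rename over the original (atomic on POSIX filesystems).
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w", newline="") as f:
        writer = csv.writer(f)
        for slot in sorted(merged):
            writer.writerow([slot, merged[slot]])
    os.replace(tmp, path)
```

The same idea supports recomputation of larger data sets: rebuild the buckets from the raw log for the affected time window and overwrite the CSV wholesale instead of merging.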
Assume a web server access log is to be analyzed, and the target of the computation is a set of CSV files per time window, e.g. one per day containing minute-level statistics and one per month containing hour-level statistics. Incoming statistics are computed as discretized streams using a Spark streaming context. Basically I have to create the CSV files, combine them with the discretized stream, and then replace the old CSV with the combined one.

To realize such a computation, some kind of timestamp-based partitioning is required that assigns the contents of the discretized stream to time slots. But there seems to be no such processing built in. Can you give me a hint how to solve this? I am missing examples explaining how to compute on the basis of existing time-based data. How do I replace existing files? How should this be designed to allow recomputation of larger data sets?

regards,
markus

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-streaming-and-computation-tp22835.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org