Hi, I have a Spark Streaming job that reads messages (prices) from Kafka and writes them as text files to HDFS.
This is pretty efficient. Its only function is to persist the incoming messages to HDFS. This is what it does:

dstream.foreachRDD { pricesRDD =>
  val x = pricesRDD.count
  // Check if any messages in
  if (x > 0) {
    // Combine each partition's results into a single RDD
    val cachedRDD = pricesRDD.repartition(1).cache
    cachedRDD.saveAsTextFile("/data/prices/prices_" + System.currentTimeMillis.toString)
    ....

So these are the files in the HDFS directory:

drwxr-xr-x - hduser supergroup 0 2016-09-14 15:11 /data/prices/prices_1473862284010
drwxr-xr-x - hduser supergroup 0 2016-09-14 15:11 /data/prices/prices_1473862288010
drwxr-xr-x - hduser supergroup 0 2016-09-14 15:11 /data/prices/prices_1473862290010
drwxr-xr-x - hduser supergroup 0 2016-09-14 15:11 /data/prices/prices_1473862294010

Now I present these prices in Zeppelin. These files are produced every 2 seconds. However, when I plot them, I am only interested in, say, one hour's data. I cater for this by filtering on the prices (each has a TIMECREATED). I don't think this is efficient, because the filter is applied only after all of these files have been loaded, and I don't want to load them all. I just want to read the prices created in the past hour or so.

One thing I considered was to convert the System.currentTimeMillis suffix of each directory name back into a date and fetch only the most recent directories. However, this is looking cumbersome. I can create these files with any timestamp extension when persisting, but System.currentTimeMillis seems to be the most efficient.

Any alternatives you can think of?

Thanks,

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com
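
P.S. To make the directory-name idea above concrete, here is a rough sketch of what I have in mind. It is not tested against the real cluster. The names RecentPrices, timestampOf, recentDirs and windowMs are made up for illustration, and it assumes every directory name ends in the epoch-millisecond value produced by System.currentTimeMillis at write time.

```scala
// Sketch only: RecentPrices, timestampOf, recentDirs and windowMs are hypothetical names.
object RecentPrices {
  private val Prefix = "prices_"

  // Extract the epoch-millisecond suffix from a path such as
  // /data/prices/prices_1473862284010. Returns None if the last path
  // component does not look like prices_<number>.
  def timestampOf(path: String): Option[Long] = {
    val name = path.substring(path.lastIndexOf('/') + 1)
    if (name.startsWith(Prefix)) scala.util.Try(name.stripPrefix(Prefix).toLong).toOption
    else None
  }

  // Keep only the directories written within the last windowMs milliseconds
  // before nowMs (nowMs itself included).
  def recentDirs(paths: Seq[String], nowMs: Long, windowMs: Long): Seq[String] =
    paths.filter(p => timestampOf(p).exists(ts => ts > nowMs - windowMs && ts <= nowMs))
}
```

The directory listing would have to come from HDFS (for example via the Hadoop FileSystem API), and the surviving paths could then be joined with commas and passed to sc.textFile, which accepts a comma-separated list of paths. Since the suffix is the time the batch was written rather than each price's TIMECREATED, I assume the TIMECREATED filter would still be needed at the edges of the window.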