Hi all,

The requirement is to process a file using Spark Streaming, fed from a Kafka topic, and once all the transformations are done, turn the result into a static batch DataFrame and pass it to Spark ML model tuning.
So far I have been doing it in the following fashion:

1) Read the file into Kafka
2) Consume it in Spark as a streaming DataFrame
3) Run Spark transformations on the streaming data
4) Append and write the output to HDFS
5) Read the transformed file back into Spark as a batch DataFrame
6) Run the Spark ML model

However, the requirement is to avoid HDFS, as it may not be installed on certain clusters. So we have to avoid the disk I/O and, on the fly, append the Kafka data into a static Spark DataFrame and pass that DataFrame to the ML model. How should I go about it?

Thanks,
Aakash.
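One way this could be sketched (an assumption on my part, not something tested against your cluster) is Structured Streaming's foreachBatch sink, available since Spark 2.4: it hands every micro-batch to a callback as a plain static DataFrame, so steps 4) and 5) above (writing to and re-reading from HDFS) drop out entirely. The bootstrap server, topic name, schema, and the model-fitting step below are all placeholders for illustration:

```python
# Sketch: route each Kafka micro-batch straight into an ML stage via
# foreachBatch, with no HDFS round trip. Topic/schema/model are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, DoubleType

spark = SparkSession.builder.appName("kafka-to-ml").getOrCreate()

# Hypothetical record layout for the Kafka messages.
schema = StructType([StructField("feature", DoubleType()),
                     StructField("label", DoubleType())])

# Steps 1-3: consume the topic as a streaming DataFrame and transform it.
stream_df = (spark.readStream
             .format("kafka")
             .option("kafka.bootstrap.servers", "localhost:9092")
             .option("subscribe", "input-topic")  # hypothetical topic name
             .load()
             .select(from_json(col("value").cast("string"), schema).alias("r"))
             .select("r.*"))

def train_on_batch(batch_df, batch_id):
    # batch_df is a *static* DataFrame here, so it can be passed directly
    # to model fitting/tuning without being persisted to disk first.
    if batch_df.rdd.isEmpty():
        return
    # Placeholder for the real ML step, e.g.:
    # model = pipeline.fit(batch_df)
    batch_df.show(truncate=False)

# Steps 4-6 collapse into one sink: foreachBatch replaces the HDFS write.
query = (stream_df.writeStream
         .foreachBatch(train_on_batch)
         .outputMode("append")
         .start())
query.awaitTermination()
```

One caveat: each invocation sees only that micro-batch's rows, so if the model needs the accumulated dataset you would have to union batches into a cached DataFrame (or, for small data, use the memory sink and query the named in-memory table) rather than fitting per batch.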