Hi, I have a couple of Spark jobs that read Hive table partition data and process it independently in different threads on the driver. The data to process is huge (TBs), so my jobs are not scaling and run slowly. I am therefore thinking of using Spark Streaming to pick up data as it is added to Hive partitions, so that I only need to process the newly loaded partitions.
Can we read Hive table partition data directly using Spark Streaming? Please guide. Also, please share best practices for processing TBs of data generated every day. Thanks in advance.
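For context on the kind of setup being asked about: Spark Streaming has no built-in Hive-table source, so one common workaround is to periodically poll the table's warehouse directory for newly added partition folders and then run a batch Spark job over only those partitions. The following is a minimal, framework-agnostic sketch of that polling step in plain Python; the directory layout and function name are hypothetical, and in a real job each returned partition would be handed to Spark for processing.

```python
import os
import tempfile

def find_new_partitions(table_dir, seen):
    """Return partition directories (e.g. dt=2015-10-01) not yet processed.

    `seen` is a set of partition names carried across polls; it is
    updated in place so each partition is returned exactly once.
    """
    current = {d for d in os.listdir(table_dir)
               if os.path.isdir(os.path.join(table_dir, d)) and "=" in d}
    new = sorted(current - seen)
    seen.update(new)
    return new

# Demo with a fake warehouse layout under a temp directory.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "dt=2015-10-01"))

seen = set()
print(find_new_partitions(root, seen))  # first poll picks up the existing partition

os.makedirs(os.path.join(root, "dt=2015-10-02"))
print(find_new_partitions(root, seen))  # next poll returns only the new partition
```

In a real pipeline the loop body would submit something like `spark.sql("SELECT ... FROM table WHERE dt = '...'")` for each new partition, relying on partition pruning so only the fresh data is scanned.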