Hi all,

I have a use case where I consume events from RabbitMQ using Spark Streaming. Each event carries some fields on which I want to query PostgreSQL, bring back the matching rows, join them with the event data, and write the aggregated result to HDFS, so that I can later run analytics queries over it with Spark SQL.
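Roughly, the job looks like this (a minimal sketch: host, database, table, and column names are placeholders, the PostgreSQL JDBC driver is assumed to be on the classpath, and socketTextStream stands in for the RabbitMQ receiver just to keep the example self-contained):

```scala
import java.sql.DriverManager

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object EnrichEvents {
  // Placeholder event shape; the real fields come from the RabbitMQ message.
  case class Event(accountId: Long, payload: String)
  case class Enriched(accountId: Long, payload: String, accountName: String)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("rabbitmq-enrich")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Stand-in source: the real job uses a RabbitMQ receiver here.
    val events = ssc.socketTextStream("localhost", 9999).map { line =>
      val Array(id, payload) = line.split(",", 2)
      Event(id.toLong, payload)
    }

    // One JDBC connection per partition, but still one query per event --
    // this is the part that hammers PostgreSQL.
    val enriched = events.mapPartitions { part =>
      val conn = DriverManager.getConnection(
        "jdbc:postgresql://pg-host:5432/proddb", "reader", "secret")
      val stmt = conn.prepareStatement("SELECT name FROM accounts WHERE id = ?")
      val out = part.map { e =>
        stmt.setLong(1, e.accountId)
        val rs = stmt.executeQuery()
        val name = if (rs.next()) rs.getString(1) else ""
        rs.close()
        Enriched(e.accountId, e.payload, name)
      }.toList // materialize before the connection closes
      stmt.close()
      conn.close()
      out.iterator
    }

    // Enriched data lands in HDFS for later Spark SQL queries.
    enriched.foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty()) {
        rdd.saveAsTextFile(s"hdfs:///data/enriched/${time.milliseconds}")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The lookup happens once per event, which is exactly what worries me below.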
My question: this PostgreSQL instance holds production data, so I don't want to hit it so many times. At any given second I may have around 3000 events, which means firing roughly 3000 parallel queries against PostgreSQL, and the volume keeps growing, so my database will go down. I can't migrate the PostgreSQL data, since lots of systems use it, but I could copy the data into a NoSQL store like HBase and query HBase instead. The issue there is: how can I make sure HBase has up-to-date data? Can anyone suggest the best approach to handle this case?
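For concreteness, one obvious mitigation is to batch the lookups, so here is a sketch of that variant (one IN-style query per partition instead of one per event, reusing the placeholder classes and connection details from the sketch above). Even this still hits PostgreSQL on every micro-batch, which is why I am asking about HBase:

```scala
// Batched variant: one query per partition instead of one per event.
// "accounts" is still a placeholder table name.
val enriched = events.mapPartitions { part =>
  val batch = part.toList
  val ids = batch.map(_.accountId).distinct
  val conn = java.sql.DriverManager.getConnection(
    "jdbc:postgresql://pg-host:5432/proddb", "reader", "secret")
  val stmt = conn.prepareStatement(
    "SELECT id, name FROM accounts WHERE id = ANY (?)")
  stmt.setArray(1, conn.createArrayOf("bigint", ids.map(Long.box).toArray[AnyRef]))
  val rs = stmt.executeQuery()
  var names = Map.empty[Long, String]
  while (rs.next()) names += rs.getLong(1) -> rs.getString(2)
  rs.close(); stmt.close(); conn.close()
  batch.iterator.map { e =>
    Enriched(e.accountId, e.payload, names.getOrElse(e.accountId, ""))
  }
}
```

Regards,
Jeetendra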