Hi All

I have a use case where I am consuming events from RabbitMQ using Spark
Streaming. Each event has some fields on which I want to query PostgreSQL,
bring back the matching rows, join them with the event data, and write the
aggregated result to HDFS, so that I can run analytics queries over this
data using Spark SQL.
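
For context, my current (naive) enrichment step looks roughly like the
sketch below. The Event type, table/column names, JDBC URL, and credentials
are all placeholders, and I'm assuming eventStream is a DStream already
wired to RabbitMQ (e.g. via a custom Receiver):

    import java.sql.DriverManager
    import org.apache.spark.streaming.dstream.DStream

    case class Event(accountId: Int, payload: String)

    object EnrichPipeline {
      def enrich(eventStream: DStream[Event]): Unit = {
        eventStream.foreachRDD { (rdd, time) =>
          val enriched = rdd.mapPartitions { events =>
            // One JDBC connection per partition, but still one query per
            // event -- this is the lookup pattern that hammers PostgreSQL.
            val conn = DriverManager.getConnection(
              "jdbc:postgresql://pg-host/proddb", "user", "pass")
            val stmt = conn.prepareStatement(
              "SELECT name FROM accounts WHERE id = ?")
            val out = events.map { e =>
              stmt.setInt(1, e.accountId)
              val rs = stmt.executeQuery()
              val name = if (rs.next()) rs.getString("name") else ""
              (e.accountId, name, e.payload)
            }.toList            // materialize before closing the connection
            stmt.close(); conn.close()
            out.iterator
          }
          // one output directory per micro-batch so paths don't collide
          enriched.saveAsTextFile(s"hdfs:///events/enriched/${time.milliseconds}")
        }
      }
    }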

My question: the PostgreSQL database holds production data, so I don't want
to hit it too many times.

In any given second I may have around 3000 events, which means firing 3000
parallel queries against PostgreSQL, and this volume keeps growing, so my
database will go down.
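
One mitigation I'm considering is collecting the distinct lookup keys in
each micro-batch and firing a single IN query instead of thousands of
per-event ones; a minimal sketch (same placeholder names as above; the keys
are integers, so the string-built IN list is safe here):

    import java.sql.DriverManager
    import org.apache.spark.streaming.dstream.DStream

    object BatchedEnrich {
      def enrichBatched(eventStream: DStream[(Int, String)]): Unit = {
        eventStream.foreachRDD { (rdd, time) =>
          val keys = rdd.keys.distinct().collect()  // distinct keys in this batch
          if (keys.nonEmpty) {
            val conn = DriverManager.getConnection(
              "jdbc:postgresql://pg-host/proddb", "user", "pass")
            val rs = conn.createStatement().executeQuery(
              s"SELECT id, name FROM accounts WHERE id IN (${keys.mkString(",")})")
            val lookup = scala.collection.mutable.Map[Int, String]()
            while (rs.next()) lookup(rs.getInt("id")) = rs.getString("name")
            conn.close()
            // ship the small lookup table to the executors once
            val lookupB = rdd.sparkContext.broadcast(lookup.toMap)
            rdd.map { case (id, payload) =>
              (id, lookupB.value.getOrElse(id, ""), payload)
            }.saveAsTextFile(s"hdfs:///events/enriched/${time.milliseconds}")
          }
        }
      }
    }

That cuts the load from ~3000 queries per second to one per micro-batch,
but it still hits the production database on every batch, which is why I'm
thinking about replicating the data out.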

I can't migrate this PostgreSQL data since lots of systems use it, but I
could replicate it into a NoSQL store such as HBase and query HBase
instead. The issue there is: how can I make sure HBase has up-to-date data?
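
To make the staleness question concrete, the only sync I can think of is
polling an updated_at column and upserting the changed rows, roughly like
this sketch (assuming the PostgreSQL table actually has such a column; the
HBase table and column family names are placeholders). Even then there is a
window between polls where HBase lags PostgreSQL:

    import java.sql.DriverManager
    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes

    object PgToHBaseSync {
      def syncSince(lastSync: java.sql.Timestamp): Unit = {
        val pg = DriverManager.getConnection(
          "jdbc:postgresql://pg-host/proddb", "user", "pass")
        val stmt = pg.prepareStatement(
          "SELECT id, name FROM accounts WHERE updated_at > ?")
        stmt.setTimestamp(1, lastSync)
        val rs = stmt.executeQuery()

        val hbase = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = hbase.getTable(TableName.valueOf("accounts"))
        while (rs.next()) {
          // row key = id; single column family "d" holding the name
          val put = new Put(Bytes.toBytes(rs.getInt("id")))
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("name"),
            Bytes.toBytes(rs.getString("name")))
          table.put(put)
        }
        table.close(); hbase.close(); pg.close()
      }
    }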

Can anyone suggest the best approach/method to handle this case?


Regards
Jeetendra
