You can have Spark read from PostgreSQL directly through the JDBC data source
API. Do you have any concerns with that approach, since you mention copying
that data into HBase?
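
For example (a minimal sketch, assuming Spark 1.4+ with the PostgreSQL JDBC
driver on the classpath; the host, database, table, and credentials below are
placeholders):

import java.util.Properties

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object PostgresReadSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PostgresReadSketch"))
    val sqlContext = new SQLContext(sc)

    val props = new Properties()
    props.setProperty("user", "spark_reader")        // placeholder credentials
    props.setProperty("password", "secret")
    props.setProperty("driver", "org.postgresql.Driver")

    // Read the table as a DataFrame. Partitioning on "id" splits the scan
    // into 8 parallel JDBC reads instead of one big query.
    val users = sqlContext.read.jdbc(
      "jdbc:postgresql://db-host:5432/proddb",       // placeholder URL
      "users",                                       // placeholder table
      "id", 0L, 1000000L, 8, props)

    users.registerTempTable("users")                 // queryable from Spark SQL
    sqlContext.sql("SELECT COUNT(*) FROM users").show()

    sc.stop()
  }
}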



From: Jeetendra Gangele
Sent: Monday, July 27, 6:00 AM
Subject: Data from PostgreSQL to Spark
To: user

Hi All 



I have a use case where I am consuming events from RabbitMQ using Spark
Streaming. Each event has some fields on which I want to query PostgreSQL,
bring back the matching data, join it with the event data, and put the
aggregated result into HDFS, so that I can run analytics queries over this
data using Spark SQL.
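
Roughly, the pipeline as I understand it (a sketch only; the socket receiver
stands in for RabbitMQ, and the event schema, table, and HDFS path are made-up
placeholders; the per-batch JDBC read here is exactly the load problem
described below):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

object EnrichAndAggregate {
  case class Event(userId: Long, payload: String)    // hypothetical schema

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("EnrichAndAggregate"))
    val ssc = new StreamingContext(sc, Seconds(1))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Stand-in for a RabbitMQ receiver; any DStream of raw event lines works.
    val events = ssc.socketTextStream("localhost", 9999)
      .map(_.split(","))
      .map(a => Event(a(0).toLong, a(1)))

    events.foreachRDD { rdd =>
      val eventDF = rdd.toDF()
      // Reference data from PostgreSQL (same JDBC source as above).
      val refDF = sqlContext.read.jdbc(
        "jdbc:postgresql://db-host:5432/proddb", "users",
        new java.util.Properties())
      // Join events with reference data, aggregate, and append to HDFS as
      // Parquet so Spark SQL can query it later.
      eventDF.join(refDF, eventDF("userId") === refDF("id"))
        .groupBy("userId").count()
        .write.mode("append").parquet("hdfs:///data/event_aggregates")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}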



My question is that this PostgreSQL data is production data, so I don't want
to hit it too many times.



At any given second I may have 3000 events, which means firing 3000 parallel
queries at my PostgreSQL, and this volume keeps growing, so my database will
go down.
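
One pattern that avoids a query per event (a suggestion of mine, not something
from the thread): collect the distinct keys in each micro-batch and fetch them
with a single IN (...) query per partition. The table and column names below
are hypothetical:

import java.sql.DriverManager

import org.apache.spark.rdd.RDD

// One JDBC round trip per partition per batch: look up every distinct key
// in the micro-batch with a single IN (...) query, instead of ~3000 point
// queries. Interpolating the keys is safe here because they are Longs.
def lookupKeys(keys: RDD[Long]): RDD[(Long, String)] = {
  keys.distinct().mapPartitions { part =>
    val ids = part.toSeq
    if (ids.isEmpty) Iterator.empty
    else {
      val conn = DriverManager.getConnection(
        "jdbc:postgresql://db-host:5432/proddb", "spark_reader", "secret")
      try {
        val rs = conn.createStatement().executeQuery(
          s"SELECT id, name FROM users WHERE id IN (${ids.mkString(",")})")
        val rows = scala.collection.mutable.ArrayBuffer.empty[(Long, String)]
        while (rs.next()) rows += ((rs.getLong("id"), rs.getString("name")))
        rows.iterator                     // materialized before conn.close()
      } finally conn.close()
    }
  }
}

Joining the result of lookupKeys back against the events by key gives the
enriched batch with one query per partition instead of one per event; caching
or broadcasting hot keys would cut the load further.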


I can't migrate this PostgreSQL data since lots of systems use it, but I can
copy it into a NoSQL store like HBase and query HBase instead. The issue then
is: how can I make sure HBase has up-to-date data?



Can anyone suggest the best approach or method to handle this case?




Regards 


Jeetendra 
