Hi Ayan, thanks for the reply. It's around 5 GB across 10 tables. This data changes very frequently, with a few updates every minute, so it's difficult to keep this data in Spark. If any updates happen on the main tables, how can I refresh the data in Spark?
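One common way to approach the refresh question is to keep a read-only snapshot of the reference tables in memory and reload it on a fixed interval, accepting that lookups may be up to one interval stale. The sketch below is plain Python, not a Spark API: `RefreshingLookup`, `loader` and `ttl_seconds` are hypothetical names chosen for illustration. In a Spark Streaming job, one option would be to do the reload on the driver and re-broadcast the snapshot for each batch. If reloading the full ~5 GB every interval is too expensive, the loader could instead fetch only changed rows, assuming the tables carry something like a last-modified column.

```python
import threading
import time
from typing import Callable, Dict, Hashable, Optional


class RefreshingLookup:
    """Hypothetical helper: holds an in-memory snapshot of a reference table
    and reloads it once the snapshot is older than ttl_seconds."""

    def __init__(self,
                 loader: Callable[[], Dict[Hashable, object]],
                 ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader          # e.g. a query that reads the whole table
        self._ttl = ttl_seconds        # maximum staleness we accept
        self._clock = clock            # injectable so staleness can be tested
        self._lock = threading.Lock()  # one reload at a time; other callers wait
        self._snapshot = loader()
        self._loaded_at = clock()

    def _maybe_refresh(self) -> None:
        with self._lock:
            if self._clock() - self._loaded_at >= self._ttl:
                self._snapshot = self._loader()
                self._loaded_at = self._clock()

    def get(self, key: Hashable, default: Optional[object] = None) -> Optional[object]:
        """Return the value for key, reloading first if the snapshot is stale."""
        self._maybe_refresh()
        return self._snapshot.get(key, default)


if __name__ == "__main__":
    db = {"cust-1": "gold"}          # stands in for the PostgreSQL table
    now = [0.0]                      # fake clock for the demo
    lookup = RefreshingLookup(lambda: dict(db), ttl_seconds=60, clock=lambda: now[0])
    db["cust-1"] = "platinum"        # an update lands in the database
    print(lookup.get("cust-1"))      # gold: snapshot is still within the TTL
    now[0] = 61.0
    print(lookup.get("cust-1"))      # platinum: snapshot reloaded after the TTL
```

The trade-off is explicit: a shorter `ttl_seconds` means fresher data but more load on the production database.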
On 28 July 2015 at 02:11, ayan guha <guha.a...@gmail.com> wrote:
> You can open the DB connection once per partition. Please have a look at
> the design patterns for the foreachRDD construct in the documentation.
> How big is your data in the DB? How often does that data change? You would
> be better off if the data were in Spark already.
> On 28 Jul 2015 04:48, "Jeetendra Gangele" <gangele...@gmail.com> wrote:
>
>> Thanks for your reply.
>>
>> In parallel I will be hitting PostgreSQL with around 6000 calls, which is
>> not good; my database will die.
>> These calls to the database will keep on increasing.
>> Handling millions of requests is not an issue with HBase/NoSQL.
>>
>> Any other alternative?
>>
>> On 27 July 2015 at 23:18, <felixcheun...@hotmail.com> wrote:
>>
>>> You can have Spark reading from PostgreSQL through the data access API.
>>> Do you have any concern with that approach, since you mention copying
>>> that data into HBase?
>>>
>>> From: Jeetendra Gangele
>>> Sent: Monday, July 27, 6:00 AM
>>> Subject: Data from PostgreSQL to Spark
>>> To: user
>>>
>>> Hi All
>>>
>>> I have a use case where I am consuming events from RabbitMQ using
>>> Spark Streaming. Each event has some fields on which I want to query
>>> PostgreSQL, bring back the data, join the event data with the PostgreSQL
>>> data, and put the aggregated data into HDFS, so that I can run analytics
>>> queries over this data using SparkSQL.
>>>
>>> My concern is that the PostgreSQL data is production data, so I don't
>>> want to hit it so many times.
>>>
>>> In any given second I may have 3000 events, which means I would need to
>>> fire 3000 parallel queries at PostgreSQL, and this number keeps growing,
>>> so my database will go down.
>>>
>>> I can't migrate this PostgreSQL data since lots of systems use it, but I
>>> can copy it into a NoSQL store like HBase and query HBase. The issue then
>>> is: how can I make sure that HBase has up-to-date data?
>>>
>>> Can anyone suggest the best approach/method to handle this case?
>>>
>>> Regards
>>>
>>> Jeetendra
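Ayan's once-per-partition suggestion also addresses the 3000-queries-per-second concern if each partition batches its keys into a single query instead of issuing one query per event. The sketch below assumes events are dicts with a `customer_id` field and a reference table `customers(id, tier)`; all of these names are hypothetical. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so it runs without a database server (with a PostgreSQL driver such as psycopg2 the placeholder style would differ). In PySpark, a function like this could be passed to `rdd.mapPartitions` inside `foreachRDD`.

```python
import sqlite3
from contextlib import closing
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Event = Dict[str, str]


def enrich_partition(events: Iterable[Event],
                     connect: Callable[[], sqlite3.Connection]
                     ) -> List[Tuple[Event, Optional[str]]]:
    """Open ONE connection for the whole partition and run ONE batched
    query for all distinct keys, instead of one query per event."""
    batch = list(events)
    keys = sorted({e["customer_id"] for e in batch})
    if not keys:
        return []                     # empty partition: no connection opened
    placeholders = ",".join("?" for _ in keys)
    sql = f"SELECT id, tier FROM customers WHERE id IN ({placeholders})"
    with closing(connect()) as conn:  # closing() actually closes the connection
        ref = dict(conn.execute(sql, keys).fetchall())
    # Join each event with its reference value (None when there is no match).
    return [(e, ref.get(e["customer_id"])) for e in batch]


if __name__ == "__main__":
    uri = "file:refdb?mode=memory&cache=shared"
    keeper = sqlite3.connect(uri, uri=True)  # keeps the in-memory DB alive
    keeper.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, tier TEXT)")
    keeper.executemany("INSERT INTO customers VALUES (?, ?)",
                       [("c1", "gold"), ("c2", "silver")])
    keeper.commit()
    partition = [{"customer_id": "c1"}, {"customer_id": "c2"}, {"customer_id": "c1"}]
    for event, tier in enrich_partition(partition,
                                        lambda: sqlite3.connect(uri, uri=True)):
        print(event["customer_id"], tier)
```

With this shape, database load scales with the number of partitions per batch rather than the number of events. Very large partitions would need the key list split into chunks, since databases cap the number of bound parameters per statement.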