Hi Jorn,

Thanks for the suggestion.
My current cluster setup is shown in the attached snapshot. Apart from Postgres-XL, do you see any problem there?

Regards,
Saurabh

From: Jörn Franke [mailto:jornfra...@gmail.com]
Sent: Monday, May 30, 2016 12:12 PM
To: Kumar, Saurabh 5. (Nokia - IN/Bangalore) <saurabh.5.ku...@nokia.com>
Cc: user@spark.apache.org; Sawhney, Prerna (Nokia - IN/Bangalore) <prerna.sawh...@nokia.com>
Subject: Re: Query related to spark cluster

Well, if you require R then you need to install it (including all additional packages) on each node. I am not sure why you store the data in Postgres. Storing it in HDFS as Parquet or ORC (sorted on the relevant columns) is sufficient, and you can use the SparkR libraries to access it.

On 30 May 2016, at 08:38, Kumar, Saurabh 5. (Nokia - IN/Bangalore) <saurabh.5.ku...@nokia.com> wrote:

Hi Team,

I am using Apache Spark to build a scalable analytics engine. The processing flow is as follows:

Raw files > Store to HDFS > Process with Spark and store to Postgres-XL database > R reads data from Postgres-XL and processes it in distributed mode.

I have a 6-node cluster set up for ETL operations, which has:

1. Spark slaves installed on all 6 nodes.
2. HDFS data nodes on each of the 6 nodes, with replication factor 2.
3. A Postgres-XL 9.5 database coordinator on each of the 6 nodes.
4. R installed on all nodes, processing data from Postgres-XL in a distributed manner.

Can you please guide me on the pros and cons of this setup? Is installing all components on every machine recommended, or is there any drawback? Should R run on the Spark cluster?

Thanks & Regards,
Saurabh Kumar
R&D Engineer, T&I TED
Technology Explorat&Disruption
Nokia Networks
L5, Manyata Embassy Business Park, Nagavara, Bangalore, India 560045
Mobile: +91-8861012418
http://networks.nokia.com/