Hi Jörn,

Thanks for the suggestion.

My current cluster setup is shown in the attached snapshot. Apart from Postgres-XL, 
do you see any problem with it?


Regards,
Saurabh

From: Jörn Franke [mailto:jornfra...@gmail.com]
Sent: Monday, May 30, 2016 12:12 PM
To: Kumar, Saurabh 5. (Nokia - IN/Bangalore) <saurabh.5.ku...@nokia.com>
Cc: user@spark.apache.org; Sawhney, Prerna (Nokia - IN/Bangalore) 
<prerna.sawh...@nokia.com>
Subject: Re: Query related to spark cluster


Well, if you require R then you need to install it (including all additional 
packages) on each node. I am not sure why you store the data in Postgres. 
Storing it in Parquet or ORC in HDFS (sorted on the relevant columns) is 
sufficient, and you can use the SparkR libraries to access it.
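
For example, a minimal SparkR sketch of that approach (assuming the Spark 1.6-era 
SparkR API; the master URL, HDFS path, and column names are only placeholders, not 
your actual ones):

# Read Parquet directly from HDFS with SparkR instead of going through Postgres-XL
library(SparkR)

sc <- sparkR.init(master = "spark://master-host:7077", appName = "analytics")
sqlContext <- sparkRSQL.init(sc)

# Parquet data written by the Spark ETL job (placeholder path)
events <- read.df(sqlContext, "hdfs:///data/processed/events", source = "parquet")

# Distributed processing stays inside Spark; only the small result is collected
recent  <- filter(events, events$event_date >= "2016-05-01")
perNode <- agg(groupBy(recent, recent$node_id), events = n(recent$event_id))
head(collect(perNode))

sparkR.stop()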

On 30 May 2016, at 08:38, Kumar, Saurabh 5. (Nokia - IN/Bangalore) 
<saurabh.5.ku...@nokia.com> wrote:
Hi Team,

I am using Apache Spark to build a scalable analytics engine. My setup is as 
follows.

The flow of processing is as follows:

Raw files > store to HDFS > process with Spark and store to the Postgres-XL database > 
R reads data from Postgres-XL and processes it in distributed mode.
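
For reference, the R step pulls the processed rows out of Postgres-XL roughly like 
this (a simplified sketch assuming the RPostgreSQL package; host, credentials, and 
table name are placeholders):

# R side: read processed data back from Postgres-XL for further analysis
library(RPostgreSQL)

drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv,
                 host = "pgxl-coordinator-host", port = 5432,
                 dbname = "analytics", user = "etl_user", password = "changeme")

# Rows written earlier by the Spark job (placeholder table name)
processed <- dbGetQuery(con, "SELECT * FROM processed_events LIMIT 10000")

dbDisconnect(con)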

I have a 6-node cluster setup for ETL operations, which has:

1.      Spark slaves installed on all 6 nodes.
2.      HDFS data nodes on each of the 6 nodes, with replication factor 2.
3.      Postgres-XL 9.5 database coordinators on each of the 6 nodes.
4.      R installed on all nodes, used to process data from Postgres-XL in a 
distributed manner.




Can you please guide me on the pros and cons of this setup?
Is installing every component on every machine recommended, or are there any 
drawbacks?
Should R run on the Spark cluster?



Thanks & Regards
Saurabh Kumar
R&D Engineer, T&I TED Technology Exploration & Disruption
Nokia Networks
L5, Manyata Embassy Business Park, Nagavara, Bangalore, India 560045
Mobile: +91-8861012418
http://networks.nokia.com/


