Hi,

Bear in mind that you typically need 1GB of NameNode memory for every 1 million
blocks. So with a 128MB block size you can store 128 * 1E6 / (3 * 1024) =
41,666GB of data for every 1GB of NameNode memory; the factor of 3 comes from
each block being replicated three times. In other words, just under 42TB of
data. So if you have 5GB of NameNode memory, you can have up to roughly 210TB
of data on your DataNodes.
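
To make that arithmetic explicit, here it is as a back-of-the-envelope sketch
in plain Python (the block size, replication factor and 5GB figure are just the
example numbers above, not recommendations):

    # Rule of thumb: ~1GB of NameNode memory per 1 million blocks
    blocks_per_gb_heap = 1000000
    block_size_mb = 128        # dfs.blocksize
    replication = 3            # dfs.replication
    namenode_memory_gb = 5

    # Usable (logical) data per GB of NameNode memory, in GB,
    # after dividing out the 3x replication
    data_gb_per_heap_gb = blocks_per_gb_heap * block_size_mb / (replication * 1024.0)
    print(data_gb_per_heap_gb)                        # ~41,666 GB, just under 42TB

    print(namenode_memory_gb * data_gb_per_heap_gb)   # ~208,333 GB, roughly 210TB
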
You also need to account for the resources of each YARN container, which
include memory, CPU and disk. This could be up to 8GB of memory with a minimum
allocation of 1GB. I am not sure if a YARN container can use more than one core
(someone please correct me). Regardless, Spark will try to use memory for its
work and that has to fit inside a YARN container, whether it is a pure Spark
process or Hive running on the Spark engine. Will the 8GB limit set by
yarn.scheduler.maximum-allocation-mb apply here (meaning with Spark) as well?
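
For what it is worth, the check I have in mind looks roughly like the sketch
below (plain Python; it assumes the usual Spark 1.x default off-heap overhead
of max(384MB, 10% of executor memory), i.e. spark.yarn.executor.memoryOverhead
when not set explicitly, and the 8GB and 7GB figures are only illustrative):

    # Does a requested Spark executor fit under yarn.scheduler.maximum-allocation-mb?
    yarn_max_allocation_mb = 8192   # yarn.scheduler.maximum-allocation-mb
    executor_memory_mb = 7168       # spark.executor.memory (illustrative value)

    # Default overhead in Spark on YARN: max(384MB, 10% of executor memory)
    overhead_mb = max(384, int(0.10 * executor_memory_mb))

    container_request_mb = executor_memory_mb + overhead_mb
    if container_request_mb <= yarn_max_allocation_mb:
        print("fits: container request of %d MB" % container_request_mb)
    else:
        print("rejected: %d MB exceeds yarn.scheduler.maximum-allocation-mb (%d MB)"
              % (container_request_mb, yarn_max_allocation_mb))

If it does apply, a request above the maximum allocation would simply be
refused rather than capped, as far as I know.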





Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 10 March 2016 at 23:53, Prabhu Joseph <prabhujose.ga...@gmail.com> wrote:

> Ashok,
>
>    Cluster nodes have enough memory but relatively few CPU cores: 512GB /
> 16 cores = 32GB of memory per core. Either there should be more cores
> available to use the available memory efficiently, or don't configure too
> high an executor memory, which will cause a lot of GC.
>
> Thanks,
> Prabhu Joseph
>
> On Fri, Mar 11, 2016 at 3:45 AM, Ashok Kumar <ashok34...@yahoo.com.invalid
> > wrote:
>
>>
>> Hi,
>>
>> We intend to use 5 servers to build a Bigdata Hadoop data warehouse
>> system (not using any proprietary distribution like Hortonworks or
>> Cloudera or others).
>> All servers are Ubuntu Linux with 512GB RAM, 30TB of storage and 16 cores
>> each. Hadoop will be installed on all the servers/nodes. Server 1 will be
>> used for the NameNode plus a DataNode. Server 2 will be used for the
>> standby NameNode plus a DataNode. The rest of the servers will be used as
>> DataNodes.
>> Now we would like to install Spark on each server to create a Spark
>> cluster. Is that a good thing to do, or should we buy additional hardware
>> for Spark (minding cost here), or do we simply require additional memory
>> to accommodate Spark as well? In that case, how much memory would you
>> recommend for each Spark node?
>>
>>
>> thanks all
>>
>
>
