No wonder I had out-of-memory issues before…

I doubt we really need such a configuration at the production level…
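
For reference, a minimal sketch of setting executor memory explicitly on a standalone cluster; the master URL and the memory figures below are placeholder assumptions, not recommendations from this thread:

import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch: raise executor memory above the small Spark 1.x default
// (512 MB), which is a common cause of OutOfMemoryErrors.
// The master URL and memory figures are placeholder assumptions.
val conf = new SparkConf()
  .setAppName("memory-sizing-sketch")
  .setMaster("spark://master:7077")
  .set("spark.executor.memory", "8g")

val sc = new SparkContext(conf)
// Note: driver memory cannot be raised here once the driver JVM is running;
// pass --driver-memory to spark-submit (or set it in spark-defaults.conf).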

Best regards,

Cui Lin

From: Krishna Sankar <ksanka...@gmail.com>
Date: Sunday, March 8, 2015 at 3:27 PM
To: Nasir Khan <nasirkhan.onl...@gmail.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: General Purpose Spark Cluster Hardware Requirements?

Without knowing the data size, computation & storage requirements ... :

  *   Dual 6- or 8-core machines, 256 GB of memory each, 12-15 TB of disk per machine. Probably 5-10 machines. (A back-of-envelope totals sketch follows this list.)
  *   Don't go for the most exotic machines; on the other hand, don't go for the cheapest ones either.
     *   Find a sweet spot with your vendor, i.e. if dual 6-cores are a lot cheaper than dual 10-cores, then go with the less expensive ones. Same with disks - maybe 2 TB drives are a lot cheaper than 3 TB.
  *   Decide whether these are going to be storage-intensive or compute-intensive (I assume the latter) and configure accordingly.
  *   Make sure you can add storage to the machines, i.e. have free storage bays.
     *   The other way is to add more machines and buy smaller-specced ones.
  *   Unless one has very firm I/O and compute requirements, I have found that FLOPS and things of that nature do not make much sense.
     *   Think in terms of RAM, CPU, and storage - those are what will become the initial limitations.
     *   Once there are enough production jobs, you can then figure out the FLOPS et al.
  *   A 10 G network is the better choice, so price in a 24-48 port TOR switch.
     *   Be more concerned with the bandwidth between the cluster nodes, for shuffles et al.
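
To make the totals concrete, here is a back-of-envelope sketch in Scala; every input is an assumption taken from the middle of the ranges above, not a measurement:

// Back-of-envelope totals for the suggested mid-range cluster.
// Every input is an assumption from the quoted ranges, not a measurement.
object ClusterSizing extends App {
  val machines         = 8     // middle of the 5-10 range
  val coresPerMachine  = 2 * 6 // dual 6-core CPUs
  val ramPerMachineGb  = 256
  val diskPerMachineTb = 13.5  // middle of 12-15 TB

  println(s"Total cores:   ${machines * coresPerMachine}")     // 96
  println(s"Total RAM:     ${machines * ramPerMachineGb} GB")  // 2048 GB
  println(s"Total storage: ${machines * diskPerMachineTb} TB") // 108.0 TB
}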

Cheers
<k/>

On Sun, Mar 8, 2015 at 2:29 PM, Nasir Khan <nasirkhan.onl...@gmail.com> wrote:
Hi, I am going to submit a proposal to my university to set up my standalone
Spark cluster. What hardware should I include in my proposal?

I will be working on classification (Spark MLlib) of data streams (Spark
Streaming).
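
For context, a minimal sketch of this kind of workload using MLlib's streaming logistic regression; the input paths, feature count, and batch interval below are hypothetical placeholders:

import org.apache.spark.SparkConf
import org.apache.spark.mllib.classification.StreamingLogisticRegressionWithSGD
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Minimal sketch: online classification over a stream of labeled points.
// Paths, feature count, and batch interval are hypothetical placeholders.
val conf = new SparkConf().setAppName("streaming-classification-sketch")
val ssc = new StreamingContext(conf, Seconds(10))

// Expect text files in LabeledPoint.toString format, e.g. "(1.0,[0.5,1.2])"
val trainingData = ssc.textFileStream("hdfs:///streams/train").map(LabeledPoint.parse)
val testData = ssc.textFileStream("hdfs:///streams/test").map(LabeledPoint.parse)

val numFeatures = 100 // hypothetical
val model = new StreamingLogisticRegressionWithSGD()
  .setInitialWeights(Vectors.zeros(numFeatures))

model.trainOn(trainingData) // update the model as each batch arrives
model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print()

ssc.start()
ssc.awaitTermination()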

If somebody can fill in these answers, that would be great! Thanks.

*Cores* = (example: 64 nodes, 1024 cores; your figures) ____________?

*Performance* = (example: ~5.12 TFLOPS, ~2 TFLOPS; your figures) ___________?

*GPU* = YES/NO ___________?

*Fat Node* = YES/NO ___________?

*CPU Hrs/Yr* = (example: 2000, 8000; your figures) ___________?

*RAM/CPU* = (example: 256 GB; your figures) ___________?

*Storage Processing* = (example: 200 TB; your figures) ___________?

*Storage Output* = (example: 5 TB, 4 TB HDD/SSD; your figures) ___________?

*Most processors today carry out 4 FLOPs per cycle; thus a single-core 2.5
GHz processor has a theoretical peak of 10 billion FLOPS = 10 GFLOPS.

Note: I need a *general purpose* cluster, not very high-end nor very low-spec.
It will not be dedicated to just one project, I guess. You people already
have experience in setting up clusters; that's the reason I posted it here :)





