Thank you Mirko, I saw the chapter titled PLANNING A HADOOP CLUSTER.

I’ll take that book.

From: Mirko Kämpf [mailto:mirko.kae...@gmail.com]
Sent: Thursday 10 July 2014 11:22
To: user@hadoop.apache.org
Subject: Re: Need to evaluate a cluster

Just request a quote from the leading vendors and also from local ones. Tell them 
about the volume and the access pattern you have in mind and collect the offers. 
Then compare the prices. You should also consider space (in the data center), 
network architecture, and energy consumption as well as heat generation in such a 
cluster, which will handle a few PB in the long run.
Have a look at the book: 
http://www.amazon.de/Hadoop-Operations-Eric-Sammer/dp/1449327052
Cheers,
Mirko


2014-07-10 11:10 GMT+02:00 YIMEN YIMGA Gael <gael.yimen-yi...@sgcib.com>:
Thanks for your reply, Mirko.

In my case, I can assume a compression factor of 8, according to the team in 
charge of it.

The data I’m dealing with is logs only, but of many types (printing logs, USB 
logs, remote access logs, Active Directory logs, database server logs, web 
server logs, antivirus logs, etc.).
To be precise, in my case only logs are stored. Sometimes we may also have CSV 
files, but no videos or images are considered here.

Do you have any advice for that specific type of data?
What are the reasons to consider servers with 12 HDDs (3 TB each) per server, 
knowing that I prefer low cost?
What could be the price of a low-cost server with 12 HDDs (3 TB each)?

Regards
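
A rough sketch of how the compression factor of 8 mentioned above could change the sizing. It reuses the replication factor of 3 and the 1.3 overhead factor quoted elsewhere in this thread; the function name, the assumption that 1 TB = 1000 GB, and the assumption that compression applies uniformly to all log types are illustrative only.

```python
import math

def nodes_per_year_with_compression(daily_gb, compression=8.0, replication=3,
                                    overhead=1.3, disks_per_node=12, disk_tb=3.0):
    """Hypothetical helper: DataNodes needed for one year of compressed data."""
    # Assumption: 1 TB = 1000 GB, and compression shrinks every file equally.
    stored_tb_per_day = daily_gb / 1000 / compression
    # One year of data, replicated, plus reserve for intermediate data.
    yearly_tb = stored_tb_per_day * 365 * replication * overhead
    # Raw disk capacity of one node.
    node_tb = disks_per_node * disk_tb
    return math.ceil(yearly_tb / node_tb)

# 720 GB/day of logs on nodes with 12 x 3 TB disks
print(nodes_per_year_with_compression(720, compression=8.0))
print(nodes_per_year_with_compression(720, compression=1.0))
```

The real compression ratio depends on the codec and on the log format, so it should be measured on a sample of the actual data before it is used for planning.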

From: Mirko Kämpf [mailto:mirko.kae...@gmail.com]
Sent: Thursday 10 July 2014 11:01
To: user@hadoop.apache.org
Subject: Re: Need to evaluate a cluster

I multiply by 1.3, which means I add 30% to the estimated amount to reserve 
capacity for intermediate data.
In your case, with approx. 2 TB per day, I think DataNodes with 1 to 3 disks are 
not a good idea. You should consider servers with more disks and then add one 
per week. Start with 10 servers and 12 HDDs (3 TB each) per server. This allows 
you to handle approx. 35 TB of raw uncompressed data. You have to evaluate 
compression for your specific case. It can be high, but it may also be low if 
the raw data is already compressed somehow. What data are you dealing with?
Text, messages, logs, or more binary data like images, MP3, or video formats?
Cheers,
Mirko

2014-07-10 10:43 GMT+02:00 YIMEN YIMGA Gael <gael.yimen-yi...@sgcib.com>:
Hi,

What does « 1.3 for overhead » mean in this calculation ?

Regards

From: Mirko Kämpf [mailto:mirko.kae...@gmail.com]
Sent: Wednesday 9 July 2014 18:09
To: user@hadoop.apache.org
Subject: Re: Need to evaluate a cluster

Hello,

If I follow your numbers, I see one missing fact: what is the number of HDDs per 
DataNode?
Let's assume you use machines with 6 x 3 TB HDDs per box; you would then need 
about 60 DataNodes per year (0.75 TB per day x 365 days x 3 for replication x 
1.3 for overhead / (number of HDDs per node x capacity per HDD)).
With 12 HDDs you would only need 30 servers per year.
How did you calculate the number of 367 DataNodes?

Cheers,
Mirko
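
The sizing rule in the message above can be sketched as a small calculation. This is only an illustration of the arithmetic: the function and parameter names are made up here, and the 365-day year and the rounding up to whole nodes are assumptions.

```python
import math

def datanodes_per_year(daily_tb, replication=3, overhead=1.3,
                       disks_per_node=6, disk_tb=3.0, days=365):
    """Hypothetical helper: DataNodes needed to hold one year of ingested data."""
    # Raw disk space consumed in a year: ingest x replication x overhead.
    raw_tb = daily_tb * days * replication * overhead
    # Raw capacity of one node: number of disks x capacity per disk.
    node_tb = disks_per_node * disk_tb
    # Round up, since a fraction of a node still means one more machine.
    return math.ceil(raw_tb / node_tb)

print(datanodes_per_year(0.75, disks_per_node=6))   # 6 x 3 TB per node
print(datanodes_per_year(0.75, disks_per_node=12))  # 12 x 3 TB per node
```

With 0.75 TB per day this gives 60 nodes for 6 disks per node and 30 nodes for 12 disks per node, matching the figures quoted in the message.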

2014-07-09 17:59 GMT+02:00 YIMEN YIMGA Gael <gael.yimen-yi...@sgcib.com>:
Hello,

I estimated the number of nodes for a cluster that would be fed 720 GB of data 
per day.
My estimate gave me 367 DataNodes in a year. I’m a bit worried by that number 
of DataNodes.
The assumptions I used are the following:


- Daily supply (feed): 720 GB
- HDFS replication factor: 3
- Space reserved on each disk outside HDFS: 30%
- Size of a disk: 3 TB

I have two questions.

First, I would like to know whether my assumptions are sound.
Second, could someone help me evaluate that cluster, so I can be sure that my 
results are not excessive?

Looking forward to your feedback.

Warm regards



