Hi everyone,
I'm 100% sure that I'm not the first one to consider benchmarking a
whole infrastructure, yet I haven't found any relevant information
on how to approach this challenge.
As I guess we all know what we are talking about, I'll try to describe
the scenario as briefly as possible.
I've been working in the hosting business for years and we've always
done some kind of testing for new stuff. Back in the Wonder Years, we
ran "ab", "dd", "bonnie" and the like to test disk, CPU and so on.
Then we grew up and needed to benchmark our commercial website or
some big customer's website. We tried a lot of tools and ended up
with things like JMeter, which was a great help for quite a while.
Recently, we've been using Locust, which is a great tool: not so easy
to set up, but very powerful.
But now we can say we are "mature": we sell cloud services and much
more, so there is a lot more at stake than a bunch of websites on a
web server.
As a cloud engineer you design a storage solution able to host
thousands of VMs (the same scenario and needs apply to other things,
like big database clusters, for example). You spend days doing your
maths and propose a number of available VMs for the budget you are
assigned. We all know what comes next... someone shows up and says...
no way... we have to fit at least twice as many in there, we have to
make money!!
So... you know there is no possible way to fit twice the number of
VMs you calculated, but you have to open a negotiation and provide
enough information to the managers to reach an agreement somewhere
in between.
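
To give an idea of what those maths look like, here is a
back-of-envelope sketch in Python; every number in it is a made-up
placeholder, not a real figure from our platform:

# Back-of-envelope capacity estimate. All numbers are placeholders
# (assumptions), to be replaced with your own measured figures.
TOTAL_USABLE_TB = 200          # usable capacity after RAID/replication (assumption)
TOTAL_BACKEND_IOPS = 120_000   # sustained IOPS the array can deliver (assumption)
TOTAL_THROUGHPUT_MBS = 6_000   # sustained MB/s (assumption)

VM_DISK_GB = 50                # average disk size per VM (assumption)
VM_AVG_IOPS = 40               # average sustained IOPS per VM (assumption)
VM_AVG_MBS = 2                 # average MB/s per VM (assumption)

by_capacity = (TOTAL_USABLE_TB * 1024) // VM_DISK_GB
by_iops = TOTAL_BACKEND_IOPS // VM_AVG_IOPS
by_throughput = TOTAL_THROUGHPUT_MBS // VM_AVG_MBS

# The real ceiling is whichever resource runs out first.
limit = min(by_capacity, by_iops, by_throughput)
print(f"capacity-bound:   {by_capacity} VMs")
print(f"IOPS-bound:       {by_iops} VMs")
print(f"throughput-bound: {by_throughput} VMs")
print(f"proposed ceiling: {limit} VMs")

Numbers like these at least show the managers which resource is the
bottleneck, even before any real test is run.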
And here's where the problem lies... how am I supposed to test a whole
infrastructure that can host thousands of VMs?
We've already worked with distributed load testing tools such as
JMeter or Locust. They are great, but they have one big issue: they
were designed to test a single IP address, not thousands of VMs.
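
One partial workaround would be to make each request pick its target
from an inventory file, roughly like this minimal Locust sketch (the
targets.txt file, one hostname or IP per line, and the placeholder
host are just illustrative, not part of any standard setup):

# locustfile.py -- spread load over many target VMs instead of one host.
import random
from locust import HttpUser, task, between

# "targets.txt" is an assumption: one hostname or IP per line,
# generated from your own inventory.
with open("targets.txt") as f:
    TARGETS = [line.strip() for line in f if line.strip()]

class MultiHostUser(HttpUser):
    # Locust still wants a nominal host, but every request below
    # overrides it with an absolute URL picked from the target list.
    host = "http://placeholder.invalid"
    wait_time = between(1, 5)

    @task
    def front_page(self):
        target = random.choice(TARGETS)
        # name= groups all hosts under one stats line so the report stays readable
        self.client.get(f"http://{target}/", name="GET / (any vm)")

It still only exercises the HTTP path, of course, not the whole
platform, which is exactly my problem.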
So... I guess many of you have come across this situation only to
realise that there is no way to test it effectively. However, I'm sure
you have at some point found a way to test an infrastructure like this
in a more realistic way than running the old-style tests. I would
appreciate any ideas you can give me.
Obviously you need a proper architecture and setup, nice hardware,
daily maintenance, and much more. Everything we can possibly do to
keep our systems clean and updated is already being done, but... at
what point should we stop putting data in?
What we are doing right now when preparing a new system is:
-set up a Nagios/Munin system to monitor the main metrics: network,
disk latencies, etc.
-create hundreds/thousands of VMs, depending on the TBs available.
-launch all or most of those VMs (some are only used to occupy space).
-SSH into most of them and run, all at once or intermittently, some
type of disk test like dd, bonnie or iozone (see the first sketch
after this list).
-start browsing "manually" some websites hosted on those VMs and
decide whether they are slow. Obviously this is very subjective.
Despite that, we can say that most people feel "happy" if the page
loads in less than a second (the second sketch below tries to turn
this into numbers).
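
About the SSH step: logging into each VM by hand doesn't scale, so the
kind of fan-out I mean is roughly this (the vm_list.txt inventory and
the dd flags are just placeholders; fio would probably give more
useful numbers than dd):

# Fan a disk test out over many VMs in parallel instead of ssh-ing by hand.
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Example command only; dd writes its summary to stderr, hence the redirect.
DD_CMD = "dd if=/dev/zero of=/tmp/benchfile bs=1M count=1024 oflag=direct 2>&1 | tail -1"

def run_disk_test(host):
    # BatchMode avoids hanging on a password prompt; assumes key-based auth.
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, DD_CMD],
        capture_output=True, text=True, timeout=600,
    )
    return host, result.returncode, result.stdout.strip()

# "vm_list.txt" is an assumption: one reachable hostname/IP per line.
with open("vm_list.txt") as f:
    hosts = [line.strip() for line in f if line.strip()]

with ThreadPoolExecutor(max_workers=50) as pool:
    for host, rc, output in pool.map(run_disk_test, hosts):
        print(f"{host}: rc={rc} {output}")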
Sometimes, just by looking at the Munin graphs, you can spot some
possible bottlenecks, but we've had service degradations with far
fewer active VMs than the warning threshold we managed to identify
during the tests.
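
One way to make the "is it slow?" check less subjective, and maybe to
catch degradations before the graphs do, would be to time real HTTP
requests against the test VMs and look at percentiles. A minimal
sketch, where targets.txt and the one-second budget are again just
placeholders:

# Time a GET against every VM and report percentiles instead of gut feeling.
import statistics
import requests

BUDGET_SECONDS = 1.0  # "happy" threshold; adjust to what customers tolerate

with open("targets.txt") as f:
    targets = [line.strip() for line in f if line.strip()]

timings = []
for host in targets:
    try:
        r = requests.get(f"http://{host}/", timeout=10)
        # r.elapsed is the raw HTTP response time, not a full browser page load
        timings.append(r.elapsed.total_seconds())
    except requests.RequestException:
        print(f"{host}: failed")

if timings:
    timings.sort()
    p50 = statistics.median(timings)
    p95 = timings[int(0.95 * (len(timings) - 1))]
    over = sum(t > BUDGET_SECONDS for t in timings)
    print(f"p50={p50:.3f}s p95={p95:.3f}s over-budget={over}/{len(timings)}")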
So, to sum up, I know that if someone had come up with a solution to
this issue, it would be very easy to find on the first page of Google,
but let's pool our strategies and see if we can properly benchmark at
least some small parts of the system.
Regards,
Jordi.
--
Jordi Moles Blanco
IaaS Engineer Cdmon.com
___________________________
Tlf: 902 36 41 38
Tlf: 93 567 75 77
mailto: [email protected]
http://www.cdmon.com
http://es.linkedin.com/in/jordimolesblanco