Below is a recent discussion that a number of active developers (myself
included) had this morning on the subject of coming up with unit tests to
measure the performance impact of using threads.

Participants:
Alex Chemeris (Alexander.Chemeris)
Jaroslav Libak (yarol1)
Keith Kyzivat (kamaji)
Sergey Konstanbaev (sergforce)
Dan Petrie (daniel.g.petrie)

*Conference on 5/3/2007 10:00:40 AM:*
(10:00:40 AM) sergforce entered the room.
(10:00:40 AM) Alexander.Chemeris entered the room.
(10:00:40 AM) kamaji entered the room.
(10:00:41 AM) ***sergforce has set the topic to: WinCE performance
(10:00:50 AM) yarol1 entered the room.
(10:00:55 AM) yarol1: hi
(10:01:04 AM) kamaji: Hi Jaroslav
(10:01:09 AM) Alexander.Chemeris: hi.
(10:01:15 AM) Alexander.Chemeris: waiting for Dan..
(10:01:32 AM) kamaji: He's at his computer but is experiencing technical
difficulties it seems
(10:01:47 AM) daniel.g.petrie entered the room.
(10:01:51 AM) kamaji: Aha
(10:01:58 AM) daniel.g.petrie: hello
(10:02:05 AM) yarol1: hi
(10:02:41 AM) sergforce: Hi :)
(10:03:17 AM) Alexander.Chemeris: So, everyone is here. Let's begin?
(10:03:33 AM) daniel.g.petrie: so I guess the topic for discussion here is
scoped to performance optimization related to running many threads on more
limited devices, like a low-end WinCE processor
(10:03:52 AM) kamaji: Yes.
(10:04:39 AM) daniel.g.petrie: I wanted to start by discussing ideas on how
we might prove the hypothesis that a single thread performs significantly
better than many threads
(10:04:52 AM) daniel.g.petrie: any ideas on how we might test or prove this?
(10:05:30 AM) sergforce: How many threads could there be in real sipXtapi
applications?
(10:06:00 AM) Alexander.Chemeris: Under Linux we could see when one thread
is switched to another. Is there any way to do this under WinCE?
(10:06:10 AM) yarol1: derive a class from OsServerTask that does some
computations in a loop and remembers how many times the loop executed; start
~30 OsServerTasks doing the same thing, then repeat it with ~5 tasks, and
compare the work done in the same time
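
A minimal sketch of the test Jaroslav describes, with std::thread standing
in for an OsServerTask-derived worker; the sin/cos workload, thread counts,
and all names are illustrative. Each worker counts loop iterations until a
fixed wall-clock window expires, and the aggregate counts of the two runs
are compared:

    #include <atomic>
    #include <chrono>
    #include <cmath>
    #include <cstdio>
    #include <thread>
    #include <vector>

    // Spin numThreads workers for a fixed wall-clock window and return the
    // aggregate number of loop iterations they completed.
    static long long runWorkers(int numThreads, int seconds)
    {
        std::atomic<bool> stop(false);
        std::vector<long long> counts(numThreads, 0);
        std::vector<std::thread> workers;

        for (int i = 0; i < numThreads; ++i)
        {
            workers.emplace_back([&stop, &counts, i]() {
                double x = 1.0;
                while (!stop.load(std::memory_order_relaxed))
                {
                    x = std::sin(x) * std::cos(x) + 1.0;  // some real work
                    ++counts[i];
                }
            });
        }

        std::this_thread::sleep_for(std::chrono::seconds(seconds));
        stop.store(true);
        for (auto& t : workers) t.join();

        long long total = 0;
        for (long long c : counts) total += c;  // aggregate loop count
        return total;
    }

    int main()
    {
        std::printf("5 threads:  %lld iterations\n", runWorkers(5, 10));
        std::printf("30 threads: %lld iterations\n", runWorkers(30, 10));
        return 0;
    }

If 30 threads complete noticeably fewer aggregate iterations than 5 in the
same window, that difference bounds what scheduling and context-switch
overhead costs for this workload.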
(10:06:29 AM) kamaji: How was this done under Linux (seeing when there's a
context switch)?
(10:06:52 AM) Alexander.Chemeris: with the strace tool, for example
(10:06:57 AM) kamaji: ahh
(10:07:06 AM) Alexander.Chemeris: or another thread profiler
(10:07:15 AM) Alexander.Chemeris: 2yarol:
(10:07:22 AM) daniel.g.petrie: I think observing or counting context switches
is interesting, but I am not sure we need to know that to do the test
(10:07:23 AM) Alexander.Chemeris: yes, I think this is reasonable
(10:07:28 AM) kamaji: Jaro: sounds good.
(10:07:50 AM) kamaji: dan: how would we know the performance then?
(10:08:07 AM) Alexander.Chemeris: 2Dan: *nod* But it may help anyway
(10:08:45 AM) kamaji: scratch last. I understand now.
(10:09:33 AM) kamaji: Is simple looping going to show enough?
(10:09:33 AM) sergforce: Context switching under CE5 can be measured by
kernel tracker utility.. (No source code modification needed)
(10:09:35 AM) Alexander.Chemeris: Ok, then we'll measure performance for
full-load application
(10:09:59 AM) yarol1: not simple looping; do some sin, cos, multiplication,
addition, etc.
(10:10:06 AM) kamaji: should the OsServerTask thread test do a bit more
computation..
(10:10:07 AM) yarol1: also use new, delete..
(10:10:09 AM) kamaji: *nod*
(10:10:14 AM) daniel.g.petrie: we can create a set of worker threads
(OsServerTask derivatives) and queue jobs for them to do
(10:11:02 AM) daniel.g.petrie: test with a single thread doing a fixed set of
jobs and then compare with many threads doing the same number of jobs?
(10:11:42 AM) kamaji: so, we have a test that does the same work, once with
one or few threads, another with 2x the threads, another with 3x the
threads, and then compare aggregate loop counts for each test?
(10:12:12 AM) daniel.g.petrie: I think the job should do some sort of block
or task yield to force some context switching
(10:12:18 AM) kamaji: Dan and I have the same idea.
(10:12:48 AM) Alexander.Chemeris: What is "aggregate loop counts"?
(10:13:00 AM) daniel.g.petrie: new and delete are good things to do in the
job as well, since there is contention over the malloc lock
(10:13:01 AM) kamaji: sum of all loop counts for all threads in a run
(10:13:11 AM) kamaji: (Aggregate loop counts)
(10:14:06 AM) yarol1: you could just measure time, if the task took long
enough
(10:14:10 AM) Alexander.Chemeris: hrm.. I do not understand what a loop
count is? I thought we run a loop with a fixed number of iterations
(10:14:52 AM) Alexander.Chemeris: and then measure the total time needed for
this loop to finish
(10:15:00 AM) yarol1: so we can either measure # of loops (then the amount of
work is different, time constant), or measure time and the amount of work is
constant
(10:15:15 AM) Alexander.Chemeris: *nod*
(10:15:31 AM) daniel.g.petrie: yeah, I think we just need to create enough
jobs and big enough task in each job so the time is significant
(10:15:53 AM) Alexander.Chemeris: I think measuring time with constant
number of iterations is more precise
(10:15:58 AM) yarol1: I think the 2nd option with constant work is better, as
we don't have to synchronize tasks in the beginning and stop them at the same
time
(10:16:49 AM) kamaji: Ok
(10:16:57 AM) Alexander.Chemeris: It seems Jaroslav and I have a similar view
on this
(10:17:01 AM) kamaji: *nod*
(10:17:06 AM) kamaji: I see your point
(10:17:13 AM) kamaji: it would be easier to control.
(10:17:15 AM) daniel.g.petrie: that is what I was suggesting as well
(10:17:24 AM) daniel.g.petrie: constant number of jobs
(10:17:24 AM) kamaji: I don't see yet quite how it would be more precise
though.
(10:18:12 AM) Alexander.Chemeris: It would be more precise due to
start/stop issues, I think
(10:18:21 AM) daniel.g.petrie: re: precision: how do you count part of a
job when the stop timer fires?
(10:19:07 AM) kamaji: I'm good with #2 though.
(10:19:13 AM) Alexander.Chemeris: 2Dan: is it about a constant number of
iterations?
(10:19:28 AM) kamaji: alex, dan: good points.
(10:20:19 AM) yarol1: dan, you would have to kill the thread from outside
at some time
(10:20:29 AM) daniel.g.petrie: Alex: yes, I was agreeing with a constant
number of iterations, by pointing out that it is difficult to get a count
with constant time
(10:20:43 AM) yarol1: but not kill the OsServerTask
(10:21:03 AM) yarol1: only the thread, so you could read the number of
iterations from it
(10:21:05 AM) Alexander.Chemeris: 2Dan, Keith: ok
(10:21:52 AM) Alexander.Chemeris: 2Jaro: It's true; the question is solely
about precision, I believe
(10:22:22 AM) daniel.g.petrie: I think we are in agreement: 1 or many
threads doing a constant number of jobs/iterations such that the net time is
significant (e.g. 1 minute)
(10:22:36 AM) Alexander.Chemeris: *nod*
(10:23:03 AM) daniel.g.petrie: shall we discuss what is done in the job?
(10:23:04 AM) kamaji: 1 minute for the unit test to complete? or 1 minute for
each stage of the test to complete (1 thread then many threads)
(10:23:13 AM) kamaji: (i.e. 2 minutes total)
(10:23:15 AM) kamaji: ?
(10:23:34 AM) Alexander.Chemeris: To be more precise: We have N iterations
and want to test M threads. Then we run N iterations in one thread and then
N/M iterations in M threads. Right?
(10:24:03 AM) kamaji: That is what I assumed we have been talking about.
(10:24:16 AM) daniel.g.petrie: yes, where M must be a factor of N
(10:24:32 AM) Alexander.Chemeris: 2Keith: I think 10 sec is enough, so 1
min is just an example
(10:24:39 AM) yarol1: no, we have N jobs with M iterations and T threads;
we send all jobs to the number of threads we have (1 job only sent to 1
thread), and count the time required
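
A sketch of the constant-work variant being converged on here, using
Jaroslav's naming: N jobs of M iterations each are divided among T threads,
and only the total wall time is measured. std::thread stands in for
pre-started OsServerTasks, and all names and sizes are illustrative:

    #include <chrono>
    #include <cmath>
    #include <cstdio>
    #include <thread>
    #include <vector>

    // One job: a fixed number of iterations of placeholder CPU work.
    static void runJob(int iterations)
    {
        double x = 1.0;
        for (int i = 0; i < iterations; ++i)
            x = std::sin(x) * std::cos(x) + 1.0;
    }

    // Run nJobs jobs of mIters iterations spread over tThreads threads and
    // return the elapsed wall time in milliseconds. (A real OsServerTask
    // test would pre-start the tasks and post the jobs to their queues so
    // thread creation stays out of the timed region.)
    static double timeJobs(int nJobs, int mIters, int tThreads)
    {
        auto begin = std::chrono::steady_clock::now();
        std::vector<std::thread> workers;
        int jobsPerThread = nJobs / tThreads;  // T must divide N evenly
        for (int t = 0; t < tThreads; ++t)
        {
            workers.emplace_back([jobsPerThread, mIters]() {
                for (int j = 0; j < jobsPerThread; ++j)
                    runJob(mIters);
            });
        }
        for (auto& w : workers) w.join();
        auto end = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::milli>(end - begin).count();
    }

    int main()
    {
        const int N = 3000, M = 100000;
        std::printf("1 thread:   %.1f ms\n", timeJobs(N, M, 1));
        std::printf("30 threads: %.1f ms\n", timeJobs(N, M, 30));
        return 0;
    }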
(10:24:50 AM) Alexander.Chemeris: re: M must be a factor of N: sure
(10:25:14 AM) daniel.g.petrie: I was thinking iterations and jobs were the
same thing
(10:25:26 AM) yarol1: k
(10:25:45 AM) daniel.g.petrie: within a job we might do something
iteratively to make it consume CPU though
(10:26:11 AM) Alexander.Chemeris: aha, that is what I was thinking of too
(10:26:42 AM) daniel.g.petrie: I think we only care about the total time
for doing all the jobs, right?
(10:26:52 AM) yarol1: yes
(10:26:59 AM) kamaji: That's the useful statistic we get out of this test,
yes.
(10:27:57 AM) daniel.g.petrie: so the job should new/delete or malloc/free
and yield; anything else besides doing some computations?
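
One plausible job body along the lines being discussed: some floating-point
math, heap churn through new/delete (which contends on the allocator lock),
and an explicit yield to invite context switches. The function name and the
buffer size are made up for illustration:

    #include <cmath>
    #include <thread>

    // One iteration of a job: compute, allocate/free, yield.
    static double doOneJobIteration(double x)
    {
        // Computation: keep the FPU busy.
        x = std::sin(x) * std::cos(x) + std::sqrt(std::fabs(x) + 1.0);

        // Heap churn: new/delete goes through the shared allocator.
        double* scratch = new double[64];
        for (int i = 0; i < 64; ++i)
            scratch[i] = x + i;
        x += scratch[63];
        delete[] scratch;

        // Give the scheduler a chance to switch threads.
        std::this_thread::yield();
        return x;
    }

    int main()
    {
        double x = 1.0;
        for (int i = 0; i < 100000; ++i)
            x = doOneJobIteration(x);
        return x > 0 ? 0 : 1;  // use the result so it is not optimized away
    }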
(10:27:59 AM) Alexander.Chemeris: So, could we gauge the usefulness of
thread aggregation for low-intensity tasks from this test?
(10:28:47 AM) Alexander.Chemeris: .. as I think we have many more
low-intensity tasks and only a few high-intensity ones
(10:28:55 AM) daniel.g.petrie: Alex, I am not sure I understand your
question
(10:29:30 AM) Alexander.Chemeris: Almost all of our tasks are waiting for
messages almost all the time (these are low-intensity tasks)
(10:29:33 AM) daniel.g.petrie: are you suggesting that most tasks are not
that busy (low-intensity)?
(10:29:41 AM) yarol1: alexander, I think we need the maximum gain from
reducing the number of threads; if the tasks are low-intensity, I think it
will be lower
(10:29:47 AM) kamaji: Dan: I think he means that we have quite a few
tasks that do small things, then sleep, do more small things, sleep, etc.
(low intensity).
Does this test whether having a lot of these impacts performance?
(10:30:12 AM) yarol1: we want to know the worst case
(10:30:46 AM) daniel.g.petrie: what is the worst case?
(10:30:52 AM) Alexander.Chemeris: 2Keith: yes
(10:30:53 AM) kamaji: Jaro: *nod* but is there a chance that the scheduler
behavior is different with low-intensity tasks?
(10:31:32 AM) yarol1: I think it is, and that with low-intensity tasks the
gain from reducing the number of threads will be lower
(10:32:19 AM) Alexander.Chemeris: What I'm thinking with low-intensity
tasks is that their performance would be greatly improved with aggregation,
as we would not need to synchronize postMessage() between them then
(10:32:38 AM) kamaji: Do we want to make this two tests then? one
high-intensity, one low-intensity?
(10:33:07 AM) kamaji: alex: I can see that being a gain.
(10:33:47 AM) Alexander.Chemeris: 2Keith: "I can't"?
(10:33:56 AM) sergforce: Are we going to use synchronization (i.e. mutexes,
semaphores) between threads? Do we use different priorities on the threads
in this test?
(10:33:57 AM) daniel.g.petrie: I was just thinking on another track: we
should separate the overhead of creating and deleting tasks into a different
test, so we only measure context switching cost in this test
(10:35:04 AM) yarol1: dan, thread creation wouldn't be measured in this
test at all, since we would first start all tasks, and then post them some
jobs
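
The separate test Dan proposes could be as small as timing bare create/join
cycles, so creation and teardown cost never pollutes the context-switch
numbers; std::thread again stands in for OsTask/OsServerTask, and the task
count is arbitrary:

    #include <chrono>
    #include <cstdio>
    #include <thread>

    int main()
    {
        const int kTasks = 200;
        auto begin = std::chrono::steady_clock::now();
        for (int i = 0; i < kTasks; ++i)
        {
            std::thread t([]() {});  // create a task that does nothing
            t.join();                // and immediately tear it down
        }
        auto end = std::chrono::steady_clock::now();
        double us =
            std::chrono::duration<double, std::micro>(end - begin).count();
        std::printf("average create+join cost: %.1f us\n", us / kTasks);
        return 0;
    }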
(10:35:13 AM) kamaji: serg: not sure -- we don't want to make this test
too complicated.
(10:35:37 AM) daniel.g.petrie: that is what I was suggesting
(10:37:27 AM) daniel.g.petrie: the synchronization operations might be
interesting in a separate test. Let's focus on context switching cost for now
(10:38:11 AM) daniel.g.petrie: should we do more than malloc, yield,
compute to force context switching in a job?
(10:38:28 AM) kamaji: so, as serg said - do we want to put in any thread
synchronization into this?
(10:38:39 AM) Alexander.Chemeris: 2Dan: I think synchronization is
significant too, but agree that it would be a separate test
(10:38:50 AM) kamaji: Ok
(10:39:12 AM) kamaji: So each thread would be working with data that it
owns.
(10:39:34 AM) daniel.g.petrie: right
(10:39:34 AM) kamaji: Ok.
(10:39:46 AM) Alexander.Chemeris: And I'm not sure it is possible to
measure the cost of synchronization with low-intensity tasks.
(10:40:40 AM) daniel.g.petrie: we could always make the number of jobs
less than the number of threads
(10:41:09 AM) Alexander.Chemeris: 2Dan: And what then?
(10:41:34 AM) daniel.g.petrie: to represent low-intensity tasks
(10:41:44 AM) daniel.g.petrie: or create big and small jobs
(10:42:30 AM) Alexander.Chemeris: yeah, I thought about low-intensity
tasks alone, without high-intensity ones...
(10:42:38 AM) daniel.g.petrie: give some tasks a small job mostly to force
a context switch
(10:43:34 AM) Alexander.Chemeris: hmm, I think we could measure the
low-intensity tasks' gain with intensive message interaction between threads
(10:43:48 AM) Alexander.Chemeris: threads = OsTasks
(10:44:00 AM) kamaji: *nod*
(10:44:15 AM) Alexander.Chemeris: As I said, most of the impact of
aggregating them should be in less synchronization needed
(10:44:48 AM) Alexander.Chemeris: So, create several tasks and let them
exchange messages as fast as they can.
(10:45:12 AM) Alexander.Chemeris: And then see how long it will take with
separate threads and in one aggregate thread
(10:45:20 AM) Alexander.Chemeris: That is how I see it
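
A minimal ping-pong sketch of the test Alex describes, assuming a trivial
blocking queue in place of OsMsgQ/OsServerTask::postMessage(); the two
tasks do no work except bouncing one message back and forth, so the
measured time is dominated by queueing and context-switch cost:

    #include <chrono>
    #include <condition_variable>
    #include <cstdio>
    #include <mutex>
    #include <thread>

    // Stand-in for OsMsgQ: a blocking queue counting pending tokens.
    struct MsgQ
    {
        std::mutex m;
        std::condition_variable cv;
        int pending = 0;

        void post()
        {
            std::lock_guard<std::mutex> lock(m);
            ++pending;
            cv.notify_one();
        }
        void wait()
        {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [this] { return pending > 0; });
            --pending;
        }
    };

    int main()
    {
        const int kRounds = 100000;
        MsgQ toA, toB;

        std::thread a([&]() {
            for (int i = 0; i < kRounds; ++i) { toA.wait(); toB.post(); }
        });
        std::thread b([&]() {
            for (int i = 0; i < kRounds; ++i) { toB.wait(); toA.post(); }
        });

        auto begin = std::chrono::steady_clock::now();
        toA.post();  // serve the first ball
        a.join();
        b.join();
        auto end = std::chrono::steady_clock::now();
        double ms =
            std::chrono::duration<double, std::milli>(end - begin).count();
        std::printf("%d round trips in %.1f ms\n", kRounds, ms);
        return 0;
    }

The aggregated comparison run would dispatch both handlers from a single
thread, where post/wait degenerate to direct calls, and the difference
between the two times is the cost being measured.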
(10:46:21 AM) daniel.g.petrie: we could create a queue in the main thread
and have each thread post an "I'm starting" and an "I'm ending" message
(10:47:20 AM) daniel.g.petrie: for each job
(10:47:31 AM) Alexander.Chemeris: 2Dan: And what then?
(10:47:47 AM) kamaji: Alex: when you say messages - do you mean OsMsg in
an OsMsgQ?
(10:48:38 AM) Alexander.Chemeris: I meant working with
OsServerTask::postMessage()
(10:48:46 AM) daniel.g.petrie: the synchronization bottleneck would be on
the queue. Isn't that what you were suggesting was a problem?
(10:49:24 AM) kamaji: alex: ok.
(10:50:07 AM) Alexander.Chemeris: 2Dan: I think it's one part of the
problem
(10:50:52 AM) daniel.g.petrie: I am not sure we are converging on what a
job does. I would like to finish up in the next 10 minutes.
(10:51:01 AM) Alexander.Chemeris: .. ok, I think it is the whole problem
(10:52:02 AM) Alexander.Chemeris: I think we should test several patterns:
1) mathematics only, 2) memory (de)allocation, and maybe 3) synchronization
(10:52:34 AM) Alexander.Chemeris: and 4) message exchange
(10:52:38 AM) kamaji: for those 3 - we're talking about high-intensity
tests, right?
(10:52:46 AM) Alexander.Chemeris: *nod*
(10:52:47 AM) kamaji: and #4 would be low-intensity?
(10:53:17 AM) Alexander.Chemeris: and 4) is for low-intensity, tested with
a high message exchange rate
(10:53:24 AM) kamaji: and for each of 1-4, we'd do two runs - one with a
small number of threads (1?), one with many threads.
(10:53:33 AM) daniel.g.petrie: the net needs to be high intensity or we do
not have meaningful timing tests
(10:53:40 AM) Alexander.Chemeris: in 4) the tasks do nothing, only wait for
messages
(10:54:04 AM) Alexander.Chemeris: 2Dan: sure
(10:54:14 AM) kamaji: dan: "the net" - what do you mean?
(10:54:24 AM) kamaji: dan: the net result?
(10:55:07 AM) daniel.g.petrie: the total system view must be intense in
compute or resource usage, or the time is meaningless
(10:55:47 AM) Alexander.Chemeris: 2Dan: *nod* in 4) I propose to play
something like "ping-pong" with messages between tasks
(10:56:10 AM) daniel.g.petrie: play hot potato with the message
(10:56:12 AM) kamaji: dan: alex's approach for #4 would be compute-intense
-- messages back and forth as fast as they can.
(10:56:23 AM) kamaji: *nod*
(10:57:23 AM) kamaji: by low intensity, we mean that within each context
(before each context switch), only a little work gets done (i.e. sending
out a message)
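
Summarizing, the four agreed patterns could be parameterized in one test
fixture along these lines; all names are hypothetical, and patterns 3) and
4) would live in their own test programs as discussed above:

    #include <cmath>
    #include <cstdio>

    enum JobKind
    {
        kMathOnly,        // 1) pure computation, high intensity
        kHeapChurn,       // 2) new/delete, contends on the allocator
        kSynchronized,    // 3) lock contention -- deferred to a later test
        kMessagePingPong  // 4) low-intensity tasks exchanging messages
    };

    // One iteration of the given pattern; 3) and 4) are stubbed here
    // because they are separate tests.
    static double runIteration(JobKind kind, double x)
    {
        switch (kind)
        {
        case kMathOnly:
            return std::sin(x) * std::cos(x) + 1.0;
        case kHeapChurn:
        {
            double* p = new double(x + 1.0);
            x = *p;
            delete p;
            return x;
        }
        default:
            return x;
        }
    }

    int main()
    {
        double x = 1.0;
        for (int i = 0; i < 1000; ++i)
            x = runIteration(i % 2 ? kMathOnly : kHeapChurn, x);
        std::printf("%f\n", x);
        return 0;
    }

Each pattern would then be run twice -- once on one thread, once spread
over many -- and the wall times compared.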
(10:57:24 AM) daniel.g.petrie: ok, so who is going to do this?
(10:57:25 AM) Alexander.Chemeris: So, are we in agreement? Jaroslav?
(10:59:54 AM) yarol1: sorry, I'm solving a problem at work
(11:00:42 AM) yarol1: just don't spend too much time on it, maybe the gain
will be 5% or less
(11:01:01 AM) yarol1: if it's more, then more complicated tests can be made
(11:02:25 AM) Alexander.Chemeris: So, are you going to write a test?
(11:02:52 AM) daniel.g.petrie: yeah, it is better to take one step at a
time and not commit a lot of work before we know the benefit. That is really
the point here: do the minimum to see whether it is worth making our system
more complicated for this optimization.
(11:05:18 AM) daniel.g.petrie: Jaroslav, can you take a try at implementing
the first step in these tests?
(11:15:45 AM) yarol1: we have a problem with PHP at work, I'm busy fixing
it
(11:16:25 AM) Alexander.Chemeris: Is it a problem just for right now, so you
need some time to think, or will you not have time for this at all?
(11:17:18 AM) kamaji: Is this something we can put on the back burner for
the moment? It seems all of us are in the midst of projects that can't be
put off.
(11:17:28 AM) kamaji: We've got a good plan of attack.
(11:18:11 AM) kamaji: We can make sure that this conversation makes it onto
the mailing list, then others can know the plan... maybe someone will pick
it up off the mailing list?
(11:18:17 AM) Alexander.Chemeris: I think some of us should write down
this idea and put it on the web
(11:18:32 AM) kamaji: *nod*
(11:19:51 AM) Alexander.Chemeris: Keith, Dan, I believe you'll do this much
faster than Serg and I.. Could you?
(11:20:07 AM) Alexander.Chemeris: We'll participate in editing it :)
(11:20:47 AM) Alexander.Chemeris: I believe we need a wiki for this and
other ideas and plans and so on.. Could we use the Calvia wiki for now?
(11:20:48 AM) kamaji: That is, recording the plan of attack?
(11:20:56 AM) Alexander.Chemeris: *nod*
(11:21:47 AM) kamaji: we could make a JIRA entry for it..
(11:22:39 AM) Alexander.Chemeris: It does not have good formatting, and it
is hard to do in-text edits. But putting a link to it in JIRA may be a good
thing
(12:21:56 PM) yarol1 left the room.
(12:41:47 PM) daniel.g.petrie left the room.

--
Keith Kyzivat

SIPez LLC.
SIP VoIP, IM and Presence Consulting
http://www.SIPez.com
tel: +1 (617) 273-4000