Below is a recent discussion that a number of active developers (myself included) had this morning on the subject of devising unit tests to measure the performance impact of using threads.
Participants:
Alex Chemeris (Alexander.Chemeris)
Jaroslav Libak (yarol1)
Keith Kyzivat (kamaji)
Sergey Konstanbaev (sergforce)
Dan Petrie (daniel.g.petrie)

Conference on 5/3/2007 10:00:40 AM:

(10:00:40 AM) sergforce entered the room.
(10:00:40 AM) Alexander.Chemeris entered the room.
(10:00:40 AM) kamaji entered the room.
(10:00:41 AM) sergforce has set the topic to: WinCE performance
(10:00:50 AM) yarol1 entered the room.
(10:00:55 AM) yarol1: hi
(10:01:04 AM) kamaji: Hi Jaroslav
(10:01:09 AM) Alexander.Chemeris: hi.
(10:01:15 AM) Alexander.Chemeris: waiting for Dan..
(10:01:32 AM) kamaji: He's at his computer but is experiencing technical difficulties, it seems
(10:01:47 AM) daniel.g.petrie entered the room.
(10:01:51 AM) kamaji: Aha
(10:01:58 AM) daniel.g.petrie: hello
(10:02:05 AM) yarol1: hi
(10:02:41 AM) sergforce: Hi :)
(10:03:17 AM) Alexander.Chemeris: So, all are here. Let's begin?
(10:03:33 AM) daniel.g.petrie: so I guess the topic for discussion here is scoped to performance optimization related to many threads on more limited devices, like a low-end WinCE processor
(10:03:52 AM) kamaji: Yes.
(10:04:39 AM) daniel.g.petrie: I wanted to start by discussing ideas on how we might prove the hypothesis that a single thread performs significantly better than many threads
(10:04:52 AM) daniel.g.petrie: any ideas on how we might test or prove this?
(10:05:30 AM) sergforce: How many threads could there be in real sipXtapi applications?
(10:06:00 AM) Alexander.Chemeris: Under Linux we could see when one thread is switched to another. Is there any way to do this under WinCE?
(10:06:10 AM) yarol1: derive a class from OsServerTask that will do some computations in a loop and remember how many times the loop executed, start ~30 OsServerTasks doing the same thing, then repeat it with ~5 tasks, and compare the work done in the same time
(10:06:29 AM) kamaji: How was this done under Linux (seeing when there's a context switch)?
(10:06:52 AM) Alexander.Chemeris: with the strace tool, for example
(10:06:57 AM) kamaji: ahh
(10:07:06 AM) Alexander.Chemeris: or another thread profiler
(10:07:15 AM) Alexander.Chemeris: 2yarol:
(10:07:22 AM) daniel.g.petrie: I think observing or counting context switches is interesting, but I am not sure we need to know that to do the test
(10:07:23 AM) Alexander.Chemeris: yes, I think this is reasonable
(10:07:28 AM) kamaji: Jaro: sounds good.
(10:07:50 AM) kamaji: dan: how would we know the performance then?
(10:08:07 AM) Alexander.Chemeris: 2Dan: *nod* But it may help anyway
(10:08:45 AM) kamaji: scratch last. I understand now.
(10:09:33 AM) kamaji: Is simple looping going to show enough?
(10:09:33 AM) sergforce: Context switching under CE5 can be measured by the kernel tracker utility. (No source code modification needed)
(10:09:35 AM) Alexander.Chemeris: Ok, then we'll measure performance for a full-load application
(10:09:59 AM) yarol1: not simple looping, do some sin, cos, multiplying, addition etc.
(10:10:06 AM) kamaji: should the OsServerTask thread test do a bit more computation..
(10:10:07 AM) yarol1: also use new, delete..
(10:10:09 AM) kamaji: *nod*
(10:10:14 AM) daniel.g.petrie: we can create a set of worker threads (OsServerTask derivatives) and queue jobs for them to do
(10:11:02 AM) daniel.g.petrie: test with a single thread doing a fixed set of jobs and then compare with many threads doing the same number of jobs?
(10:11:42 AM) kamaji: so, we have a test that does the same work, once with one or few threads, another with 2x the threads, another with 3x the threads, and then compare aggregate loop counts for each test?
(10:12:12 AM) daniel.g.petrie: I think the job should do some sort of block or task yield to force some context switching
(10:12:18 AM) kamaji: Dan and I have the same idea.
(10:12:48 AM) Alexander.Chemeris: What is an "aggregate loop count"?
(10:13:00 AM) daniel.g.petrie: new and delete are good things to do in the job as well, as there becomes contention over the malloc lock
(10:13:01 AM) kamaji: sum of all loop counts for all threads in a run
(10:13:11 AM) kamaji: (aggregate loop counts)
(10:14:06 AM) yarol1: you could just measure time, if the task took long enough
(10:14:10 AM) Alexander.Chemeris: hrm.. I do not understand what a loop count is. I thought we run a loop with a fixed number of iterations
(10:14:52 AM) Alexander.Chemeris: and then measure the total time needed for this loop to finish
(10:15:00 AM) yarol1: so we can either measure the # of loops (then the amount of work is different, time constant), or measure time, and then the amount of work is constant
(10:15:15 AM) Alexander.Chemeris: *nod*
(10:15:31 AM) daniel.g.petrie: yeah, I think we just need to create enough jobs and a big enough task in each job so the time is significant
(10:15:53 AM) Alexander.Chemeris: I think measuring time with a constant number of iterations is more precise
(10:15:58 AM) yarol1: I think the 2nd option with a constant job is better, as we don't have to synchronize tasks in the beginning and stop them at the same time
(10:16:49 AM) kamaji: Ok
(10:16:57 AM) Alexander.Chemeris: Seems Jaroslav and I have a similar point on this
(10:17:01 AM) kamaji: *nod*
(10:17:06 AM) kamaji: I see your point
(10:17:13 AM) kamaji: it would be easier to control.
(10:17:15 AM) daniel.g.petrie: that is what I was suggesting as well
(10:17:24 AM) daniel.g.petrie: constant number of jobs
(10:17:24 AM) kamaji: I don't see yet quite how it would be more precise though.
(10:18:12 AM) Alexander.Chemeris: It would be more precise due to start/stop issues, I think
(10:18:21 AM) daniel.g.petrie: re: precision: how do you count part of a job when the stop timer fires
(10:19:07 AM) kamaji: I'm good with #2 though.
(10:19:13 AM) Alexander.Chemeris: 2Dan: it is about a constant number of iterations?
(10:19:28 AM) kamaji: alex, dan: good points.
(10:20:19 AM) yarol1: dan, you would have to kill the thread from outside at some time
(10:20:29 AM) daniel.g.petrie: Alex: yes, I was agreeing with a constant number of iterations, by pointing out that it is difficult to get a count with constant time
(10:20:43 AM) yarol1: but not kill the OsServerTask
(10:21:03 AM) yarol1: only the thread, so you could read the number of iterations from it
(10:21:05 AM) Alexander.Chemeris: 2Dan, Keith: ok
(10:21:52 AM) Alexander.Chemeris: 2Jaro: It's true, the question is solely about precision, I believe
(10:22:22 AM) daniel.g.petrie: I think we are in agreement: 1 or many threads doing a constant number of jobs/iterations such that the net time is significant (e.g. 1 minute)
(10:22:36 AM) Alexander.Chemeris: *nod*
(10:23:03 AM) daniel.g.petrie: shall we discuss what is done in the job?
(10:23:04 AM) kamaji: 1 minute for the unit test to complete? or 1 minute for each stage of the test to complete (1 thread, then many threads)
(10:23:13 AM) kamaji: (i.e. 2 minutes total)
(10:23:15 AM) kamaji: ?
(10:23:34 AM) Alexander.Chemeris: To be more precise: We have N iterations and want to test M threads. Then we run N iterations in one thread and then N/M iterations in M threads. Right?
(10:24:03 AM) kamaji: That is what I assumed we have been talking about.
(10:24:16 AM) daniel.g.petrie: yes, where M must be a factor of N
(10:24:32 AM) Alexander.Chemeris: 2Keith: I think 10 sec is enough, so 1 min is just an example
(10:24:39 AM) yarol1: no, we have N jobs with M iterations and T threads, we send all jobs to the number of threads we have (1 job only sent to 1 thread), and count the time required
(10:24:50 AM) Alexander.Chemeris: re: M must be a factor of N: sure
(10:25:14 AM) daniel.g.petrie: I was thinking iterations and jobs were the same thing
(10:25:26 AM) yarol1: k
(10:25:45 AM) daniel.g.petrie: within a job we might do something iteratively to make it consume CPU though
(10:26:11 AM) Alexander.Chemeris: aha, that is what I was thinking of too
(10:26:42 AM) daniel.g.petrie: I think we only care about the total time for doing all the jobs, right?
(10:26:52 AM) yarol1: yes
(10:26:59 AM) kamaji: That's the useful statistic we get out of this test, yes.
(10:27:57 AM) daniel.g.petrie: so the job should new/delete or malloc/free and yield; anything else besides doing some computations?
(10:27:59 AM) Alexander.Chemeris: So, could we gauge the usefulness of thread aggregation for low-intensive tasks from this test?
(10:28:47 AM) Alexander.Chemeris: ..
as I think we have many more low-intensive tasks and only a few high-intensive ones
(10:28:55 AM) daniel.g.petrie: Alex, I am not sure I understand your question
(10:29:30 AM) Alexander.Chemeris: Almost all of our tasks are waiting for a message almost all the time (they're low-intensive tasks)
(10:29:33 AM) daniel.g.petrie: are you suggesting that most tasks are not that busy (low-intensive)
(10:29:41 AM) yarol1: alexander, I think we need the maximum gain from reducing the number of threads; if the tasks are low intensive, I think it will be lower
(10:29:47 AM) kamaji: Dan: I think he means that we have quite a few tasks that do small things, then sleep, do more small things, sleep, etc... (low intensive). Does this test whether having a lot of these impacts performance?
(10:30:12 AM) yarol1: we want to know the worst case
(10:30:46 AM) daniel.g.petrie: what is the worst case?
(10:30:52 AM) Alexander.Chemeris: 2Keith: yes
(10:30:53 AM) kamaji: Jaro: *nod* but is there a chance that the scheduler behavior is different with low intensive tasks?
(10:31:32 AM) yarol1: I think it is, and that with low intensive tasks the gain from reducing the number of threads will be lower
(10:32:19 AM) Alexander.Chemeris: What I'm thinking with low-intensive tasks is that their performance would be greatly improved by aggregation, as we will not need to synchronize postMessage() between them then
(10:32:38 AM) kamaji: Do we want to make this two tests then? one high-intensity, one low-intensity?
(10:33:07 AM) kamaji: alex: I can see that being a gain.
(10:33:47 AM) Alexander.Chemeris: 2Keith: "I can't"?
(10:33:56 AM) sergforce: Are we going to use synchronization (i.e. mutexes, semaphores) between threads? Do we use different priorities on threads in this test?
(10:33:57 AM) daniel.g.petrie: I was just thinking on another track: we should separate the overhead of creating and deleting tasks into a different test, so we only measure context switching cost in this test
(10:35:04 AM) yarol1: dan, thread creation wouldn't be measured in this test at all, since we would first start all tasks, and then post them some jobs
(10:35:13 AM) kamaji: serg: not sure -- we don't want to make this test too complicated.
(10:35:37 AM) daniel.g.petrie: that is what I was suggesting
(10:37:27 AM) daniel.g.petrie: the synchronization operations might be interesting in a separate test. Let's focus on context switching cost for now
(10:38:11 AM) daniel.g.petrie: should we do more than malloc, yield, compute to force context switching in a job?
(10:38:28 AM) kamaji: so, as serg said - do we want to put any thread synchronization into this?
(10:38:39 AM) Alexander.Chemeris: 2Dan: I think synchronization is significant too, but agree that it would be a separate test
(10:38:50 AM) kamaji: Ok
(10:39:12 AM) kamaji: So each thread would be working with data that it owns.
(10:39:34 AM) daniel.g.petrie: right
(10:39:34 AM) kamaji: Ok.
(10:39:46 AM) Alexander.Chemeris: And I'm not sure whether it is possible to measure the cost of synchronization with low-intensive tasks.
(10:40:40 AM) daniel.g.petrie: we could always make the number of jobs less than the number of threads
(10:41:09 AM) Alexander.Chemeris: 2Dan: And what then?
(10:41:34 AM) daniel.g.petrie: to represent low-intense tasks
(10:41:44 AM) daniel.g.petrie: or create big and small jobs
(10:42:30 AM) Alexander.Chemeris: yeah, I thought about low-intensive tasks alone, without high-intensive ones...
(10:42:38 AM) daniel.g.petrie: give some tasks a small job mostly to force a context switch
(10:43:34 AM) Alexander.Chemeris: hum, I think we could measure low-intensive tasks' gain with intensive message interaction between threads
(10:43:48 AM) Alexander.Chemeris: threads = OsTasks
(10:44:00 AM) kamaji: *nod*
(10:44:15 AM) Alexander.Chemeris: As I said, most of the impact of aggregating them should be in less synchronization needed
(10:44:48 AM) Alexander.Chemeris: So, create several tasks and let them exchange messages as fast as they can.
(10:45:12 AM) Alexander.Chemeris: And then see how long it will take with separate threads and in one aggregate thread
(10:45:20 AM) Alexander.Chemeris: That is how I see it
(10:46:21 AM) daniel.g.petrie: we could create a queue in the main thread and have each thread post an "I'm starting" and an "I'm ending" message
(10:47:20 AM) daniel.g.petrie: for each job
(10:47:31 AM) Alexander.Chemeris: 2Dan: And what then?
(10:47:47 AM) kamaji: Alex: when you say messages - do you mean OsMsg in an OsMsgQ?
(10:48:38 AM) Alexander.Chemeris: I meant working with OsServerTask::postMessage()
(10:48:46 AM) daniel.g.petrie: the synchronization bottleneck would be on the queue. Isn't that what you were suggesting was a problem?
(10:49:24 AM) kamaji: alex: ok.
(10:50:07 AM) Alexander.Chemeris: 2Dan: I think it's one part of the problem
(10:50:52 AM) daniel.g.petrie: I am not sure we are converging on what a job does. I would like to finish up in the next 10 minutes.
(10:51:01 AM) Alexander.Chemeris: .. ok, I think it is the whole problem
(10:52:02 AM) Alexander.Chemeris: I think we should test several patterns: 1) mathematics only 2) memory (de)allocation and maybe 3) synchronization
(10:52:34 AM) Alexander.Chemeris: and 4) message exchange
(10:52:38 AM) kamaji: for those 3 - we're talking about high-intensity tests, right?
(10:52:46 AM) Alexander.Chemeris: *nod*
(10:52:47 AM) kamaji: and #4 would be low-intensity?
(10:53:17 AM) Alexander.Chemeris: and 4) is for low-intensity, tested with a high message exchange rate
(10:53:24 AM) kamaji: and for each of 1-4, we'd do two runs - one with a small number of threads (1?), one with many threads.
(10:53:33 AM) daniel.g.petrie: the net needs to be high intensity or we do not have meaningful timing tests
(10:53:40 AM) Alexander.Chemeris: in 4) tasks do nothing, only wait for a message
(10:54:04 AM) Alexander.Chemeris: 2Dan: sure
(10:54:14 AM) kamaji: dan: "the net" - what do you mean?
(10:54:24 AM) kamaji: dan: the net result?
(10:55:07 AM) daniel.g.petrie: the total system view must be intense compute or resource usage, or the time is meaningless
(10:55:47 AM) Alexander.Chemeris: 2Dan: *nod* in 4) I propose to play something like "ping-pong" with messages between tasks
(10:56:10 AM) daniel.g.petrie: play hot potato with the message
(10:56:12 AM) kamaji: dan: alex's approach for #4 would be compute intense -- messages back/forth as fast as they can.
(10:56:23 AM) kamaji: *nod*
(10:57:23 AM) kamaji: by low intensity, we mean that within each context (before each context switch), only a little work gets done (i.e. send out a message)
(10:57:24 AM) daniel.g.petrie: ok, so who is going to do this?
(10:57:25 AM) Alexander.Chemeris: So, are we in agreement? Jaroslav?
(10:59:54 AM) yarol1: sorry, I'm solving a problem at my work
(11:00:42 AM) yarol1: just don't spend too much time on it, maybe the gain will be 5% or less
(11:01:01 AM) yarol1: if it's more, then more complicated tests can be made
(11:02:25 AM) Alexander.Chemeris: So, are you going to write some test?
(11:02:52 AM) daniel.g.petrie: yeah, it is better to take one step at a time and not commit a lot of work before we know the benefit. That is really the point here: do the minimum to see if it is worth making our system more complicated for this optimization.
(11:05:18 AM) daniel.g.petrie: Jaroslav, can you take a try at implementing the first step in these tests?
(11:15:45 AM) yarol1: we have a problem with php at work, I'm busy fixing it
(11:16:25 AM) Alexander.Chemeris: Is it a problem for right now, so you need some time to think, or will you not have time for this at all?
(11:17:18 AM) kamaji: Is this something we can put on the back burner for the moment? as it seems all of us are in the midst of projects that can't be put off.
(11:17:28 AM) kamaji: We've got a good plan of attack.
(11:18:11 AM) kamaji: We can make sure that this conversation makes it onto the mailing list, then others can know the plan... maybe someone will pick it up off the mailing list?
(11:18:17 AM) Alexander.Chemeris: I think some of us should write down this idea and put it on the web
(11:18:32 AM) kamaji: *nod*
(11:19:51 AM) Alexander.Chemeris: Keith, Dan, I believe you'll do this much faster than Serg and I.. Could you?
(11:20:07 AM) Alexander.Chemeris: We'll participate in editing it :)
(11:20:47 AM) Alexander.Chemeris: I believe we need a wiki for this and other ideas and plans and so on.. Could we use the Calvia wiki for now?
(11:20:48 AM) kamaji: This is, recording the plan of attack?
(11:20:56 AM) Alexander.Chemeris: *nod*
(11:21:47 AM) kamaji: we could make a JIRA entry for it..
(11:22:39 AM) Alexander.Chemeris: It does not have good formatting, and it is hard to do in-text edits. But putting a link to it in JIRA may be a good thing
(12:21:56 PM) yarol1 left the room.
(12:41:47 PM) daniel.g.petrie left the room.

--
Keith Kyzivat
SIPez LLC.
SIP VoIP, IM and Presence Consulting
http://www.SIPez.com
tel: +1 (617) 273-4000
_______________________________________________
sipxtapi-dev mailing list
[email protected]
List Archive: http://list.sipfoundry.org/archive/sipxtapi-dev/
