Thanks Walt. I'm forwarding your response to dev so that everyone can benefit. :-)

-sam

Begin forwarded message:

From: "Walter B. Ligon III" <[EMAIL PROTECTED]>
Date: October 24, 2006 2:52:45 PM CDT
To: Sam Lang <[EMAIL PROTECTED]>
Subject: Re: [Pvfs2-developers] threaded client-core and the device thread

Well, I had been planning to write that up ... as soon as I get the code done, but the pvfs-client stuff is being a real bitch. I don't really understand most of it any more than you understand this other stuff.

But the quick answer is SM_ACTION_DEFERRED is just 0 and SM_ACTION_COMPLETE is just 1; you use them the same way 0 and 1 were used, only now (hopefully) it is a little clearer what is going on (either the state action completed and you can continue to the next state, or it was deferred, and you have to wait for it to finish).
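
As a rough illustration of those two return codes, here is a minimal sketch of a driver loop. Only the names and values of SM_ACTION_DEFERRED and SM_ACTION_COMPLETE come from the message above; the state_fn type, run_until_deferred, and the two example actions are hypothetical and are not the PVFS2 implementation.

```c
#include <stddef.h>

/* Values as given in the message; everything else below is hypothetical. */
enum sm_action {
    SM_ACTION_DEFERRED = 0, /* action posted work; wait for it to finish */
    SM_ACTION_COMPLETE = 1  /* action finished; continue to the next state */
};

/* Hypothetical state action signature operating on an opaque frame. */
typedef enum sm_action (*state_fn)(void *frame);

/* Run states in order, starting at 'start', until one defers or the
 * list is exhausted. Returns the index of the next state to run. */
static size_t run_until_deferred(state_fn *states, size_t n,
                                 size_t start, void *frame)
{
    size_t i = start;
    while (i < n) {
        enum sm_action rc = states[i](frame);
        i++;
        if (rc == SM_ACTION_DEFERRED) {
            break; /* caller resumes at i once the deferred work is done */
        }
    }
    return i;
}

/* Example actions: each bumps a counter stored in the frame. */
static enum sm_action step_done(void *frame)
{
    ++*(int *)frame;
    return SM_ACTION_COMPLETE;
}

static enum sm_action step_wait(void *frame)
{
    ++*(int *)frame;
    return SM_ACTION_DEFERRED;
}
```

With the states { step_done, step_wait, step_done }, the first call stops after the second state and a later call picks up at the third.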

There is a new return code SM_ACTION_TERMINATE which indicates that the state machine should terminate. The state machines themselves treat the various "frames" (PINT_client_sm or PINT_server_op) as opaque types. They are set up by the respective code (client or server) and dutifully returned when requested in the state actions. Really, none of that is changed from the original, except how you get to them. They are no longer directly accessible but should be accessed with the PINT_sm_frame function.
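
To make the opaque-frame idea concrete, here is a small sketch. The message does not give the signature of PINT_sm_frame or the numeric value of SM_ACTION_TERMINATE, so every name below (demo_smcb, demo_sm_frame, DEMO_TERMINATE = 2, and so on) is a hypothetical stand-in rather than the real API.

```c
#include <stddef.h>

/* Hypothetical return codes; the value 2 for TERMINATE is an assumption. */
enum demo_action { DEMO_DEFERRED = 0, DEMO_COMPLETE = 1, DEMO_TERMINATE = 2 };

/* What the state machine core sees: the frame is just an opaque pointer. */
struct demo_smcb {
    void *frame;     /* set up by client or server code */
    int terminated;  /* set by the core when an action asks to terminate */
};

/* Hypothetical accessor standing in for PINT_sm_frame. */
static void *demo_sm_frame(struct demo_smcb *smcb)
{
    return smcb->frame;
}

/* Frame layout known only to the client-side code. */
struct demo_client_frame {
    int error_code;
    int op_completed;
};

/* A final state action: fetch the frame, mark it done, ask to terminate. */
static enum demo_action demo_final_action(struct demo_smcb *smcb)
{
    struct demo_client_frame *f = demo_sm_frame(smcb);
    f->op_completed = 1;
    return DEMO_TERMINATE;
}

/* The core runs one action and reacts to the terminate code without
 * ever looking inside the frame. */
static void demo_step(struct demo_smcb *smcb,
                      enum demo_action (*action)(struct demo_smcb *))
{
    if (action(smcb) == DEMO_TERMINATE) {
        smcb->terminated = 1;
    }
}
```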

I don't have a general method to kill a state machine. I guess I can put that on the list for the next revision, but at this point I'm still focusing on getting what is there to run. The unexpected message state machine in the server checks for a server-specific flag and kills a state machine if it is set. It's really up to the jobs to deal with killing a deferred SM - and I don't know if or how to do that. If there is a way to cancel a job, and if we keep a reference to a job for a deferred SM, then we could kill it that way. But if it's just the timer SMs we could always have them check a flag each time the timer goes off and die if the flag is set - similar to what the unexpected messages do.
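
The flag-check idea for timer SMs might look roughly like the sketch below. The frame layout, the kill_requested field, and the return-code names are all hypothetical; re-arming the timer is left as a comment because the job interface is not described here.

```c
/* Hypothetical return codes; the value for TERMINATE is an assumption. */
enum demo_action { DEMO_DEFERRED = 0, DEMO_COMPLETE = 1, DEMO_TERMINATE = 2 };

/* Hypothetical frame for a looping timer state machine. */
struct demo_timer_frame {
    int kill_requested; /* set by whoever wants the machine to stop */
    int ticks;          /* number of timer expirations handled */
};

/* State action run each time the timer goes off. */
static enum demo_action demo_timer_fired(struct demo_timer_frame *f)
{
    if (f->kill_requested) {
        return DEMO_TERMINATE; /* die instead of re-arming the timer */
    }
    f->ticks++;
    /* ... post the next timer job here (omitted) ... */
    return DEMO_DEFERRED;      /* wait for the next expiration */
}
```

The cost of this approach is that the machine only stops at the next timer expiration, so shutdown latency is bounded by the timer interval.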

Oh, and the other big change is that unexpected messages are no longer special cases - they are regular old SMs like everything else. Much cleaner.

Walt

Sam Lang wrote:
On Oct 24, 2006, at 1:53 PM, Walter B. Ligon III wrote:
Good. I'm making progress tracking down the problems in the code - somehow a bunch of edits got lost. I'm fixing them. Involves changes in all of the client state machines.


BTW, there is one I'm confused about. In src/client/sysint/sys-getattr.sm, the last state action "getattr_set_sys_response" returns from several places. It is not clear if ALL of them intend to terminate since they don't all set the op_completed flag, but the only option in the SM is to terminate. So I'm assuming they want to terminate. If you know anything about that one I'd appreciate it if you'd look.
I agree that you don't want SM_ACTION_DEFERRED for any of those. It looks like you just went through and replaced all the return 0; lines in state actions with SM_ACTION_DEFERRED, even if the error_code is set to a negative value (we used to ignore the return value if the error value was negative?). If it was just a search and replace, there are probably a bunch of other places like this as well. BTW, when is SM_ACTION_COMPLETE supposed to be used (returned by a state action)? For nested machines? We could really use some documentation for what is supposed to be returned by state actions and when. It didn't exist before, and it took me a while to figure out how return 0; and return 1; behaved, and now all that is changing again. It's certainly for the better, but it will help me to have the rules documented explicitly. Also, the semantics of state machines and jobs, what are they? What are the jobs currently associated with a state machine pointer (PINT_client_sm or PINT_server_op)? How do I stop/cancel a state machine? This is especially pertinent for our state machines that essentially loop forever, such as the job-timer sm. We don't currently clean up those state machines ourselves; it would be nice of us if we did. That means figuring out what (if any) jobs are currently posted by the machine, and cancelling or waiting for completion on those jobs.
-sam

Walt

Sam Lang wrote:

I'm working with your branch Walt. Most of the code that does allocation of the client state machines is the same.
-sam
On Oct 24, 2006, at 9:10 AM, Walter B. Ligon III wrote:

Should be careful here, since all of the code dealing with PINT_client_sm's has been rewritten for the new SM code and Murali's suggestions (for example) may not work so well.

Walt

Murali Vilayannur wrote:

Hey Sam,

I ran pvfs2-client-core in valgrind, and then ran Bonnie++ a few times (10) on the mounted pvfs volume, and noticed the following when I stopped the client process:

==20132== malloc/free: 1,298,824 allocs, 1,297,888 frees, 3,462,517,583 bytes allocated.

Allocating and freeing 3.5GB seemed extreme, so I went exploring. It turns out that every time we allocate a PINT_client_sm, we're allocating about 35KB:

(gdb) p sizeof(struct PINT_client_sm)
$4 = 37764


Oh boy.. that is definitely large..

static array of 8 PINT_client_lookup_sm_ctx, which itself has a static array of 40 PINT_client_lookup_sm_segment, which are each about 112 bytes. Anyway, it ends up accumulating.
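
A quick check of the arithmetic: 8 contexts times 40 segments times roughly 112 bytes is 35,840 bytes, which is about 95% of the 37,764 bytes gdb reported. The sketch below reproduces only that nesting with hypothetical placeholder types; the real struct members are not shown in this thread, and 112 is the approximate figure quoted above.

```c
#include <stddef.h>

/* Counts and the ~112-byte segment size are taken from the message;
 * the type names and the byte-array payload are placeholders. */
enum { DEMO_MAX_CTX = 8, DEMO_MAX_SEGMENTS = 40, DEMO_SEGMENT_BYTES = 112 };

struct demo_segment { unsigned char pad[DEMO_SEGMENT_BYTES]; };
struct demo_ctx     { struct demo_segment segments[DEMO_MAX_SEGMENTS]; };
struct demo_lookup  { struct demo_ctx contexts[DEMO_MAX_CTX]; };

/* Bytes consumed by the nested static arrays alone. */
static size_t demo_lookup_bytes(void)
{
    return sizeof(struct demo_lookup);
}
```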

So I'm convinced at this point that this is beyond the noise range, plus it's just cruft that we don't need. I'd like to swap out those static arrays for dynamic allocation when we get to the start of the lookup state machine. Any thoughts or suggestions?
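
One way the static-to-dynamic swap could look is sketched here, under the assumption that the number of contexts and segments is known when the lookup state machine starts. All type and function names are hypothetical, and both counts are assumed to be greater than zero.

```c
#include <stdlib.h>

/* Placeholder types; the real struct members are not shown in this thread. */
struct demo_segment { unsigned char pad[112]; };
struct demo_ctx     { struct demo_segment *segments; size_t segment_count; };
struct demo_lookup  { struct demo_ctx *contexts; size_t context_count; };

/* Release everything demo_lookup_init allocated; safe on partial init
 * because calloc zeroes the segment pointers and free(NULL) is a no-op. */
static void demo_lookup_free(struct demo_lookup *lk)
{
    for (size_t i = 0; i < lk->context_count; i++) {
        free(lk->contexts[i].segments);
    }
    free(lk->contexts);
    lk->contexts = NULL;
    lk->context_count = 0;
}

/* Allocate only what this lookup needs. Returns 0 on success, -1 on
 * allocation failure (with nothing left allocated). */
static int demo_lookup_init(struct demo_lookup *lk, size_t nctx, size_t nseg)
{
    lk->contexts = calloc(nctx, sizeof *lk->contexts);
    if (lk->contexts == NULL) {
        lk->context_count = 0;
        return -1;
    }
    lk->context_count = nctx;
    for (size_t i = 0; i < nctx; i++) {
        lk->contexts[i].segments = calloc(nseg, sizeof(struct demo_segment));
        if (lk->contexts[i].segments == NULL) {
            demo_lookup_free(lk);
            return -1;
        }
        lk->contexts[i].segment_count = nseg;
    }
    return 0;
}
```

Operations that never perform a lookup would then carry only a pointer and a count instead of the full nested arrays.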


I agree. It definitely does not look like noise region anymore.
How about we keep a pool of PINT_client_sm's around in client-core and allocate from that instead of dynamically allocating one every time?
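
A pool along those lines could be as simple as a free list, sketched below with hypothetical names and a placeholder payload in place of the real PINT_client_sm. This version is not thread-safe; a threaded client-core would presumably need a lock around get and put.

```c
#include <stdlib.h>

/* Placeholder for the pooled object; the payload size is arbitrary. */
struct demo_sm {
    struct demo_sm *next_free;  /* link used only while on the free list */
    unsigned char payload[64];
};

struct demo_pool {
    struct demo_sm *free_list;  /* previously released objects */
};

/* Reuse a released object if one exists, otherwise allocate a new one.
 * Returns NULL only if malloc fails. */
static struct demo_sm *demo_pool_get(struct demo_pool *p)
{
    struct demo_sm *sm = p->free_list;
    if (sm != NULL) {
        p->free_list = sm->next_free;
    } else {
        sm = malloc(sizeof *sm);
        if (sm == NULL) {
            return NULL;
        }
    }
    sm->next_free = NULL;
    return sm;
}

/* Return an object to the pool instead of freeing it. */
static void demo_pool_put(struct demo_pool *p, struct demo_sm *sm)
{
    sm->next_free = p->free_list;
    p->free_list = sm;
}

/* Free every object still held by the pool (e.g. at client shutdown). */
static void demo_pool_destroy(struct demo_pool *p)
{
    while (p->free_list != NULL) {
        struct demo_sm *sm = p->free_list;
        p->free_list = sm->next_free;
        free(sm);
    }
}
```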
My 2 cents :)
thanks,
Murali
_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers



--
Dr. Walter B. Ligon III
Associate Professor
ECE Department
Clemson University

