Re: PROPOSAL: mod_jk2: Group/Instance

Bernd Koecke Fri, 03 May 2002 00:29:51 -0700

Hi Costin,

Maybe I checked out the wrong repository. I checked out
jakarta-tomcat-connectors with
CVSROOT=:pserver:[EMAIL PROTECTED]:/home/cvspublic


Now to the details, see below.

[EMAIL PROTECTED] wrote:
> On Thu, 2 May 2002, Bernd Koecke wrote:
> 
> 
>>misunderstood it. After you said that my patch is included, I had a closer look
>>at mod_jk. I can't see anything of my code, but I found the special meaning of
>>the zero lb_factor/lb_value. It seems that I didn't understand it right the
>>first time. This could solve my problem, but after a closer look and some testing
>>I found another problem. When you set the lb_value in workers.properties to 1
>>for the local tomcat and 0 for the others, you get the desired behavior. But if
>>you switch off the local tomcat for a short time, you run into trouble. The
>>problem is the 0 for the other workers. The calculation in lb_worker transforms
>>the 0 into _inf_, because 1/0 for a double is _inf_. This is greater than any
> 
> 
> I think there is a piece that checks for 0 and sets it to DEFAULT_VALUE 
> (==1 ) before doing 1/lb. 

No, I don't think so :). I checked it yesterday. With some additional log
statements in the validate function of jk_lb_worker.c, you get the value _inf_
for lb_factor and lb_value (lines 434-444). If it were set to 1, my config
wouldn't have worked, because I set the local worker to 1 and the others to 0.

> 
> While looking at the code - I'm not sure all this floating-point math is needed,
> I'll try to find a way to simplify it and use ints ( maybe 0..100 with
> some 'special' values for NEVER and ALWAYS, or some additional flags ).
> 

This is possible, but then you must add a check for 0, because without it you
compute 1/0 with ints, and integer division by zero is undefined behavior in C
(on x86, for example, it raises SIGFPE).
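
A sketch of an integer variant with that check, assuming Costin's 0..100 range. The names (lb_cost, lb_add_saturating, LB_WEIGHT_MAX, LB_COST_NEVER) and the choice of INT_MAX as the "never" sentinel are hypothetical:

```c
#include <assert.h>
#include <limits.h>

#define LB_WEIGHT_MAX 100      /* hypothetical: weights range 0..100 */
#define LB_COST_NEVER INT_MAX  /* hypothetical sentinel for weight 0 */

/* Cost added to a worker's counter each time it is chosen.
 * Weight 0 is caught explicitly instead of computing 100 / 0. */
static int lb_cost(int weight)
{
    assert(weight >= 0 && weight <= LB_WEIGHT_MAX);
    if (weight == 0)
        return LB_COST_NEVER;
    return LB_WEIGHT_MAX / weight;   /* 100 -> 1, 50 -> 2, 1 -> 100 */
}

/* Adds cost to a non-negative counter, clamping at INT_MAX instead of
 * overflowing (signed overflow is also undefined behavior in C). */
static int lb_add_saturating(int value, int cost)
{
    assert(value >= 0 && cost >= 0);
    if (cost >= INT_MAX - value)
        return INT_MAX;
    return value + cost;
}
```

A counter saturated at INT_MAX compares equal to every other saturated counter, just like _inf_ does now, so the recovery code would still need to avoid handing that value to a worker that comes back.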

> But the way it works ( or at least how I understand it ) is that if the 
> main worker fails, then we look at all workers in error state and try the 
> one with the oldest error. And the 'main' worker will be tried again when 
> the timeout expires.
> 

That's not the whole story. It's right that you will check the main worker when
it's back again, but you use it only once, because when the request was handled
successfully, rec->in_recovering is true (line 332 of jk_lb_worker.c, service
function). Then get_max_lb gets the value _inf_ from one of the other workers,
and then the things happen that I described in my previous mail.
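
A simplified model of that sequence, assuming the selection and recovery logic works roughly like get_most_suitable_worker/get_max_lb in jk_lb_worker.c. The struct and helper names are hypothetical, and the tie-break to the earlier worker is how I read the selection loop:

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical, simplified worker record; not the real jk_lb_worker.c struct. */
typedef struct {
    double lb_factor;     /* 1 / configured factor, +inf for a configured 0 */
    double lb_value;      /* running counter, lowest healthy value wins */
    int in_error_state;
    int in_recovering;
} rec_t;

/* Largest lb_value among workers that are not in error state. */
static double get_max_lb(const rec_t *w, size_t n)
{
    double rc = 0.0;
    for (size_t i = 0; i < n; i++)
        if (!w[i].in_error_state && w[i].lb_value > rc)
            rc = w[i].lb_value;
    return rc;
}

/* After a successful request on a recovering worker, move its counter up
 * to the current maximum. If any other worker sits at +inf, so will it. */
static void finish_recovery(rec_t *w, size_t n, size_t idx)
{
    assert(idx < n);
    if (w[idx].in_recovering) {
        w[idx].lb_value = get_max_lb(w, n) + w[idx].lb_factor;
        w[idx].in_recovering = 0;
    }
}

/* Session-less requests go to the healthy worker with the lowest lb_value.
 * The strict '<' means ties go to the earlier worker in the list.
 * Returns n if no worker is healthy. */
static size_t get_most_suitable(const rec_t *w, size_t n)
{
    size_t best = n;
    for (size_t i = 0; i < n; i++)
        if (!w[i].in_error_state &&
            (best == n || w[i].lb_value < w[best].lb_value))
            best = i;
    return best;
}
```

With the local worker third in the list, the first session-less request after it comes back still goes to it (5 < inf), but once its lb_value is set to get_max_lb() + lb_factor = inf, every later tie at +inf is won by the first remote worker.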

> I haven't tested this too much, I just applied the patches ( that I 
> understand :-), I'll add some more debugging for this process and maybe 
> we can find a better solution.
> 
> But this functionality is essential for the JNI worker and very important
> in general - so I really want to find the best solution. If you have any
> patch idea, let me know.
> 
> To avoid further confusion and complexity in the lb-factor/value, I
> think we should add one more flag ( 'local_worker' ? ) and use it
> explicitly. Again, patches are welcome - it's always good to have
> different ( and more ) eyes looking at the code.
> 

That is what I did in the patch I sent; the additional documentation followed a
few days later. But my additions to the lb_worker were a little bit too complex.
You are right, we should get this behavior if we use the flag only on the main
worker and change the behavior after a failure of this worker. But we still need
the 0/inf trick for the other workers, because only with it do we get the
situation where the other workers aren't asked when there is no session and the
main worker is up.
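
A rough sketch of that combination: the other workers keep the 0/inf trick, and only the main worker carries the flag, which changes what happens after its failure. The field name local_worker follows Costin's suggestion; everything else (struct, helper names, resetting to the initial lb_value) is a hypothetical illustration, not the current lb_worker code:

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical record: other workers keep the configured-0 -> +inf trick,
 * only the main worker carries the extra flag. */
typedef struct {
    double lb_factor;
    double lb_value;
    int in_error_state;
    int in_recovering;
    int local_worker;     /* hypothetical flag name taken from the thread */
} flagged_rec_t;

/* Largest lb_value among healthy workers. */
static double max_healthy_lb(const flagged_rec_t *w, size_t n)
{
    double rc = 0.0;
    for (size_t i = 0; i < n; i++)
        if (!w[i].in_error_state && w[i].lb_value > rc)
            rc = w[i].lb_value;
    return rc;
}

/* Recovery bookkeeping: the flagged main worker restarts from its initial
 * lb_value (== lb_factor) instead of inheriting +inf via the maximum. */
static void finish_recovery_flagged(flagged_rec_t *w, size_t n, size_t idx)
{
    assert(idx < n);
    if (!w[idx].in_recovering)
        return;
    if (w[idx].local_worker)
        w[idx].lb_value = w[idx].lb_factor;
    else
        w[idx].lb_value = max_healthy_lb(w, n) + w[idx].lb_factor;
    w[idx].in_recovering = 0;
}

/* Lowest lb_value among healthy workers; ties go to the earlier one.
 * Returns n if no worker is healthy. */
static size_t pick_lowest(const flagged_rec_t *w, size_t n)
{
    size_t best = n;
    for (size_t i = 0; i < n; i++)
        if (!w[i].in_error_state &&
            (best == n || w[i].lb_value < w[best].lb_value))
            best = i;
    return best;
}
```

Because the other workers stay at +inf, the main worker wins every session-less request while it is up, and it wins again as soon as it has recovered; while it is down, the other workers take over.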

I will try to build another patch and send it. I think it could be possible 
without an additional flag.

Another thought about this:
When you use double and we fix the handling after an error, the main worker
would never reach _inf_, because the lb_factor is < 1 if lb_value wasn't 0.
After choosing the worker, this value is added to the lb_value. But once
lb_value is high enough, the gap between two adjacent representable doubles
becomes more than twice the lb_factor, and the addition is rounded away, so
lb_value stops growing. But this is only interesting in theory. I think in the
real world we will restart apache before this happens :).
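
A small C check of that rounding effect (the helper names are hypothetical): once the gap between adjacent doubles near the counter is large enough relative to the step, value + step rounds back to value and the counter stalls at a finite value without ever reaching _inf_:

```c
#include <assert.h>

/* True if value + step rounds back to value, i.e. the increment is lost.
 * volatile forces the sum to be stored as a double (no excess precision). */
static int increment_is_lost(double value, double step)
{
    volatile double sum = value + step;
    return sum == value;
}

/* Smallest power of two at which adding step no longer changes the value.
 * Terminates for any finite step > 0 (at worst once v overflows to +inf). */
static double first_lossy_power_of_two(double step)
{
    double v = 1.0;
    assert(step > 0.0);
    while (!increment_is_lost(v, step))
        v *= 2.0;
    return v;
}
```

With a step of 1, the counter stalls at 2^53 (about 9e15 increments), which is why this only matters in theory.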


Bernd

> ( that can go in both jk1, but I can't see a release of jk2 without this 
> functionality )
> 
> Costin
> 
> 
> 
>>other lb_value and greater than the lb_value of the local tomcat. But after a
>>failure of the local tomcat, it is in error_state. After some time it is set to
>>recovering, and if the local tomcat is back again, the function jk(2)_get_max_lb
>>gets the highest lb_value. This is _inf_ from one of the other workers. Adding
>>a value to _inf_ is meaningless: you end up with an lb_value of _inf_ for the
>>local worker. If this worker isn't the first in the worker list, it will never
>>be chosen again, because its lb_value will never be less than another
>>lb_value; all the other workers have _inf_ as their lb_values. So every
>>request without a session will be routed to the first of the other tomcats.
>>
>>The only way out is to restart the local apache after tomcat is up and
>>running. But I don't know when tomcat has finished loading all its contexts
>>and started the connectors.
>>
>>I didn't look very deeply into jk2, but I found the same
>>get_most_suitable_worker and get_max_lb functions. The jk2_get_max_lb function
>>will always return _inf_. In your answer to some other mails you said that
>>workers could be removed. Do I understand it right that if my local tomcat goes
>>down, its worker is removed from the list, and when it comes up again it is
>>added to the worker list with a reset lb_value (only for mod_jk2)?
>>
>>Over the next few days I will look at the docs and code of jk2 and give it a
>>try. Maybe all my problems will go away with the new module :).
>>
>>Sorry if I ask stupid questions, but I want to make it working for our new cluster.
>>
>>Thanks
>>
>>Bernd
>>
>>
>>>This is essential for jk2's JNI worker, which fits perfectly this case
>>>( you don't want to send via TCP when you have a tomcat instance in the 
>>>same process ).
>>>
>>>
>>>
>>>
>>>>(2) Tomcat instances in standby or "soft shutdown" mode where they serve
>>>>requests bound by established sessions, and requests without a session only
>>>>if all non-standby instances have failed.
>>>
>>>
>>>That's what the SHM scoreboard is going to do ( among other things ). 
>>>You can register tomcat instances ( which will be added automatically ),
>>>or unregister - in which case no new requests ( except the old sessions )
>>>will go to the unregistered tomcat.
>>>
>>>
>>>Costin
>>>
>>>
>>>
>>>>[EMAIL PROTECTED] wrote:
>>>>
>>>>
>>>>
>>>>>On Tue, 30 Apr 2002, Bernd Koecke wrote:
>>>>>
>>>>>
>>>>>
>>>>>>Some weeks ago I sent a patch for mod_jk for a routing-only lb_worker. A few
>>>>>>days later I sent the docs. Henry Gomez said that it should be committed, but
>>>>>>I think it isn't in the repository. But it's the same with me here: too much
>>>>>>work for too little time :).
>>>>>
>>>>>I think it is in mod_jk, I remember seeing the commit. 
>>>>>
>>>>>And I think I committed it in jk2 as well ( after some modifications ).
>>>>>
>>>>>
>>>>>
>>>>>>I need sticky sessions but no load balancing in the module. If a request
>>>>>>without a session comes in, it should be routed to the _local_ tomcat.
>>>>>
>>>>>Well, there is another use-case with the exact same behavior - Apache2 
>>>>>with tomcat in JNI mode. All requests without session should be routed to 
>>>>>the _jni_ channel ( i.e. in-process, minimal overhead ).
>>>>>
>>>>>It's exactly the same - so be sure I do my best to handle this case :-)
>>>>>
>>>>>Apache2 acts like a 'natural' load-balancer/fail-over, with the parent
>>>>>process monitoring for crashes, and it starts/stops children based on
>>>>>load.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>I think this could be possible with the associated instance of a channel
>>>>>>(item 7). Then I have to configure all four nodes for the same group, because
>>>>>>all nodes will serve the same webapps, and associate the channel with this
>>>>>>group. But for this I need a non-balancing group. I can't tell whether the
>>>>>>default behavior of a group is balancing and whether this can be switched
>>>>>>off. Is this right, or am I missing something?
>>>>>
>>>>>The default is balancing, but you can tune this using weights ( and I
>>>>>think we use your code for making one instance 'top priority').
>>>>>
>>>>>Please check the code, take a look and send additional comments/patches.
>>>>>
>>>>>It's not yet completely done, of course.
>>>>>
>>>>>
>>>>>Thanks,
>>>>>Costin 
>>>>
>>>>
>>>>--
>>>>To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
>>>>For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
> 
> 
> 



-- 
Dipl.-Inform. Bernd Koecke
UNIX-Entwicklung
Schlund+Partner AG
Fon: +49-721-91374-0
E-Mail: [EMAIL PROTECTED]


