Hi Costin,

Now here is my patch. It is very small and it works, and we don't need
additional config flags. When the lb_value is read from the config file,
it is checked against zero. Based on that, a flag (use_for_max_lb) in the
worker record is set so that the get_max_lb function can decide whether
this worker should be considered or not. When you set the lb_value of the
main (local) worker to 1 and that of the others to 0, everything works
fine. When you switch off the main worker, you will be routed to the
first of the other workers. That's not very balanced, but the load
balancer in front of the cluster shouldn't send requests to a node with a
shut-down Tomcat anyway. This path only matters for requests with
sessions on that node and for the time between the shutdown of Tomcat and
the balancer noticing it.
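
For illustration, the kind of workers.properties I mean would look like
this (the worker names and hosts are only placeholders for this sketch):

# The local/main worker gets lbfactor 1; the remote workers get 0, so
# they only serve sticky sessions or act as fallback.
worker.list=loadbalancer

worker.loadbalancer.type=lb
worker.loadbalancer.balanced_workers=local,remote1,remote2

worker.local.type=ajp13
worker.local.host=localhost
worker.local.port=8009
worker.local.lbfactor=1

worker.remote1.type=ajp13
worker.remote1.host=node1.example.com
worker.remote1.port=8009
worker.remote1.lbfactor=0

worker.remote2.type=ajp13
worker.remote2.host=node2.example.com
worker.remote2.port=8009
worker.remote2.lbfactor=0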

When Tomcat is up again it will take a little while, at most the value of
WAIT_BEFORE_RECOVER, until the worker is chosen again, and because of the
flag it won't get _inf_ as its lb_value.
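
As far as I understand the recovery logic, it boils down to something
like the following simplified sketch (my paraphrase of the check in
get_most_suitable_worker, not the actual implementation):

/* Simplified paraphrase of the recovery check in jk_lb_worker.c. */
#include <time.h>

#define WAIT_BEFORE_RECOVER (60*1)  /* seconds, as I read it in the source */

struct worker_record {
    int    in_error_state;
    int    in_recovering;
    time_t error_time;
};

/* A failed worker may be retried once the recover timeout has expired. */
static int may_recover(const struct worker_record *rec, time_t now)
{
    return rec->in_error_state &&
           !rec->in_recovering &&
           (now - rec->error_time) > WAIT_BEFORE_RECOVER;
}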

The patch was created with:
cvs diff -u jk_lb_worker.c

Bernd

Bernd Koecke wrote:
> Hi Costin,
> 
> Maybe I checked out the wrong repository. I checked out
> jakarta-tomcat-connectors with
> CVSROOT=:pserver:[EMAIL PROTECTED]:/home/cvspublic
> 
> Now to the details, see below.
> 
> [EMAIL PROTECTED] wrote:
> 
>> On Thu, 2 May 2002, Bernd Koecke wrote:
>>
>>
>>> misunderstood it. After you said that my patch was included, I had a
>>> closer look at mod_jk. I can't see any of my code, but I found the
>>> special meaning of a zero lb_factor/lb_value. It seems that I didn't
>>> understand it correctly the first time. This could solve my problem,
>>> but after a closer look and some testing I found another problem.
>>> When you set the lb_value in workers.properties to 1 for the local
>>> Tomcat and 0 for the others, you get the desired behavior. But if
>>> you switch off the local Tomcat for a short time, you get into
>>> trouble. The problem is the 0 for the other workers. The calculation
>>> in lb_worker transforms the 0 to _inf_, because 1/0 for a double is
>>> _inf_. This is greater than any
>>
>>
>>
>> I think there is a piece that checks for 0 and sets it to 
>> DEFAULT_VALUE (==1 ) before doing 1/lb. 
> 
> 
> No, I think not :). I checked it yesterday. With some additional log
> statements in the validate function of jk_lb_worker.c you get the value
> _inf_ for lb_factor and lb_value (lines 434-444). If it were set to 1,
> my config wouldn't have worked, because I set the local worker to 1 and
> the others to 0.
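> 
> A tiny standalone check makes this visible (my own demo of the IEEE 754
> behavior, not code from mod_jk): dividing a double by zero yields
> infinity instead of trapping.
> 
> #include <stdio.h>
> #include <math.h>
> 
> int main(void)
> {
>     double lb_factor = 0.0;              /* lbfactor=0 in the config */
>     double lb_value  = 1.0 / lb_factor;  /* IEEE 754: 1.0/0.0 == +inf */
> 
>     printf("lb_value = %f, isinf = %d\n", lb_value, isinf(lb_value));
>     printf("inf + 1 == inf: %d\n", (lb_value + 1.0) == lb_value);
>     return 0;
> }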
> 
>>
>> While looking at the code - I'm not very sure this whole float handling
>> is needed. I'll try to find a way to simplify it and use ints ( maybe
>> 0..100 with some 'special' values for NEVER and ALWAYS, or some
>> additional flags ).
>>
> 
> This is possible, but then you must add a check for a value of 0,
> because without it you would compute 1/0 with an int, and integer
> division by zero does not give _inf_ but kills the process (typically
> with SIGFPE).
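> 
> A guard along these lines would be needed (my sketch of the hypothetical
> int-based scheme; DEFAULT_LB_FACTOR and the 0..100 scaling are
> assumptions, not existing mod_jk code):
> 
> #define DEFAULT_LB_FACTOR 1
> 
> /* Hypothetical int-based variant: avoid 1/0 by falling back to a
>  * default factor (or by flagging the worker, as my patch does). */
> static int scaled_inverse_factor(int lb_factor)
> {
>     if (lb_factor == 0) {
>         lb_factor = DEFAULT_LB_FACTOR;
>     }
>     return 100 / lb_factor;  /* scaled to the suggested 0..100 range */
> }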
> 
>> But the way it works ( or at least how I understand it ) is that if 
>> the main worker fails, then we look at all workers in error state and 
>> try the one with the oldest error. And the 'main' worker will be tried 
>> again when the timeout expires.
>>
> 
> That's not the whole story. It's right that the main worker is checked
> when it's back again, but it is used only once. Because when the request
> was handled successfully, rec->in_recovering is true (line 332 of
> jk_lb_worker.c, service function) and the error state is cleared. Then
> get_max_lb gets the value _inf_ from one of the other workers, and the
> things happen which I described in my prior mail.
> 
>> I haven't tested this too much, I just applied the patches ( those I
>> understand :-). I'll add some more debugging for this process and
>> maybe we can find a better solution.
>>
>> But this functionality is essential for the JNI worker and very important
>> in general - so I really want to find the best solution. If you have any
>> patch idea, let me know.
>>
>> To avoid further confusion and complexity in the lb-factor/value, I
>> think we should add one more flag ( 'local_worker' ? ) and use it
>> explicitly. Again, patches are welcome - it's always good to have
>> different ( and more ) eyes looking at the code.
> 
> 
> That is what I did in the patch I sent; the additional documentation
> was sent a few days later. But my additions to the lb_worker were a
> little too complex. You are right, we should get there if we use the
> flag only on the main worker and change the behavior after a failure of
> this worker. But we need the 0/_inf_ trick for the other workers,
> because only with it do we get the situation that the other workers
> aren't asked when there is no session and the main worker is up.
> 
> I will try to build another patch and send it. I think it could be 
> possible without an additional flag.
> 
> Another thought about this: if we keep using doubles and fix the
> handling after an error, the main worker would still never reach _inf_,
> because the lb_factor is at most 1 if the configured value wasn't 0.
> After the worker is chosen, this lb_factor is added to its lb_value.
> But once lb_value is high enough, the difference between two adjacent
> representable double numbers is greater than the lb_factor, so the
> addition no longer changes anything. But this is only interesting in
> theory. I think in the real world we will reboot Apache before this
> happens :).
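> 
> This absorption effect is easy to show in isolation (again just my demo
> of the floating point behavior, not mod_jk code):
> 
> #include <stdio.h>
> 
> int main(void)
> {
>     /* Near 2^53 the gap between adjacent doubles is 2.0, so adding a
>      * factor smaller than the gap is lost: lb_value stops growing. */
>     double lb_value  = 9007199254740992.0;  /* 2^53 */
>     double lb_factor = 0.5;
> 
>     printf("addition lost: %d\n", (lb_value + lb_factor) == lb_value);
>     return 0;
> }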
> 
> 
> Bernd
> 
>> ( that can go into both jk1 and jk2, but I can't see a release of jk2
>> without this functionality )
>>
>> Costin
>>
>>
>>
>>> other lb_value and greater than the lb_value of the local Tomcat. But
>>> after a failure of the local Tomcat, its worker is in error state.
>>> After some time it is set to recovering, and when the local Tomcat is
>>> back again, the function jk(2)_get_max_lb returns the highest
>>> lb_value. This is _inf_ from one of the other workers. Adding a value
>>> to _inf_ is meaningless; you end up with an lb_value of _inf_ for the
>>> local worker. If this worker isn't the first in the worker list, it
>>> will never be chosen again, because its lb_value will never be less
>>> than another lb_value - all the other workers have _inf_ as their
>>> lb_values. So every request without a session will be routed to the
>>> first of the other Tomcats.
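>>>
>>> A small standalone simulation of this effect (my own illustration,
>>> not mod_jk code):
>>>
>>> #include <stdio.h>
>>>
>>> int main(void)
>>> {
>>>     double other_lb = 1.0 / 0.0; /* workers configured with 0 -> inf */
>>>     double local_lb = other_lb;  /* recovery: local gets max lb, inf */
>>>
>>>     local_lb += 1.0;             /* adding to inf changes nothing */
>>>     /* inf < inf is false, so the local worker never wins again */
>>>     printf("local wins again: %d\n", local_lb < other_lb);
>>>     return 0;
>>> }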
>>>
>>> The only way out is a restart of the local Apache after Tomcat is up
>>> and running. But I don't know when Tomcat has finished deploying all
>>> its contexts and started the connectors.
>>>
>>> I haven't looked very deeply into jk2, but I found the same
>>> get_most_suitable_worker and get_max_lb functions. The jk2_get_max_lb
>>> function will always return _inf_. In your answer to some other mails
>>> you said that workers can be removed. Do I understand it right that
>>> if my local Tomcat goes down, its worker is removed from the list
>>> and, after it comes up again, added back to the worker list with a
>>> reset lb_value (only for mod_jk2)?
>>>
>>> In the next days I will look into the docs and code of jk2 and give
>>> it a try. Maybe all my problems will go away with the new module :).
>>>
>>> Sorry if I'm asking stupid questions, but I want to make this work
>>> for our new cluster.
>>>
>>> Thanks
>>>
>>> Bernd
>>>
>>>
>>>> This is essential for jk2's JNI worker, which fits this case
>>>> perfectly ( you don't want to send via TCP when you have a tomcat
>>>> instance in the same process ).
>>>>
>>>>
>>>>
>>>>
>>>>> (2) Tomcat instances in standby or "soft shutdown" mode, where
>>>>> they serve requests bound to established sessions, and requests
>>>>> without a session only if all non-standby instances have failed.
>>>>
>>>>
>>>>
>>>> That's what the SHM scoreboard is going to do ( among other things ).
>>>> You can register tomcat instances ( which will be added
>>>> automatically ), or unregister - in which case no new requests
>>>> ( except the old sessions ) will go to the unregistered tomcat.
>>>>
>>>>
>>>> Costin
>>>>
>>>>
>>>>


[...]



-- 
Dipl.-Inform. Bernd Koecke
UNIX Development
Schlund+Partner AG
Phone: +49-721-91374-0
E-Mail: [EMAIL PROTECTED]
Index: jk_lb_worker.c
===================================================================
RCS file: /home/cvspublic/jakarta-tomcat-connectors/jk/native/common/jk_lb_worker.c,v
retrieving revision 1.8
diff -u -r1.8 jk_lb_worker.c
--- jk_lb_worker.c      12 Jan 2002 05:27:39 -0000      1.8
+++ jk_lb_worker.c      3 May 2002 08:15:25 -0000
@@ -87,6 +87,7 @@
     int     in_error_state;
     int     in_recovering;
     time_t  error_time;
+    int     use_for_max_lb;
     jk_worker_t *w;
 };
 typedef struct worker_record worker_record_t;
@@ -234,7 +235,7 @@
     double rc = 0.0;    
 
     for(i = 0 ; i < p->num_of_workers ; i++) {
-        if(!p->lb_workers[i].in_error_state) {
+        if(!p->lb_workers[i].in_error_state && p->lb_workers[i].use_for_max_lb) {
             if(p->lb_workers[i].lb_value > rc) {
                 rc = p->lb_workers[i].lb_value;
             }
@@ -433,6 +434,11 @@
                 p->lb_workers[i].name = jk_pool_strdup(&p->p, worker_names[i]);
                 p->lb_workers[i].lb_factor = jk_get_lb_factor(props, 
                                                                worker_names[i]);
+                if (p->lb_workers[i].lb_factor == 0) {
+                    p->lb_workers[i].use_for_max_lb = JK_FALSE;
+                } else {
+                    p->lb_workers[i].use_for_max_lb = JK_TRUE;
+                }
                 p->lb_workers[i].lb_factor = 1/p->lb_workers[i].lb_factor;
                 /* 
                  * Allow using lb in fault-tolerant mode.

--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
