Howdy team, 

 We in GSS are currently working on a sev1 case from Comcast in which
the customer is hitting HTTP 503 error pages when trying to access the
Systems tab of the Satellite 5.4 web UI. All other functions (yum,
channel creation, etc.) work as expected. The httpd error log shows:

  [Fri May 20 14:30:00 2011] [error] (111)Connection refused: proxy: AJP: attempt to connect to 127.0.0.1:8009 (*) failed
  [Fri May 20 14:30:00 2011] [error] proxy: AJP: failed to make connection to backend: localhost
  [Fri May 20 14:30:04 2011] [error] (111)Connection refused: proxy: AJP: attempt to connect to 127.0.0.1:8009 (*) failed
  [Fri May 20 14:30:04 2011] [error] proxy: AJP: failed to make connection to backend: localhost
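
These entries come in two kinds, and telling them apart is a quick first
triage step. Below is a minimal Python sketch that parses Apache 2.2
error_log lines (the regex is an assumption derived only from the lines
quoted in this mail) and maps the two status codes that appear in this
thread: 111 is ECONNREFUSED on Linux, meaning nothing accepted the TCP
connection on 127.0.0.1:8009 at that moment, while 70007 is APR_TIMEUP,
meaning the AJP connector accepted the connection but sent no response
header in time. The names `classify`, `ERROR_LINE` and `MEANING` are
hypothetical.

```python
import re

# Layout of the Apache 2.2 error_log lines quoted in this thread (assumption:
# derived only from these lines), e.g.
#   [Fri May 20 14:30:00 2011] [error] (111)Connection refused: proxy: AJP: ...
ERROR_LINE = re.compile(
    r"^\[(?P<when>[^\]]+)\] \[(?P<level>\w+)\] "
    r"(?:\((?P<code>\d+)\))?(?P<msg>.*)$"
)

# 111 is ECONNREFUSED on Linux; 70007 is APR_TIMEUP from the APR library.
MEANING = {
    111: "connection refused: nothing accepted the connection on the AJP port",
    70007: "timeout: connected, but no AJP response header arrived in time",
}


def classify(line):
    """Return (timestamp, code, meaning) for an AJP proxy error line, else None."""
    m = ERROR_LINE.match(line.strip())
    if m is None:
        return None
    msg = m.group("msg")
    if "AJP" not in msg and "ajp_" not in msg:
        return None  # not an AJP proxy message (e.g. startup notices)
    code = int(m.group("code")) if m.group("code") else None
    return m.group("when"), code, MEANING.get(code, "other proxy error")


if __name__ == "__main__":
    sample = [
        "[Fri May 20 14:30:00 2011] [error] (111)Connection refused: proxy: AJP: "
        "attempt to connect to 127.0.0.1:8009 (*) failed",
        "[Fri May 20 14:30:00 2011] [error] proxy: AJP: failed to make "
        "connection to backend: localhost",
    ]
    for entry in sample:
        print(classify(entry))
```

Applied to the four customer lines above, every entry is either a 111 or
a follow-up message without a code; none of them is a 70007 timeout,
unlike the reproducer log further down.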


 Customer environment: 

   * External Oracle 11 database
   * RHN Satellite 5.4

   * Issue: when using the web UI and clicking the Systems tab, the
customer receives an HTTP 503 error. The other tabs work (a little
slowly) without any 503 errors.


Diagnostic steps:

    Database query times
       To check whether the external DB was the bottleneck, we ran the
SQL behind the Systems pages manually, and it returned reasonably
quickly (about 11 seconds per query; timings below).

-- Show systems (rhn/systems/Overview.do)
SELECT DISTINCT S.id,
                S.name,
                (SELECT 1
                   FROM rhnServerFeaturesView SFV
                  WHERE SFV.server_id = S.id
                    AND SFV.label = 'ftr_system_grouping') AS selectable
  FROM rhnServer S
 INNER JOIN rhnUserServerPerms USP ON S.id = USP.server_id
 WHERE USP.user_id = &rhnuser_id;

-- Show systems in Group (rhn/systems/Overview.do?showgroups=true)
SELECT SGM.server_id AS ID, S.name AS NAME,
            (SELECT 1
            FROM rhnServerFeaturesView SFV
           WHERE SFV.server_id = S.id
             AND SFV.label = 'ftr_system_grouping') AS selectable
    FROM rhnServer S, rhnServerGroupMembers SGM
   WHERE SGM.server_group_id = &rhnServerGroup_id 
     AND SGM.server_id = S.id
     AND EXISTS (SELECT 1 FROM rhnServerFeaturesView SFV
                 WHERE SFV.server_id = S.id
                   AND SFV.label = 'ftr_system_grouping')
  ORDER BY UPPER(NVL(S.NAME, '(none)')), S.ID;

The queries seem to be OK:

{SNIP}
...

1000018206 espmon-po-1p.cable.comcast.com               1
1000018235 ocepcui-wc-1p.sys.comcast.net                1

2477 rows selected.

Elapsed: 00:00:11.14


{SNIP}
1000016310 xg3                                          1
1000013457 xtaweb-nb-01p.philadelphia.pa.bo.comcast.net 1

2377 rows selected.

Elapsed: 00:00:10.65
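
To put these timings side by side with the web UI load times, the
SQL*Plus "Elapsed:" values (printed in HH:MM:SS.ss form when SET TIMING
ON is active) can be converted to seconds. A minimal sketch; the helper
name `elapsed_seconds` is hypothetical, and the comparison assumes the
SQL timings and the ~50 s warm page load reported below are comparable,
which they may not be if they were measured in different environments.

```python
from datetime import timedelta


def elapsed_seconds(text: str) -> float:
    """Convert an SQL*Plus timing such as 'Elapsed: 00:00:11.14' to seconds.

    Accepts either the full 'Elapsed: HH:MM:SS.ss' line or just the value.
    """
    value = text.strip()
    if value.startswith("Elapsed:"):
        value = value[len("Elapsed:"):].strip()
    hours, minutes, seconds = value.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes),
                     seconds=float(seconds)).total_seconds()


if __name__ == "__main__":
    # Approximate warm Systems tab load time reported for the reproducer.
    warm_page_load = 50.0
    for line in ("Elapsed: 00:00:11.14", "Elapsed: 00:00:10.65"):
        sql = elapsed_seconds(line)
        print(f"SQL {sql:.2f} s, page time not spent in this query: "
              f"about {warm_page_load - sql:.0f} s")
```

If the numbers are comparable, most of the page time appears to be spent
outside these queries, which is the gap the question at the end of this
mail is about.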


    The customer has roughly 3,750 servers registered in Satellite:

SQL> select count(*) from rhnserver;

  COUNT(*)
----------
      3748

  
  We requested a DB dump from the customer and imported it into an
internal reproducer:
 
      Hostname: dhcp12.gsslab.rdu.redhat.com
      SSH: root/redhat
      webUI: satadmin/redhat

  Using the customer DB, we **COULD NOT** reproduce the issue directly.
The first time we loaded the Systems tab, it took 1-2 minutes to
return; subsequent loads took almost 50 seconds.
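
These load times only matter relative to the proxy timeout (by default
Apache's Timeout, which is 120 on the reproducer's httpd.conf). The
sketch below encodes that relationship with a deliberately simplified
model, an assumption rather than Apache's exact logic: the proxy gives
up once the Tomcat backend has been silent for longer than the timeout
and answers 503; keepalive, retries and partial responses are ignored.
`expected_status` and `OBSERVED` are hypothetical names.

```python
# Simplified model (assumption, not Apache's exact logic): mod_proxy_ajp
# waits up to `timeout_seconds` for the backend to start answering; if the
# backend is still busy after that, the client gets a 503.
def expected_status(backend_seconds: float, timeout_seconds: float) -> int:
    """Return the HTTP status a client would see under the simplified model."""
    return 503 if backend_seconds > timeout_seconds else 200


# Load times reported in this thread for the reproducer, in seconds.
OBSERVED = {
    "first load, low estimate": 60,
    "first load, high estimate": 120,
    "warm load": 50,
}

if __name__ == "__main__":
    for label, seconds in OBSERVED.items():
        for timeout in (120, 10):  # reproducer value before and after the change
            status = expected_status(seconds, timeout)
            print(f"{label:26} Timeout {timeout:>3}: {status}")
```

Under this model every load fails with a 10 second timeout, matching the
reproducer, while a first load at the upper end of the 1-2 minute range
sits right at the 120 second limit; it may be worth comparing the
customer's own load times against their configured Timeout.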

  To reproduce the issue in-house, we forced the timeouts down to a very
low value, and then we got the 503 plus the AJP timeout error:

/etc/httpd/conf/httpd.conf
  
  From:
     Timeout 120
  To:
     Timeout 10


  /etc/httpd/conf.d/zz-spacewalk-www.conf

   From:
      
     <IfModule proxy_ajp_module>
       RewriteRule ^/rhn(.*) ajp://localhost:8009/rhn$1 [P]
       RewriteRule ^(/.*\.(do|jsp)(\?.*)?)$ ajp://localhost:8009/$1 [P]
     </IfModule>

   To:

     <IfModule proxy_ajp_module>
       RewriteRule ^/rhn(.*) ajp://localhost:8009/rhn$1 [P] timeout=10
       RewriteRule ^(/.*\.(do|jsp)(\?.*)?)$ ajp://localhost:8009/$1 [P] timeout=10
     </IfModule>

  Afterwards, we restarted Satellite. 

[root@dhcp12 conf.d]# tail -f /var/log/httpd/error_log
[Tue May 24 11:58:19 2011] [notice] Digest: done
[Tue May 24 11:58:19 2011] [notice] mod_python: Creating 4 session mutexes based on 256 max processes and 0 max threads.
[Tue May 24 11:58:19 2011] [notice] Apache configured -- resuming normal operations
[Tue May 24 11:58:23 2011] [error] (111)Connection refused: proxy: AJP: attempt to connect to 127.0.0.1:8009 (*) failed
[Tue May 24 11:58:23 2011] [error] proxy: AJP: failed to make connection to backend: localhost
[Tue May 24 11:58:27 2011] [error] (111)Connection refused: proxy: AJP: attempt to connect to 127.0.0.1:8009 (*) failed
[Tue May 24 11:58:27 2011] [error] proxy: AJP: failed to make connection to backend: localhost
[Tue May 24 11:58:50 2011] [error] (70007)The timeout specified has expired: ajp_ilink_receive() can't receive header
[Tue May 24 11:59:37 2011] [error] (70007)The timeout specified has expired: ajp_ilink_receive() can't receive header
[Tue May 24 12:00:25 2011] [error] (70007)The timeout specified has expired: ajp_ilink_receive() can't receive header


[root@dhcp12 conf.d]# tail -f /var/log/httpd/ssl_access_log
10.11.9.75 - - [24/May/2011:12:09:37 -0400] "GET /rhn/software/channels/All.do HTTP/1.1" 200 137344
10.11.9.75 - - [24/May/2011:12:09:38 -0400] "GET /rhn/dwr/engine.js HTTP/1.1" 200 46055
10.11.9.75 - - [24/May/2011:12:09:43 -0400] "GET /rhn/systems/Overview.do HTTP/1.1" 503 402

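On a busy Satellite the ssl_access_log is much longer than this excerpt,
so filtering it for the failing requests helps confirm whether the 503s
are confined to the /rhn/systems/ pages. A minimal sketch, assuming the
Common Log Format shown in the lines above; `ACCESS_LINE` and
`failed_requests` are hypothetical names.

```python
import re

# Common Log Format, as in the ssl_access_log excerpt above:
#   host ident user [time] "METHOD path PROTOCOL" status bytes
ACCESS_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def failed_requests(lines, status=503):
    """Yield (time, path) for every request that ended with `status`."""
    for line in lines:
        m = ACCESS_LINE.match(line.strip())
        if m and int(m.group("status")) == status:
            yield m.group("when"), m.group("path")


if __name__ == "__main__":
    sample = [
        '10.11.9.75 - - [24/May/2011:12:09:38 -0400] '
        '"GET /rhn/dwr/engine.js HTTP/1.1" 200 46055',
        '10.11.9.75 - - [24/May/2011:12:09:43 -0400] '
        '"GET /rhn/systems/Overview.do HTTP/1.1" 503 402',
    ]
    for when, path in failed_requests(sample):
        print(when, path)
```

Grouping the resulting paths by prefix would show whether any page
outside /rhn/systems/ is also affected.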

  I'm running out of options on this case. Do you have any clues about
what we can do to identify or debug the issue? Why does the Systems tab
take so long to return if the SQL returns fairly quickly?
 
 Thanks for your attention. 

Cheers, 
--marcelo

-- 
Marcelo Moreira de Mello
RHCA RHCSS RHCVA 
Software Maintenance Engineer/SEG           

gpg id: 2048R/FDB110E5
gpg fingerprint: 3BE7 EF71 4DD7 6812 D309  8F18 BD42 D095 FDB1 10E5

_______________________________________________
Spacewalk-devel mailing list
Spacewalk-devel@redhat.com
https://www.redhat.com/mailman/listinfo/spacewalk-devel
