Def. Quota Jan Hutař <jhu...@redhat.com>:

On Tue, 04 Jun 2013 23:41:12 +0200 Benedetto Vassallo
<benedetto.vassa...@unipa.it> wrote:


Def. Quota Jan Hutař <jhu...@redhat.com>:

> On Sat, 01 Jun 2013 14:56:53 +0200 Benedetto Vassallo
> <benedetto.vassa...@unipa.it> wrote:
>
>>
>> Def. Quota Benedetto Vassallo <benedetto.vassa...@unipa.it>:
>>
>> > Def. Quota Benedetto Vassallo
>> > <benedetto.vassa...@unipa.it>:
>> >
>> >> Def. Quota Michael Mraka <michael.mr...@redhat.com>:
>> >>
>> >>> Benedetto Vassallo wrote:
>> >>> %
>> >>> % Hi all,
>> >>> % I have an issue with Spacewalk notifications.
>> >>> % Starting 2 weeks ago, my Spacewalk server doesn't send any
>> >>> % notification e-mails regarding monitoring probes, and I find the
>> >>> % following message in my /var/log/nocpulse/NotifEscalator-error.log
>> >>> % file:
>> >>> %
>> >>> % 2013-05-06 10:29:25  7052 Unable to acquire lock on
>> >>> % /var/tmp/01_1367828964_007780_001 /usr/bin/notifier-7115 at
>> >>> % /usr/share/perl5/vendor_perl/NOCpulse/Notif/AlertFile.pm line 111.
>> >>> %
>> >>> % Yesterday I upgraded my Spacewalk installation to 1.9 but nothing
>> >>> % has changed.
>> >>> %
>> >>> % What can I do to solve this issue?
>> >>>
>> >>> Hi Benedetto,
>> >>>
>> >>> Does the /var/tmp/01_1367828964_007780_001 file exist?
>> >>> What are its permissions?
>> >>> Are there any relevant selinux errors?
>> >>>
>> >>> Regards,
>> >>>
>> >>> --
>> >>> Michael Mráka
>> >>> Satellite Engineering, Red Hat
>> >>>
>> >>> _______________________________________________
>> >>> Spacewalk-list mailing list
>> >>> Spacewalk-list@redhat.com
>> >>> https://www.redhat.com/mailman/listinfo/spacewalk-list
>> >>
>> >> Hi Michael,
>> >>
>> >> As I wrote in my previous e-mail, all files in /var/tmp
>> >> exist.
>> >>
>> >> Their permissions are:
>> >>
>> >> -rw-rw-r--  1 nocpulse nocpulse  3396 14 mag 16:52
>> >> 01_1367828964_007780_001
>> >>
>> >> Selinux is disabled.
>> >>
>> >> Today I tried to reinstall Spacewalk from scratch, using the
>> >> same database (skipping database population when
>> >> executing spacewalk-setup --disconnected) and I have the
>> >> same situation.
>> >>
>> >> Can you help me, please?
>> >>
>> >> Thank you.
>> >>
>> >>
>> >> --
>> >> Benedetto Vassallo
>> >> Sistema Informativo di Ateneo
>> >> Settore Gestione Reti Hardware e Software
>> >> U.O.B. Sviluppo e manutenzione dei sistemi
>> >> Università degli studi di Palermo
>> >>
>> >> Phone: +3909123860056
>> >> Fax: +390916529124
>> >>
>> >>
>> >
>> > Hi all,
>> > Can anyone help me solve this issue?
>> > What should I investigate to understand where the problem
>> > is? Maybe I am missing some Perl module?
>> > The strange thing is that it stopped working from one day
>> > to the next. All my 829 probes are still working; Spacewalk
>> > just doesn't send e-mails. Please help me.
>> > Thank you.
>> >
>> >
>> >
>> > --
>> > Benedetto Vassallo
>> > Sistema Informativo di Ateneo
>> > Settore Gestione Reti Hardware e Software
>> > U.O.B. Sviluppo e manutenzione dei sistemi
>> > Università degli studi di Palermo
>> >
>> > Phone: +3909123860056
>> > Fax: +390916529124
>> >
>> >
>>
>> Hi again,
>> I re-installed my Spacewalk from scratch on a new Linux box.
>>
>> I also used a fresh Oracle schema on the same DB machine
>> that hosts the production DB (a different machine from both the
>> spacewalk-production and spacewalk-new boxes).
>> My Oracle DB is 10.2.0.5 and (of course) the Spacewalk version
>> is 1.9.
>>
>> I registered a system on the new Spacewalk instance, created
>> a monitoring probe, and everything is working fine.
>>
>> So, I guess, there must be something in the production
>> database schema that keeps my spacewalk-production box from
>> sending notification e-mails.
>>
>> I checked my DB for invalid objects but found nothing.
>>
>> My production DB schema was created with Spacewalk 1.0 and
>> upgraded with every new release.
>>
>> What do I have to check in my DB in order to solve this issue?
>> Alternatively, can I export my production schema data into the
>> new schema without breaking the new schema?
>> Keep in mind that this would change the Spacewalk IP, hostname and
>> certificate.
>>
>> I have 180 registered systems and 829 configured monitoring
>> probes in my production Spacewalk instance, and I use
>> Spacewalk for management, configuration file deployment and
>> monitoring, so, if possible, I would prefer not to move my
>> systems manually to the new Spacewalk installation.
>>
>> Also keep in mind that everything is working in my production
>> installation, even monitoring probes (in the web GUI I see
>> the correct status for every probe). Only notification
>> e-mails don't work.
>>
>> Can you help me?
>> Best regards.
>
> You can check that relevant packages were not altered:
>
> # rpm -qV `rpm -qa | grep -i -e noc -e monitoring -e perl`
> .M.......    /var/lib/nocpulse/NPkernel.out
> S.5....T.  c /etc/NOCpulse.ini
>
> (you can compare output from your new and old box)
>
> Regards,
> Jan
>
>
>
> --
> Jan Hutar     Systems Management QA
> jhu...@redhat.com     Red Hat, Inc.

Hi,
Both my old and new box have the same output:

old box

# rpm -qV `rpm -qa | grep -i -e noc -e monitoring -e perl`
S.5....T.  c /etc/NOCpulse.ini
.M.......    /var/lib/nocpulse/NPkernel.out

new box

# rpm -qV `rpm -qa | grep -i -e noc -e monitoring -e perl`
S.5....T.  c /etc/NOCpulse.ini
.M.......    /var/lib/nocpulse/NPkernel.out
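
Since the two listings can differ in line order (as in the example output quoted earlier), a small shell sketch like the one below could make the comparison mechanical. The function name same_rpm_verify is made up for illustration and assumes the output of each box has been saved to a file; only the rpm -qV command itself comes from this thread.

```shell
# Compare two saved `rpm -qV` listings, ignoring line order.
# Returns 0 if both files contain the same lines, 1 if they differ,
# and 2 if either file cannot be read.
same_rpm_verify() {
    srv_a=$(sort "$1") || return 2
    srv_b=$(sort "$2") || return 2
    [ "$srv_a" = "$srv_b" ]
}
```

For example, after saving the rpm -qV output on each box to old.txt and new.txt (hypothetical file names), `same_rpm_verify old.txt new.txt && echo identical` would confirm that the packages verify the same way on both.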

Any other idea?
Thanks

Hmm, so I'm out of ideas :-/ I guess none of the related logs
(/var/log/rhn/*, /var/log/httpd/*, /var/log/nocpulse/* (and
sub-dirs?)) show any pointers? Maybe somebody else has an idea?
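
One possible way to skim those logs for Oracle errors is sketched below. The helper name ora_summary is made up for illustration; it simply counts ORA-nnnnn codes in whatever files it is given, so it only helps when the underlying problem surfaces as an Oracle error.

```shell
# Print "count code" pairs for every ORA-nnnnn error code found in the
# given log files, most frequent first. Unreadable files are skipped.
ora_summary() {
    grep -ohE 'ORA-[0-9]{5}' "$@" 2>/dev/null \
        | sort | uniq -c | sort -rn \
        | awk '{print $1, $2}'
}
```

It could be run as `ora_summary /var/log/rhn/* /var/log/nocpulse/*` to get a rough picture of which database errors dominate.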

To move these systems, you could use satellite-sync
(the ISS feature) to sync software channel content to the new box and
then run remote commands to re-register your 180 systems to the
new box (using System Set Manager).

Sorry,
Jan



--
Jan Hutar     Systems Management QA
jhu...@redhat.com     Red Hat, Inc.

Thank you for the support.

Here are the relevant parts (I think) of my logfiles:

/var/log/rhn/rhn_taskomatic_daemon.log
=========================================================
INFO   | jvm 1    | 2013/06/06 23:40:03 | 2013-06-06 23:40:03,634 [DefaultQuartzScheduler_Worker-1] ERROR com.redhat.rhn.taskomatic.task.SynchProbeState - Error during probe state sync
INFO   | jvm 1    | 2013/06/06 23:40:03 | com.redhat.rhn.common.db.WrappedSQLException: ORA-00060: deadlock detected while waiting for resource
INFO   | jvm 1    | 2013/06/06 23:40:03 | ORA-06512: at "SWKUNIPA.RHN_SYNCH_PROBE_STATE", line 4
INFO   | jvm 1    | 2013/06/06 23:40:03 | ORA-06512: at line 1
INFO   | jvm 1    | 2013/06/06 23:40:03 |
INFO   | jvm 1    | 2013/06/06 23:40:03 |       at com.redhat.rhn.common.translation.SqlExceptionTranslator.oracleSQLException(SqlExceptionTranslator.java:82)
INFO   | jvm 1    | 2013/06/06 23:40:03 |       at com.redhat.rhn.common.translation.SqlExceptionTranslator.sqlException(SqlExceptionTranslator.java:42)
INFO   | jvm 1    | 2013/06/06 23:40:03 |       at com.redhat.rhn.common.db.NamedPreparedStatement.execute(NamedPreparedStatement.java:120)
INFO   | jvm 1    | 2013/06/06 23:40:03 |       at com.redhat.rhn.common.db.datasource.CachedStatement.executeCallable(CachedStatement.java:523)
INFO   | jvm 1    | 2013/06/06 23:40:03 |       at com.redhat.rhn.common.db.datasource.CallableMode.execute(CallableMode.java:35)
INFO   | jvm 1    | 2013/06/06 23:40:03 |       at com.redhat.rhn.taskomatic.task.SynchProbeState.execute(SynchProbeState.java:44)
INFO   | jvm 1    | 2013/06/06 23:40:03 |       at com.redhat.rhn.taskomatic.task.RhnJavaJob.execute(RhnJavaJob.java:80)
INFO   | jvm 1    | 2013/06/06 23:40:03 |       at com.redhat.rhn.taskomatic.TaskoJob.execute(TaskoJob.java:169)
INFO   | jvm 1    | 2013/06/06 23:40:03 |       at org.quartz.core.JobRunShell.run(JobRunShell.java:216)
INFO   | jvm 1    | 2013/06/06 23:40:03 |       at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)
INFO   | jvm 1    | 2013/06/06 23:40:03 | Caused by: java.sql.SQLException: ORA-00060: deadlock detected while waiting for resource
INFO   | jvm 1    | 2013/06/06 23:40:03 | ORA-06512: at "SWKUNIPA.RHN_SYNCH_PROBE_STATE", line 4
INFO   | jvm 1    | 2013/06/06 23:40:03 | ORA-06512: at line 1
INFO   | jvm 1    | 2013/06/06 23:40:03 |
INFO   | jvm 1    | 2013/06/06 23:40:03 |       at oracle.jdbc.driver.T2CConnection.checkError(T2CConnection.java:765)
INFO   | jvm 1    | 2013/06/06 23:40:03 |       at oracle.jdbc.driver.T2CConnection.checkError(T2CConnection.java:662)
INFO   | jvm 1    | 2013/06/06 23:40:03 |       at oracle.jdbc.driver.T2CCallableStatement.executeForDescribe(T2CCallableStatement.java:548)
INFO   | jvm 1    | 2013/06/06 23:40:03 |       at oracle.jdbc.driver.T2CCallableStatement.executeForRows(T2CCallableStatement.java:731)
INFO   | jvm 1    | 2013/06/06 23:40:03 |       at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1329)
INFO   | jvm 1    | 2013/06/06 23:40:03 |       at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3584)
INFO   | jvm 1    | 2013/06/06 23:40:03 |       at oracle.jdbc.driver.OraclePreparedStatement.execute(OraclePreparedStatement.java:3685)
INFO   | jvm 1    | 2013/06/06 23:40:03 |       at oracle.jdbc.driver.OracleCallableStatement.execute(OracleCallableStatement.java:4714)
INFO   | jvm 1    | 2013/06/06 23:40:03 |       at oracle.jdbc.driver.OraclePreparedStatementWrapper.execute(OraclePreparedStatementWrapper.java:1376)
INFO   | jvm 1    | 2013/06/06 23:40:03 |       at com.mchange.v2.c3p0.impl.NewProxyCallableStatement.execute(NewProxyCallableStatement.java:2417)
INFO   | jvm 1    | 2013/06/06 23:40:03 |       at com.redhat.rhn.common.db.NamedPreparedStatement.execute(NamedPreparedStatement.java:117)
INFO   | jvm 1    | 2013/06/06 23:40:03 |       ... 7 more
INFO   | jvm 1    | 2013/06/06 23:40:03 | 2013-06-06 23:40:03,637 [DefaultQuartzScheduler_Worker-1] ERROR com.redhat.rhn.taskomatic.task.SynchProbeState - com.redhat.rhn.common.db.WrappedSQLException: ORA-00060: deadlock detected while waiting for resource
INFO   | jvm 1    | 2013/06/06 23:40:03 | ORA-06512: at "SWKUNIPA.RHN_SYNCH_PROBE_STATE", line 4
INFO   | jvm 1    | 2013/06/06 23:40:03 | ORA-06512: at line 1
INFO   | jvm 1    | 2013/06/06 23:40:03 |
INFO   | jvm 1    | 2013/06/06 23:40:03 | 2013-06-06 23:40:03,637 [DefaultQuartzScheduler_Worker-1] ERROR com.redhat.rhn.taskomatic.task.SynchProbeState - com.redhat.rhn.common.db.WrappedSQLException: ORA-00060: deadlock detected while waiting for resource
INFO   | jvm 1    | 2013/06/06 23:40:03 | ORA-06512: at "SWKUNIPA.RHN_SYNCH_PROBE_STATE", line 4
INFO   | jvm 1    | 2013/06/06 23:40:03 | ORA-06512: at line 1
INFO   | jvm 1    | 2013/06/06 23:40:03 |
INFO | jvm 1 | 2013/06/07 04:05:00 | 2013-06-07 04:05:00,427 [DefaultQuartzScheduler_Worker-7] INFO com.redhat.rhn.taskomatic.task.SandboxCleanup - Removing sandbox channels: 1
=====================================================

/var/log/nocpulse/GenerateNotifConfig-error.log
==========================================
2013-06-04 23:41:15 ORA-00018: maximum number of sessions exceeded (DBD ERROR: OCISessionBegin) at /usr/share/perl5/vendor_perl/RHN/DBI.pm line 72
2013-06-04 23:41:16 DBI connect('spacewalk','swkunipa',...) failed: ORA-00604: error occurred at recursive SQL level 1
2013-06-04 23:41:16 ORA-00018: maximum number of sessions exceeded (DBD ERROR: OCISessionBegin) at /usr/share/perl5/vendor_perl/RHN/DBI.pm line 72
2013-06-04 23:41:18 DBI connect('spacewalk','swkunipa',...) failed: ORA-00604: error occurred at recursive SQL level 1
2013-06-04 23:41:18 ORA-00018: maximum number of sessions exceeded (DBD ERROR: OCISessionBegin) at /usr/share/perl5/vendor_perl/RHN/DBI.pm line 72
2013-06-04 23:41:19 DBI connect('spacewalk','swkunipa',...) failed: ORA-00604: error occurred at recursive SQL level 1
2013-06-04 23:41:19 ORA-00018: maximum number of sessions exceeded (DBD ERROR: OCISessionBegin) at /usr/share/perl5/vendor_perl/RHN/DBI.pm line 72
========================================

I also have this in /var/log/nocpulse/kernel-error.log, but I can't figure out which account is locked (I have just one account used for Spacewalk itself and another to run probes against the Oracle database, and neither is locked).
=========================================================================
2013-06-07 12:46:38  2013-06-07 12:46:38 Probe 71 produced error output:
2013-06-07 12:46:38     STDERR:
2013-06-07 12:46:38 >>>NOCpulse::Probe::Error: ORA-28000: the account is locked /usr/share/perl5/vendor_perl/NOCpulse/Probe/DataSource/Oracle.pm line 82
2013-06-07 12:46:38     called from /usr/share/perl5/vendor_perl/NOCpulse/Probe/DataSource/Oracle.pm line 104
2013-06-07 12:46:38     called from /usr/share/perl5/vendor_perl/NOCpulse/Probe/DataSource/AbstractDataSource.pm line 40
2013-06-07 12:46:38     called from /usr/share/perl5/vendor_perl/NOCpulse/Probe/DataSource/AbstractDatabase.pm line 50
2013-06-07 12:46:38     called from /usr/share/perl5/vendor_perl/NOCpulse/Probe/DataSource/Oracle.pm line 40
2013-06-07 12:46:38     called from blib/lib/Class/MethodMaker/Engine.pm (autosplit into blib/lib/auto/Class/MethodMaker/Engine/new.al) line 957
2013-06-07 12:46:38     called from /usr/share/perl5/vendor_perl/NOCpulse/Probe/DataSource/Factory.pm line 92
2013-06-07 12:46:38     called from /var/lib/nocpulse/libexec/Oracle/Availability.pm line 18
2013-06-07 12:46:38     called from /var/lib/nocpulse/libexec/Oracle/Availability.pm line 26
2013-06-07 12:46:38     called from /usr/share/perl5/vendor_perl/NOCpulse/Probe/ProbeRunner.pm line 115
2013-06-07 12:46:38     called from /usr/share/perl5/vendor_perl/NOCpulse/Probe/ProbeRunner.pm line 131
2013-06-07 12:46:38     called from /usr/share/perl5/vendor_perl/NOCpulse/Scheduler/Event/ProbeEvent.pm line 31
2013-06-07 12:46:38     called from /usr/share/perl5/vendor_perl/NOCpulse/Process.pm line 125
2013-06-07 12:46:38     called from /usr/share/perl5/vendor_perl/NOCpulse/ProcessPool.pm line 150
2013-06-07 12:46:38     called from /usr/bin/kernel.pl line 206
2013-06-07 12:46:38     called from /usr/bin/kernel.pl line 158
2013-06-07 12:46:38  <<
=================================================================

Then, I have a lot of this in /var/log/nocpulse/NotifEscalator-error.log
================================
2013-06-07 12:51:36 30852 Unable to acquire lock on /var/tmp/01_1370602295_030826_001 /usr/bin/notifier-30962 at /usr/share/perl5/vendor_perl/NOCpulse/Notif/AlertFile.pm line 111.
================================

/var/log/nocpulse/notif-escalator.log
======================================================
2013-06-07 12:51:35 Registered Alert [00098] /var/tmp/01_1370602295_030826_001
2013-06-07 12:51:35 Registering sends for Alert [00098] 01_1370602295_030826_001 send count = 1
2013-06-07 12:51:35 Registered Send [k6tg38] Alert [00098] 1_i41_Database Admin (xxx...@unipa.it)
2013-06-07 12:51:35 Registered Send [k6tg38] Alert [00098] 1_i41_Database Admin (xxx...@unipa.it)
2013-06-07 12:51:35 Queuing sends for Alert [00098] 01_1370602295_030826_001 send count = 1
2013-06-07 12:51:36 Dispensing sends k6tg38 /var/tmp/01_1370602295_030826_001
2013-06-07 12:51:36 Ack Error Send [k6tg38] Alert [00098] nak: Alert not found
=======================================================

/var/log/nocpulse/Notifier-error.log
====================================
2013-06-07 12:51:36 Net::SMTP=GLOB(0x35fe400)<<< 220 spacewalk.unipa.it ESMTP Sendmail 8.14.4/8.14.4; Fri, 7 Jun 2013 12:51:36 +0200
2013-06-07 12:51:36  Net::SMTP=GLOB(0x35fe400)>>> EHLO localhost.localdomain^M
2013-06-07 12:51:36 Net::SMTP=GLOB(0x35fe400)<<< 250-spacewalk.unipa.it Hello localhost [127.0.0.1], pleased to meet you
2013-06-07 12:51:36  Net::SMTP=GLOB(0x35fe400)<<< 250-ENHANCEDSTATUSCODES
2013-06-07 12:51:36  Net::SMTP=GLOB(0x35fe400)<<< 250-PIPELINING
2013-06-07 12:51:36  Net::SMTP=GLOB(0x35fe400)<<< 250-8BITMIME
2013-06-07 12:51:36  Net::SMTP=GLOB(0x35fe400)<<< 250-SIZE
2013-06-07 12:51:36  Net::SMTP=GLOB(0x35fe400)<<< 250-DSN
2013-06-07 12:51:36  Net::SMTP=GLOB(0x35fe400)<<< 250-ETRN
2013-06-07 12:51:36  Net::SMTP=GLOB(0x35fe400)<<< 250-DELIVERBY
2013-06-07 12:51:36  Net::SMTP=GLOB(0x35fe400)<<< 250 HELP
=====================================

/var/log/nocpulse/notifier.log
================================
2013-06-07 12:50:24 Started Send [4w0gn0] Alert [00097] 1_i2_Systems Admins (xxxxx...@unipa.it)
2013-06-07 12:50:24 Connecting to localhost
2013-06-07 12:50:24 Send error Send [4w0gn0] Alert [00097] 1_i2_Systems Admins (xxxxx...@unipa.it): (nak) smtp code: Off-duty
2013-06-07 12:51:00 waiting for new sends to launch...
2013-06-07 12:51:36 Started Send [k6tg38] Alert [00098] 1_i41_Database Admin (xxxxx...@unipa.it)
2013-06-07 12:51:36 Connecting to localhost
2013-06-07 12:51:36 Send error Send [k6tg38] Alert [00098] 1_i41_Database Admin (xxxxx...@unipa.it): (nak) smtp code: Off-duty
===============================

/var/log/nocpulse/notif-launcher.log
===============================
2013-06-07 12:51:35 Registered Alert [00098] 01_1370602295_030826_001
2013-06-07 12:51:35 Registered Send [k6tg38] Alert [00098] 1_i41_Database Admin (xxxxx...@unipa.it)
2013-06-07 12:51:35 Queued Send [k6tg38] Alert [00098] 1_i41_Database Admin (xxxxx...@unipa.it)
2013-06-07 12:51:35 NOCpulse::Notif::FileQueue::peek /var/lib/notification/queue/alert_queue/01_1370602295_030826_001 no longer exists -- dequeuing
==============================

/var/log/nocpulse/TSDBLocalQueue-errors.log
====================================
2013-06-07 12:44:57 2013-06-07 12:44:52 Problem reading current files: RHN::Exception: DBD::Oracle::st execute failed: ORA-02291: integrity constraint (SWKUNIPA.TIME_SERIES_DATA_PID_FK) violated - parent key not found (DBD ERROR: OCIStmtExecute) [for Statement "insert into time_series_data
2013-06-07 12:44:57 (org_id, probe_id, probe_desc, entry_time, data) values
2013-06-07 12:44:57      (:org_id, :probe_id, :probe_desc, :entry_time, :data)
2013-06-07 12:44:57 " with ParamValues: :data='2', :entry_time='1370601888', :org_id='1', :probe_desc='pctused', :probe_id='1982']
2013-06-07 12:44:57    RHN::DB /usr/share/perl5/vendor_perl/RHN/DB.pm 117 RHN::Exception::DB::throw
2013-06-07 12:44:57    RHN::DB::st /usr/share/perl5/vendor_perl/RHN/DB.pm 455 RHN::DB::handle_error
2013-06-07 12:44:57    NOCpulse::Database /usr/share/perl5/vendor_perl/NOCpulse/Database.pm 72 RHN::DB::st::execute_h
2013-06-07 12:44:57    NOCpulse::Database /usr/share/perl5/vendor_perl/NOCpulse/Database.pm 89 NOCpulse::Database::do_insert
2013-06-07 12:44:57    main /usr/bin/TSDBLocalQueue.pl 220 NOCpulse::Database::insert
2013-06-07 12:44:57    main /usr/bin/TSDBLocalQueue.pl 110 main::insert_time_series
2013-06-07 12:44:57    Error::subs /usr/share/perl5/Error.pm 415 main::__ANON__
2013-06-07 12:44:57    Error::subs /usr/share/perl5/Error.pm 407 (eval)
2013-06-07 12:44:57    main /usr/bin/TSDBLocalQueue.pl 118 Error::subs::try
2013-06-07 12:44:57
2013-06-07 12:44:57  Offending Query: insert into time_series_data
2013-06-07 12:44:57 (org_id, probe_id, probe_desc, entry_time, data) values
2013-06-07 12:44:57      (:org_id, :probe_id, :probe_desc, :entry_time, :data)
2013-06-07 12:44:57
2013-06-07 12:44:57
======================================

I think this one is important, because it shows an integrity constraint violation.
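
To see which probes are affected, the probe ids from the failed inserts could be pulled out of that log with something like the sketch below. The helper name failed_probe_ids is hypothetical, and it assumes the :probe_id='...' ParamValues format shown in the excerpt above.

```shell
# List the distinct probe ids mentioned in ParamValues lines of the
# given log files, one per line, in ascending numeric order.
failed_probe_ids() {
    grep -ohE ":probe_id='[0-9]+'" "$@" 2>/dev/null \
        | tr -dc '0-9\n' \
        | sort -un
}
```

Running `failed_probe_ids /var/log/nocpulse/TSDBLocalQueue-errors.log` would give a short list of ids to check against the probes that are actually configured; an id with no matching probe would be consistent with the "parent key not found" message, though that is only a guess.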

I hope this can help you.
Thank you again for the support.
Best regards.
--
Benedetto Vassallo
Sistema Informativo di Ateneo
Settore Gestione Reti Hardware e Software
U.O.B. Sviluppo e manutenzione dei sistemi
Università degli studi di Palermo

Phone: +3909123860056
Fax: +390916529124

