Phil,

Thanks. No, we didn't fill any disks. I have about 18 messages hung on each of the 20 servers. I am about to just delete all of them and restart things. I did a restart of the mmdf scripts but that didn't help.

In the past the main relay server used to stop accepting mail because of a bad or malformed email. Like you said, it is flaky. I learned this, and when mail used to hang on the relay server I would find the culprit and delete it. This time it is all 20 servers that sort of got in unison to be stubborn and stop sending the mail. The relay server is sending email like a champ.
I haven't restarted any of the servers, which is my last resort. I use email on these servers to handle reports and also incident reporting (failed and successful backups).

Thanks for the reply.

>>> Phil Pennock <[email protected]> 01/10/09 2:20 AM >>>
On 2009-01-08 at 20:27 -0500, John BORIS wrote:
> I have a group of 20 remote servers running SCO Open Server 5.0.6 that
> have all stopped sending mail. This is an internal setup where they send
> mail to a relay server. I can telnet to the smtp port on the relay
> server from each of them so it isn't a connection issue. We did have a
> WAN outage for about an hour just before this started. I am looking for
> something or someway to kick start the mail. Any ideas would be greatly
> appreciated.

$previous_employer (ISP) where I was postmaster used MMDF for incoming customer mail until I migrated it away. I used it for long enough to make it vaguely reliable (Support Dept thought we'd finished the Exim migration before it started because the complaints stopped, which I took as a nice compliment). However, the painful memories are being self-censored. Sorry. Note that this was MMDF from SCO for historical reasons, but not running on SCO -- I've only ever touched SCO once.

If I recall correctly, MMDF's a queue-based design where mails move between different queues as they're processed, somewhat like mailman. I remember liking the general design philosophy, but it's not 7-bit clean and it's very old code. And it involves too much FS metadata manipulation to scale as well as some alternatives. I think that there's a data-file with the content in one directory and then a control file which gets hard-linked to shuffle it between the various stage queues, but I'm no longer sure of that.

So, what's probably happened is that the queue-runners for the queue handling mail to the relay host have gone down; ISTR some flakiness with queue-runners and the keep-alive scripts.
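A rough sketch of the hard-link shuffle described above, in Python. This is not MMDF's actual on-disk layout or code: the directory names (`msg`, `incoming`, `outbound`) and the helper functions are hypothetical, and it only illustrates the general idea of storing the body once and moving a small control file between stage queues with link/unlink.

```python
import os
import tempfile


def setup_queues(root, stages):
    """Create one shared body directory plus one directory per stage queue."""
    os.makedirs(os.path.join(root, "msg"), exist_ok=True)
    for stage in stages:
        os.makedirs(os.path.join(root, stage), exist_ok=True)


def enqueue(root, msg_id, body, control, stage):
    """Write the body once, and the control file into the given stage queue."""
    with open(os.path.join(root, "msg", msg_id), "w") as f:
        f.write(body)
    with open(os.path.join(root, stage, msg_id), "w") as f:
        f.write(control)


def move_stage(root, msg_id, src, dst):
    """Hard-link the control file into dst, then remove its name in src.

    Only directory entries change; the message body is never copied.
    """
    src_path = os.path.join(root, src, msg_id)
    dst_path = os.path.join(root, dst, msg_id)
    os.link(src_path, dst_path)
    os.unlink(src_path)


def pending(root, stage):
    """List the message ids currently waiting in one stage queue."""
    return sorted(os.listdir(os.path.join(root, stage)))


if __name__ == "__main__":
    root = tempfile.mkdtemp()  # throwaway directory for the demo
    setup_queues(root, ["incoming", "outbound"])
    enqueue(root, "m1", "backup report", "to: relay", "incoming")
    move_stage(root, "m1", "incoming", "outbound")
    print("incoming:", pending(root, "incoming"))
    print("outbound:", pending(root, "outbound"))
```

In a layout like this, a dead runner for one stage would show up as control files piling up in that stage's directory while the other stages keep draining.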
I think the runners mostly run independently so the outbound runners can be dead and there's no meta-daemon to restart them -- it's just the regular start-up scripts. Again though, I'm no longer sure of my recollection.

Honestly, my first instinct would be to down and up the service using the regular init scripts to kick all the queue-runners into service and if they fail, start looking at logs for diagnostics.

If the volume is high enough that the relay server caused enough mail to back up, did you fill any disks?

-Phil
_______________________________________________
Tech mailing list
[email protected]
http://lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
http://lopsa.org/
