Re: [vox-tech] Problem with Gigabyte 890FX, Phenom II, and Kubuntu
Quoting Cam Ellison (c...@ellisonet.ca):

> With regard to the rest of your email (snipped out), I'll try that if
> nothing comes from CTCS. Two halts five weeks apart doesn't give me
> much to work with.

Yes. One of the Bad Words that one doesn't really want to hear when
performing diagnosis of any kind, including on computer hardware, is
'intermittent'.

> I did try dmidecode on the PS, but drew a blank, perhaps not
> surprisingly. On the basis of your instincts, plus my own suspicions
> and previous experience (now that I think about it), I'm beginning to
> suspect the PSU.

Could be. FWIW, this would be the very first time I'd heard of an Antec
PSU being the root cause of a system problem. They're really good.
However, there's always a first time.
___
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech
Re: [vox-tech] Problem with Gigabyte 890FX, Phenom II, and Kubuntu
On 10-12-08 03:26 PM, Rick Moen wrote:
> Quoting Cam Ellison (c...@ellisonet.ca):
>
>> On another list that I frequent, the two responses thus far both
>> suggested replacing or swapping out the PS. I have to admit the idea
>> has merit, though it's an Antec Signature 650, came new with the rest of
>> the system, and over $200 here including the taxes. I'm a little leery
>> of ending up with a good, but effectively useless, PS. Which leads to
>> another question: how do you test a PS? Is it possible?
> I'm sure it's possible (at least in theory), but I never have tried.
> I've always just tried to keep around at least one of each major type
> with a piece of masking tape on it labelled 'known good as of [date]',
> and swap those into systems where I suspect the PSU.
>
> If the PSU is generally functional, then in my experience the usual
> question is whether it is too weak for the current draw asked of it.
> (In a perfect world, you would be able to believe manufacturer ratings,
> but of course they lie and exaggerate, and also doubtless some PSUs
> achieve their claimed ratings better loaded with some impedance types
> than others.)
>
> Antec PSUs are on the short list of ones I have faith in, generally.
>
>
> I have a confession to make: I really didn't pay much attention to this
> thread until I saw Brian mention CTCS (Cerberus), with which I have a
> great deal of experience. I've just now re-read your original posting
> to get the context for all this.
>
> That having been done, I think the suggestion of a (say, overnight)
> Cerberus run has a lot to recommend it. Cerberus puts a system under
> very, very serious load, which is the rationale for its use to
> stress-test newly constructed systems on the VA Linux Systems production
> line: It exposes most hardware flaws through thrashing the hell out of
> pretty nearly every hardware subsystem in the host.

That sounds like the way to go. I've downloaded and unzipped it. Now to
grab a new kernel (this is a Kubuntu box, and there are only header
files) and set things for this weekend, maybe.

> Your description (halted suddenly, no output, coldboot required) doesn't
> sound a-priori like a RAM problem. It's conceivable that it's a
> software problem, but my instinct says hardware is more likely. That
> instinct says it's likely to be something with either the motherboard +
> CPU or with the PSU.

Fortunately, they're within warranty. Unfortunately, enough time has
passed that it will mean shipping to the manufacturer. Too bad I didn't
know about CTCS earlier - I guess that's for next time, if there is one.
:-p

With regard to the rest of your email (snipped out), I'll try that if
nothing comes from CTCS. Two halts five weeks apart doesn't give me
much to work with.

I did try dmidecode on the PS, but drew a blank, perhaps not
surprisingly. On the basis of your instincts, plus my own suspicions
and previous experience (now that I think about it), I'm beginning to
suspect the PSU. Time for some negotiation with the supplier, I think.

Thanks again

Cam
Re: [vox-tech] Problem with Gigabyte 890FX, Phenom II, and Kubuntu
Quoting Cam Ellison (c...@ellisonet.ca):

> On another list that I frequent, the two responses thus far both
> suggested replacing or swapping out the PS. I have to admit the idea
> has merit, though it's an Antec Signature 650, came new with the rest of
> the system, and over $200 here including the taxes. I'm a little leery
> of ending up with a good, but effectively useless, PS. Which leads to
> another question: how do you test a PS? Is it possible?

I'm sure it's possible (at least in theory), but I never have tried.
I've always just tried to keep around at least one of each major type
with a piece of masking tape on it labelled 'known good as of [date]',
and swap those into systems where I suspect the PSU.

If the PSU is generally functional, then in my experience the usual
question is whether it is too weak for the current draw asked of it.
(In a perfect world, you would be able to believe manufacturer ratings,
but of course they lie and exaggerate, and also doubtless some PSUs
achieve their claimed ratings better loaded with some impedance types
than others.)

Antec PSUs are on the short list of ones I have faith in, generally.

I have a confession to make: I really didn't pay much attention to this
thread until I saw Brian mention CTCS (Cerberus), with which I have a
great deal of experience. I've just now re-read your original posting
to get the context for all this.

That having been done, I think the suggestion of a (say, overnight)
Cerberus run has a lot to recommend it. Cerberus puts a system under
very, very serious load, which is the rationale for its use to
stress-test newly constructed systems on the VA Linux Systems production
line: It exposes most hardware flaws through thrashing the hell out of
pretty nearly every hardware subsystem in the host.

Your description (halted suddenly, no output, coldboot required) doesn't
sound a-priori like a RAM problem. It's conceivable that it's a
software problem, but my instinct says hardware is more likely. That
instinct says it's likely to be something with either the motherboard +
CPU or with the PSU.

One avenue towards diagnosis (generally speaking and probably _not_
useful for your situation; this is just for general knowledge of
troubleshooting) is to simplify the hardware situation temporarily for
diagnostic purposes, to attempt to isolate the problem. That is, open
up your system and look at what's plugged into what. Do you have
expansion cards that can be disconnected and the system is still able to
produce video? Remove them. A miniPCI-format wireless card? Unplug
it. Non-boot hard drives? Unplug and detach them. Optical drives?
Unplug and detach them. Get as close as you can to just motherboard +
PSU and still have the system be functional enough to run and expose the
syndrome if it's still present.

That method is useful primarily for symptoms that express strongly and
constantly, like 'System doesn't even beep or produce video'. In those
cases, you detach every non-essential subsystem and see if the remaining
hardware then beeps and does video. If it does, then the root cause
lies in one of the subsystems you detached -or- in the 100%-wired-up
system trying to draw too much current from a borderline PSU. If it
doesn't, then the problem may be in the system core (motherboard, PSU,
CPU, RAM).

The latter case is of course tough to narrow down. If you have multiple
sticks of RAM, and the motherboard northbridge can function with fewer
than all of them, try with half the RAM, then with the other half,
seeing if bootup beep + video reappears and correlates with one bank of
RAM but not the other.

Getting back to steps more likely relevant to _your_ problem, the other
general class of diagnostic techniques involves swapping in known-good
components, and seeing if the problem suddenly vanishes with one such
swap-in. The pain-in-the-ass requisite is, of course, having a bunch of
known-good parts sitting around for this purpose, which one only rarely
has. Sorry, I don't know any easy way around that.

--
Rick Moen
r...@linuxmafia.com
McQ! (4x80)
"Told my friend she shouldn't smoke weed while she's pregnant because
her baby's never going to want to come out." -- Kelly Oxford
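The "half the RAM, then the other half" idea above extends to any set of
removable parts. What follows is only a rough sketch in Python, under two
assumptions that do not hold for an intermittent halt: exactly one part is
faulty, and the symptom shows up reliably whenever that part is fitted.
The names `find_faulty` and `symptom_present` are made up for
illustration; the callback stands in for a person fitting only the listed
parts, booting, and reporting what happened.

```python
from typing import Callable, Sequence


def find_faulty(parts: Sequence[str],
                symptom_present: Callable[[Sequence[str]], bool]) -> str:
    """Narrow a single faulty part down by repeatedly testing half the set.

    Assumes exactly one part is faulty and that the symptom appears
    whenever that part is installed. `symptom_present` is a hypothetical
    callback: in practice a human fits only the given parts and reports.
    """
    candidates = list(parts)
    if not candidates:
        raise ValueError("need at least one part")
    while len(candidates) > 1:
        half = candidates[:len(candidates) // 2]
        # If the fault appears with only this half fitted, the culprit is
        # in it; otherwise it must be in the half that was left out.
        if symptom_present(half):
            candidates = half
        else:
            candidates = candidates[len(half):]
    return candidates[0]


# Hypothetical slot labels; the lambda plays the part of the human tester.
sticks = ["DIMM_A1", "DIMM_A2", "DIMM_B1", "DIMM_B2"]
bad = find_faulty(sticks, lambda fitted: "DIMM_B1" in fitted)
```

With four parts this takes two test boots instead of up to four, which
matters when every test means opening the case. It does nothing for a
fault that shows up once in five weeks.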
Re: [vox-tech] Problem with Gigabyte 890FX, Phenom II, and Kubuntu
On Wed, Dec 08, 2010 at 02:37:35PM -0800, Cam Ellison wrote:
> On 10-12-08 01:43 PM, Rick Moen wrote:
> > Quoting Cam Ellison (c...@ellisonet.ca):
> >
> >> This is my only machine, and it's a production machine, so I'm not sure
> >> about taking it out of service to run ctcs2 (thanks Rick!).
> > You're very welcome. I have notes here, which I recommend, because
> > Cerberus is rather peculiar software that takes a little getting used
> > to, and has some quirks.
> >
> > 'Burn-in' on http://linuxmafia.com/kb/Hardware
> >
> > (We used to put all new or repaired machines at VA Linux Systems through
> > at least 48 hours of Cerberus / ctcs testing, to catch problems.)
> >
>
> That looks very useful. I'll give it a try.
>
> On another list that I frequent, the two responses thus far both
> suggested replacing or swapping out the PS. I have to admit the idea
> has merit, though it's an Antec Signature 650, came new with the rest of
> the system, and over $200 here including the taxes. I'm a little leery
> of ending up with a good, but effectively useless, PS. Which leads to
> another question: how do you test a PS? Is it possible?

The burn-in process would probably reveal the fault, as it will load
the machine, drawing more power and creating heat.

--
Brian Lavender
http://www.brie.com/brian/

"Program testing can be used to show the presence of bugs, but never to
show their absence!"

Professor Edsger Dijkstra
1972 Turing award recipient
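If heat or voltage under load is the suspect, it may help to keep a record
of sensor readings while the burn-in runs, so that the last reading before
a halt is still on disk afterwards. The following is a minimal Python
sketch, not part of ctcs. It assumes the `sensors` command from lm-sensors
is installed and already configured; the function names and the log path
are invented for illustration.

```python
import os
import subprocess
import time
from typing import Callable


def read_sensors() -> str:
    """Return the text output of the lm-sensors `sensors` command."""
    result = subprocess.run(["sensors"], capture_output=True, text=True,
                            check=True)
    return result.stdout


def log_reading(path: str, read: Callable[[], str] = read_sensors) -> None:
    """Append one timestamped reading and force it to disk.

    The fsync matters here: after a hard halt, anything still sitting in
    the page cache is lost, so each record is flushed before returning.
    """
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(path, "a") as log:
        log.write(f"--- {stamp}\n{read().rstrip()}\n")
        log.flush()
        os.fsync(log.fileno())


def monitor(path: str, interval_s: float, count: int,
            read: Callable[[], str] = read_sensors) -> None:
    """Take `count` readings, `interval_s` seconds apart."""
    for i in range(count):
        log_reading(path, read)
        if i < count - 1:
            time.sleep(interval_s)


# Example (hypothetical path): one reading a minute for eight hours.
# monitor("/var/tmp/burnin-sensors.log", 60, 8 * 60)
```

The `read` parameter is there so a different source, `sar` output for
instance, can be swapped in without touching the logging part.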
Re: [vox-tech] Problem with Gigabyte 890FX, Phenom II, and Kubuntu
On 10-12-08 01:43 PM, Rick Moen wrote:
> Quoting Cam Ellison (c...@ellisonet.ca):
>
>> This is my only machine, and it's a production machine, so I'm not sure
>> about taking it out of service to run ctcs2 (thanks Rick!).
> You're very welcome. I have notes here, which I recommend, because
> Cerberus is rather peculiar software that takes a little getting used
> to, and has some quirks.
>
> 'Burn-in' on http://linuxmafia.com/kb/Hardware
>
> (We used to put all new or repaired machines at VA Linux Systems through
> at least 48 hours of Cerberus / ctcs testing, to catch problems.)
>

That looks very useful. I'll give it a try.

On another list that I frequent, the two responses thus far both
suggested replacing or swapping out the PS. I have to admit the idea
has merit, though it's an Antec Signature 650, came new with the rest of
the system, and over $200 here including the taxes. I'm a little leery
of ending up with a good, but effectively useless, PS. Which leads to
another question: how do you test a PS? Is it possible?

Thanks again

Cam
Re: [vox-tech] Problem with Gigabyte 890FX, Phenom II, and Kubuntu
Quoting Cam Ellison (c...@ellisonet.ca):

> This is my only machine, and it's a production machine, so I'm not sure
> about taking it out of service to run ctcs2 (thanks Rick!).

You're very welcome. I have notes here, which I recommend, because
Cerberus is rather peculiar software that takes a little getting used
to, and has some quirks.

'Burn-in' on http://linuxmafia.com/kb/Hardware

(We used to put all new or repaired machines at VA Linux Systems through
at least 48 hours of Cerberus / ctcs testing, to catch problems.)
Re: [vox-tech] Problem with Gigabyte 890FX, Phenom II, and Kubuntu
On 10-12-08 10:40 AM, Brian Lavender wrote:
> You might also want to try sar. Here is an interesting article.
>
> http://www.linux.com/archive/feature/114224

I am not familiar with it, so I've downloaded it and started it, and
will get ksar as well (so I can get a better handle on the output), and
have a go. On *Ubuntu it's part of a package called sysstat. It does
look quite interesting, not to mention comprehensive - the man page goes
on forever.

> brian
>
> On Wed, Dec 08, 2010 at 10:00:32AM -0800, Brian Lavender wrote:
>> Check the system event logs in the motherboard bios. Sometimes listed
>> under SEL. Otherwise, I would stress test the machine. I used to run ctcs
>> to burn in systems for a cluster I worked on for LLNL. It does memory,
>> io, and cpu stress tests.
>>
>> http://sourceforge.net/projects/va-ctcs/

This is my only machine, and it's a production machine, so I'm not sure
about taking it out of service to run ctcs2 (thanks Rick!). It may be
worth a trial, nonetheless, in the wee hours of a weekend morning.

As to the system event log, I just ran dmidecode, and it shows no
errors. Mind you, this is 32 hours later, with a reboot in between, so
anything that was current then may have been over-written.

>> You could also try lm-sensors to monitor the hardware.
>>
>> http://ubuntuforums.org/showthread.php?t=2780
>>

I have lm-sensors installed. The only thing I can access on this MB is
one temperature setting. Mind you, I've only relied on gkrellm to find
them, though with other MBs it's been pretty good at sussing them out.
I'll run the setup utility and see what I can find. Voltage variability
might be the culprit, I suppose.

I still wonder if it's a software issue with the various cron jobs that
run at that time, and I'm still working through them.

Anyway, thank you very much for these ideas - they're a considerable
help.

Cam
Re: [vox-tech] Problem with Gigabyte 890FX, Phenom II, and Kubuntu
Quoting Brian Lavender (br...@brie.com):

> Check the system event logs in the motherboard bios. Sometimes listed
> under SEL. Otherwise, I would stress test the machine. I used to run ctcs
> to burn in systems for a cluster I worked on for LLNL. It does memory,
> io, and cpu stress tests.
>
> http://sourceforge.net/projects/va-ctcs/

No longer maintained. See the successor fork:
http://sourceforge.net/projects/ctcs2/
Re: [vox-tech] Problem with Gigabyte 890FX, Phenom II, and Kubuntu
You might also want to try sar. Here is an interesting article.

http://www.linux.com/archive/feature/114224

brian

On Wed, Dec 08, 2010 at 10:00:32AM -0800, Brian Lavender wrote:
> Check the system event logs in the motherboard bios. Sometimes listed
> under SEL. Otherwise, I would stress test the machine. I used to run ctcs
> to burn in systems for a cluster I worked on for LLNL. It does memory,
> io, and cpu stress tests.
>
> http://sourceforge.net/projects/va-ctcs/
>
> You could also try lm-sensors to monitor the hardware.
>
> http://ubuntuforums.org/showthread.php?t=2780
>
> brian
>
> On Wed, Dec 08, 2010 at 07:21:14AM -0800, Cam Ellison wrote:
> > Recently I upgraded to this MB, with an AMD Phenom II. The latest
> > Kubuntu (10.10) is loaded onto it. Twice it has halted suddenly: no
> > activity, no output, and has required a reboot. This is not an external
> > power problem: its power comes from an APC 3000. There is nothing in
> > the logs: everything runs normally and then suddenly nothing does.
> >
> > In both instances, the stoppage has occurred in the middle of a set of
> > cron.daily jobs (in the middle of the night), so I am exploring that
> > avenue. The problem is that the machine has been in place for 7 weeks,
> > and the two stoppages are about 5 weeks apart - there's not much to go on.
> >
> > I'm looking for any ideas about how to track this down: is there a
> > utility that might give me more insight? More to the point, does anyone
> > in the group have this combination and a comparable experience?
> >
> > TIA
> >
> > Cam Ellison
Re: [vox-tech] Problem with Gigabyte 890FX, Phenom II, and Kubuntu
Check the system event logs in the motherboard bios. Sometimes listed
under SEL. Otherwise, I would stress test the machine. I used to run ctcs
to burn in systems for a cluster I worked on for LLNL. It does memory,
io, and cpu stress tests.

http://sourceforge.net/projects/va-ctcs/

You could also try lm-sensors to monitor the hardware.

http://ubuntuforums.org/showthread.php?t=2780

brian

On Wed, Dec 08, 2010 at 07:21:14AM -0800, Cam Ellison wrote:
> Recently I upgraded to this MB, with an AMD Phenom II. The latest
> Kubuntu (10.10) is loaded onto it. Twice it has halted suddenly: no
> activity, no output, and has required a reboot. This is not an external
> power problem: its power comes from an APC 3000. There is nothing in
> the logs: everything runs normally and then suddenly nothing does.
>
> In both instances, the stoppage has occurred in the middle of a set of
> cron.daily jobs (in the middle of the night), so I am exploring that
> avenue. The problem is that the machine has been in place for 7 weeks,
> and the two stoppages are about 5 weeks apart - there's not much to go on.
>
> I'm looking for any ideas about how to track this down: is there a
> utility that might give me more insight? More to the point, does anyone
> in the group have this combination and a comparable experience?
>
> TIA
>
> Cam Ellison
[vox-tech] Problem with Gigabyte 890FX, Phenom II, and Kubuntu
Recently I upgraded to this MB, with an AMD Phenom II. The latest
Kubuntu (10.10) is loaded onto it. Twice it has halted suddenly: no
activity, no output, and has required a reboot. This is not an external
power problem: its power comes from an APC 3000. There is nothing in
the logs: everything runs normally and then suddenly nothing does.

In both instances, the stoppage has occurred in the middle of a set of
cron.daily jobs (in the middle of the night), so I am exploring that
avenue. The problem is that the machine has been in place for 7 weeks,
and the two stoppages are about 5 weeks apart - there's not much to go on.

I'm looking for any ideas about how to track this down: is there a
utility that might give me more insight? More to the point, does anyone
in the group have this combination and a comparable experience?

TIA

Cam Ellison
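When the logs show nothing unusual and then simply stop, the halt can at
least be pinned down in time by looking for the gap between the last entry
before it and the first entry after the reboot. A minimal Python sketch
follows. It assumes traditional syslog lines that begin with a
'Mon DD HH:MM:SS' stamp and English month names; the year is not in that
format, so it has to be supplied. Function names are invented for
illustration.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def parse_stamp(line: str, year: int) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS' of a traditional syslog line."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def find_gaps(lines: Iterable[str], year: int,
              min_gap: timedelta) -> List[Tuple[datetime, datetime]]:
    """Return (last_before, first_after) pairs at least `min_gap` apart."""
    gaps = []
    previous = None
    for line in lines:
        try:
            current = parse_stamp(line, year)
        except ValueError:
            continue  # skip lines without a recognizable timestamp
        if previous is not None and current - previous >= min_gap:
            gaps.append((previous, current))
        previous = current
    return gaps


# Example (path is the usual Ubuntu-family location, an assumption here):
# with open("/var/log/syslog") as f:
#     for before, after in find_gaps(f, 2010, timedelta(minutes=30)):
#         print("silent from", before, "until", after)
```

The `last_before` time of each gap can then be lined up against the
cron.daily schedule, or against `sar` and sensor records if those are
being collected.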