12.06.06 Mysterious Lockups
By A.P. Lawrence
Of all computer problems, the unresponsive hang is the most annoying and most difficult to trace. There's no crash, no panic: everything just stops dead.
The keyboard is useless, telnets just time out - you have no choice but to power cycle the machine.
Well, maybe. If you are running Linux and have Magic SysRq enabled, you might be able to do more. Even SCO has something similar: scodb gives access to a kernel debugger, if one is available. I don't know of anything like that for Mac OS X; there is ddb, but that requires attaching a serial terminal and running a recompiled kernel.
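On Linux, whether Magic SysRq is usable depends on the kernel being built with it (CONFIG_MAGIC_SYSRQ) and on the value in /proc/sys/kernel/sysrq. Here's a small Python sketch that reads that value and decodes it, assuming the bitmask meanings from the kernel's SysRq documentation (0 disables it, 1 enables everything, larger values enable selected functions). The function names are my own, and older kernels may only distinguish zero from nonzero.

```python
from pathlib import Path

# Bitmask meanings as listed in the Linux kernel's SysRq documentation.
# Values 0 and 1 are special cases handled in describe_sysrq().
SYSRQ_BITS = {
    2: "control of console logging level",
    4: "control of keyboard (SAK, unraw)",
    8: "debugging dumps of processes etc.",
    16: "sync command",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}


def read_sysrq(path="/proc/sys/kernel/sysrq"):
    """Read the current SysRq setting as an integer (Linux only)."""
    return int(Path(path).read_text().strip())


def describe_sysrq(value):
    """Translate a kernel.sysrq value into a list of enabled functions."""
    if value == 0:
        return ["disabled"]
    if value == 1:
        return ["all functions enabled"]
    # Any other value is a bitmask; report each bit that is set.
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]
```

Calling print(describe_sysrq(read_sysrq())) shows what the running kernel allows. As root, 'echo 1 > /proc/sys/kernel/sysrq' (or 'sysctl -w kernel.sysrq=1') enables everything for the current boot. With that in place, Alt-SysRq-t dumps task information to the console, and Alt-SysRq-s, -u, -b in sequence sync the disks, remount read-only, and reboot, which is gentler than pulling the plug.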
But let's say none of that is helpful. In that case, the first thing you want to know is "how dead is it?" Is the keyboard totally dead? If it has a Caps Lock light, does the light toggle on and off as you press the key? If not, you may have a motherboard or keyboard problem. Can you press Control-Alt-F3 (Linux and SCO) to switch screens? If so, the OS is still at least partially alive. Can you telnet or ssh to the box? Can you ping it? Do Samba, NFS, or other network services still work? The answers give you clues about the state of the networking stack.
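The network questions can be answered from another machine. This Python sketch tries a TCP connection to a few well-known ports: 22 for ssh, 23 for telnet, 139 and 445 for Samba, and 2049 for NFS if it's served over TCP. These are assumed defaults, so adjust them for your setup. It doesn't do ping, because ICMP needs raw sockets; use the system's ping command for that. Also keep in mind that the kernel completes the TCP handshake for a listening socket on its own, so a successful connection shows the networking stack is alive, not necessarily that the daemon behind the port will answer.

```python
import socket

# Well-known TCP ports for the services mentioned above (assumed defaults;
# adjust for your own setup).
SERVICES = {
    "ssh": 22,
    "telnet": 23,
    "samba (netbios)": 139,
    "samba (smb)": 445,
    "nfs": 2049,
}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out (socket.timeout is an OSError).
        return False


def probe(host, services=SERVICES, timeout=3.0):
    """Check each service port on host and return {name: reachable}."""
    return {name: port_open(host, port, timeout) for name, port in services.items()}
```

Running probe("192.168.1.10") (a placeholder address) from a working machine gives a quick picture: if ssh connects but nothing else does, the kernel is still handling network traffic even though the console is frozen.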
OK, you've given up: there's nothing to be done but a power cycle. Here's another chance to learn something: does a reset behave differently than a complete power off? If it takes a power off to get the machine responsive again, how long does it have to stay off? If a short time off is enough, that might indicate capacitor or register problems: giving the machine a little time to "bleed off" cures the problem. Needing a longer period off might mean heat problems: are fans malfunctioning, or are the insides coated with insulating dust?
No? OK, then maybe something in software is doing this. A "tail" of the system logs may give a clue about what was happening just before the hang (enable syslog's MARK option, which writes a periodic timestamp to the log, if you aren't sure syslog itself keeps running), as may tools like sar. A buildup of unusual system activity before the hang might point to its cause. If the hangs repeat, running "ps" periodically and logging its output can help you zero in on the cause the next time it happens.
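With MARK turned on, a gap in the log longer than the MARK interval tells you roughly when the machine stopped. Here's a Python sketch that scans traditional syslog lines ('Dec  6 10:15:01 host ...') and returns the last entry before the first gap longer than a threshold. The 30-minute default assumes a 20-minute MARK interval; the function name and threshold are my own choices, and the year must be supplied because that log format doesn't record it.

```python
from datetime import datetime, timedelta


def last_entry_before_gap(lines, year, max_gap=timedelta(minutes=30)):
    """Return (timestamp, line) for the last log entry before the first gap
    longer than max_gap, or None if no such gap is found.

    Assumes traditional syslog lines such as
    'Dec  6 10:15:01 myhost kernel: ...', whose first 15 characters are the
    timestamp. That format has no year, so the caller supplies one; a
    December-to-January rollover is not handled.
    """
    previous = None
    for line in lines:
        try:
            when = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a timestamped syslog line; skip it
        if previous is not None and when - previous[0] > max_gap:
            return previous
        previous = (when, line.rstrip("\n"))
    return None
```

Something like 'with open("/var/log/messages") as log: print(last_entry_before_gap(log, 2006))' would show the last thing logged before the silence; the log path varies by system. Without MARK, ordinary quiet periods also show up as gaps, which is why the option matters.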
After all that, though, these things are almost (*almost*) always hardware, and more often than not the problem is power related: bad power supplies are the most common cause I've seen. After that come disk controllers and then motherboards. But nowadays I don't feel it's worth spending a lot of time chasing this sort of thing: move the system to new hardware as quickly as possible. If you then want to spend time investigating possibilities on the old hardware, at least you won't be interfering with normal business. However, given the cost of hardware vs. the cost of labor, even that may not make sense: accept that the whole thing was mysterious, do whatever you need to do to protect any confidential data, and move on. Maybe some parts can be recycled, or maybe the machine can move down to a less important role, but the cost of messing around with it in its original role just doesn't make sense.
About the Author: A.P. Lawrence provides SCO Unix and Linux consulting services http://www.pcunix.com