Hard Drive Failure
As a technician, I would say one of the scarier things to be told is "your hard drive has crashed"... a message I give countless times a month. Normally, it's not as dire as it sounds, but the message is still the most effective one I can give. While the specifics vary, the general situation is this: I respond to a client reporting a "computer that turns off as soon as it shows the Windows XP logo". Upon arriving, I disable the automatic reboot setting (this is possible from the Advanced Boot Options Menu thanks to a recent Microsoft update) and am greeted with a blue screen message along the lines of "UNMOUNTABLE_BOOT_VOLUME" or "NTLDR is missing or corrupt".
Stop. At this point, I know only that one of two things is true: a) the NT boot sector is corrupt or missing; or b) the MBR is corrupt or missing. In and of itself, neither is indicative of a crash, yet I'll inevitably tell my client that the drive has crashed. Why? Well, before demonizing me, realize that the basic reason is to benefit the client.
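Both the MBR (the very first sector of the drive) and the NT boot sector (the first sector of the partition) are 512-byte sectors that end in the two-byte signature 0x55 0xAA. As a rough illustration of what "missing or corrupt" can look like at the byte level, here is a minimal Python sketch. It assumes the sector has already been copied into a file such as a disk image, and the function names are hypothetical. A valid signature only rules out the crudest damage; the boot code or partition entries could still be wrong.

```python
# Sketch: check whether a saved 512-byte boot sector ends in 0x55AA.
# Works on a file or bytes copied off the disk beforehand (e.g. a disk
# image); it does not open a live drive itself.

SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"  # stored at byte offsets 510 and 511


def has_boot_signature(sector: bytes) -> bool:
    """Return True if `sector` is at least one full sector ending in 0x55 0xAA."""
    if len(sector) < SECTOR_SIZE:
        return False  # truncated or missing sector
    return sector[510:512] == BOOT_SIGNATURE


def check_image(path: str) -> bool:
    """Read the first sector of a disk image file and check its signature."""
    with open(path, "rb") as f:
        return has_boot_signature(f.read(SECTOR_SIZE))


if __name__ == "__main__":
    print(has_boot_signature(bytes(510) + BOOT_SIGNATURE))  # True
    print(has_boot_signature(bytes(SECTOR_SIZE)))           # False: all zeros
```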
Hard drive crashes are something most people feel they can relate to. They understand that nothing they did caused the problem, which in many cases is just as important as the fact that the problem is resolvable. Ultimately, it gives them enough information to provide a brief synopsis of the problem without overburdening them with the gory details. The unfortunate tradeoff is the fear of data loss. I know, it doesn't seem fair, but given consumer expectations, it's easier for both me (I don't have a client watching my every move) and the client (who at least feels like they know what's going on).
The Symptoms
Hard drive failure is not only one of the most potentially detrimental problems I see, it's also the hardest to spot. Allow me to tell two stories to demonstrate my point. While both ended in hard drive failure (and, fortunately, complete data recovery), only one was caught before the failure occurred.
SITUATION ONE: An occasional client of mine called reporting printing trouble. After walking him through the common troubleshooting steps over the phone, I agreed to come out. I was greeted by one of the strangest printing errors I had ever seen: the Windows Print Spooler service absolutely refused to stay running. The few times I did get it started, it didn't work properly. Printer drivers reported strange errors, anything that did print came out garbled, and nothing really seemed to be working.
Finally, I decided to run a slew of system tests. My intention was actually to isolate a possible motherboard or cable fault. Luckily for the client, that standard set of tests included a hard drive integrity test. When the drive failed the test, I realized what had been happening all along. We immediately RMA'd the drive, backed up the data, and rebuilt the machine. The next day, printing was up and running.
SITUATION TWO: One of my clients is rather problematic. Since she replaced her desktop with a Dell Latitude (on account of problems with the desktop), we have frequently been out to fix something. The problems range from not knowing there's a physical "On/Off" switch for the wireless adapter to not being able to connect to the network.
This particular case was a connectivity issue, one that started immediately after another technician (not affiliated with my company) had tried to connect the laptop to her home wireless network. Because the computer is a business machine first and a personal computer second, anything the other tech had done seemed null and void.
I sat down, ran ipconfig, and immediately saw that no default gateway was set. That was unusual, especially considering another tech had just serviced the machine. Not worrying about it too much, I statically configured the IP address, subnet mask, gateway, and DNS servers. Voila, the machine was working.
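When entering a static configuration by hand, one easy sanity check is that the default gateway actually sits inside the subnet defined by the IP address and mask; a gateway outside that range leaves the machine just as unable to route as having none at all. The sketch below uses Python's standard `ipaddress` module. The function name and the addresses in the example are hypothetical, not the client's real settings.

```python
import ipaddress


def gateway_on_subnet(ip: str, mask: str, gateway: str) -> bool:
    """Return True if `gateway` is a usable host address on the ip/mask subnet."""
    iface = ipaddress.IPv4Interface(f"{ip}/{mask}")  # accepts a dotted netmask
    net = iface.network
    gw = ipaddress.IPv4Address(gateway)
    # The gateway must be inside the local network, but must not be the
    # network address, the broadcast address, or the host's own address.
    return (
        gw in net
        and gw != net.network_address
        and gw != net.broadcast_address
        and gw != iface.ip
    )


if __name__ == "__main__":
    # Hypothetical example values.
    print(gateway_on_subnet("192.168.1.50", "255.255.255.0", "192.168.1.1"))  # True
    print(gateway_on_subnet("192.168.1.50", "255.255.255.0", "192.168.2.1"))  # False
```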
The next morning, like clockwork, we received a call from the same client. Her hard drive had just crashed and she couldn't work. Incidentally, this was the second hard drive failure in this machine since purchase, and we ended up RMA'ing the entire box.
THE ANSWER: So, what caused the strange symptoms in both cases? The answer has to do with what actually happens when a hard drive fails. A hard drive stores data on magnetic disks divided both physically and logically into sectors. A physical sector is, without getting too complicated, a specific magnetic region of the disk. When a physical sector goes bad, it becomes entirely unusable, and any data stored on it becomes inaccessible.
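A drive integrity test, like the one that caught the failing drive in Situation One, boils down to trying to read every sector and noting which ones can't be read. The sketch below is a simplified Python illustration of that idea, not how any vendor's diagnostic actually works. `scan_for_bad_sectors` is a hypothetical name, and the sketch assumes it is pointed at a regular file such as a disk image rather than a raw device (which needs administrator rights and a different way to find its size).

```python
import os

SECTOR_SIZE = 512


def scan_for_bad_sectors(path: str, sector_size: int = SECTOR_SIZE) -> list[int]:
    """Try to read every sector of `path`; return the indices that failed.

    Assumes `path` is an ordinary file (for example a disk image). On a
    real drive, whether a bad sector shows up as an OSError depends on the
    OS and the drive, which may also remap failing sectors on its own.
    """
    bad = []
    total = os.path.getsize(path)
    sectors = (total + sector_size - 1) // sector_size  # round up
    with open(path, "rb", buffering=0) as f:  # unbuffered: one read per sector
        for index in range(sectors):
            f.seek(index * sector_size)
            try:
                f.read(sector_size)
            except OSError:
                bad.append(index)  # unreadable: record it and keep going
    return bad
```

On a healthy file the result is an empty list; any index it returns is a sector whose data, like the print spooler's files or the network settings in the stories above, can no longer be read back.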
Conversely, a logical sector is a file system's (e.g. FAT, EXT3, NTFS, WFS, EFS) description of where data is stored. When a logical sector goes bad, it's really just the file system's information that has a problem. Tools such as chkdsk can (and often do) fix these problems. When they don't, performing a low-level format (indiscriminately writing all 0's, then all 1's, across the drive) almost always recovers them.
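The two-pass overwrite described above (strictly speaking a full overwrite rather than a factory low-level format) can be sketched in Python as follows. This is an illustration under stated assumptions: it overwrites an existing file such as a disk image, the function names are hypothetical, and it destroys whatever the file contains, so never point it at anything you want to keep.

```python
import os

CHUNK_SIZE = 1024 * 1024  # overwrite 1 MiB per write call


def fill(path: str, byte_value: int) -> None:
    """Overwrite every byte of an existing file with `byte_value` (0-255)."""
    size = os.path.getsize(path)
    block = bytes([byte_value]) * CHUNK_SIZE
    with open(path, "r+b") as f:  # r+b: write in place without truncating
        written = 0
        while written < size:
            n = min(CHUNK_SIZE, size - written)
            f.write(block[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # push the data out of OS caches


def zeros_then_ones(path: str) -> None:
    """First pass writes all 0 bits (0x00), second pass all 1 bits (0xFF)."""
    fill(path, 0x00)
    fill(path, 0xFF)
```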
So, when a Windows-based hard drive fails, data sectors start becoming inaccessible. Being the "user-friendly" operating system it is, Windows tries to compensate. In the case of the printer, when the service became inaccessible (physical bad sectors existed where the service's files were stored), Windows reported the service as failing. With the networking, the stored gateway, DNS, and other settings were inaccessible, so Windows reported them as missing (unset).
While this may make things easier for a user in the short run, in the end these symptoms are relatively useless for diagnosing the real problem.
The Solution
Keep a regular backup. In a future entry, I'll be covering Microsoft's free backup utility, NTBackup. Combined with the built-in Windows Task Scheduler, it lets you automate a daily backup of any file, every file, your entire file system, or even the system state.
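Until that entry, here is a rough idea of what an automated daily backup does. This is a plain Python stand-in, not NTBackup: it zips a folder into a dated archive, ideally on another drive. `backup_folder` and the archive naming scheme are hypothetical choices for illustration, and unlike NTBackup it does not handle the system state or files that Windows keeps locked.

```python
import os
import zipfile
from datetime import date
from typing import Optional


def backup_folder(source: str, dest_dir: str, day: Optional[date] = None) -> str:
    """Zip every file under `source` into dest_dir/backup-YYYY-MM-DD.zip.

    Returns the archive path. `dest_dir` should live outside `source`
    (ideally on another drive), or the archive could end up inside itself.
    """
    day = day or date.today()
    os.makedirs(dest_dir, exist_ok=True)
    archive = os.path.join(dest_dir, f"backup-{day.isoformat()}.zip")
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(source):
            for name in files:
                full = os.path.join(root, name)
                # Store paths relative to `source` so the archive is portable.
                zf.write(full, arcname=os.path.relpath(full, source))
    return archive
```

To make it a daily backup, a small script calling `backup_folder` with the folder you care about and a destination on a second drive could be scheduled with the Windows Task Scheduler; those paths are up to you.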
My only other advice is to be attentive. While you may be graced with an odd clicking sound before a crash, you're just as likely to get nothing but some inexplicable behavior. Most computers come with built-in diagnostic software these days. Check your documentation to see if you have a "drivers and utilities" CD. You might even have a built-in diagnostic boot sequence, usually accessed via F12 or F10 at boot time. Maxtor, Western Digital, and Seagate also provide free diagnostic tools for use with their hard drives.
Hard drive failures suck, no question, but with appropriate care, they can be made to cause minimal damage.