Intelli-SMART Failure Prediction Software
Self-Monitoring, Analysis and
Reporting Technology (SMART) is a major step toward a truly fault-tolerant
computing environment. As we all know, the hard disk
drives (HDDs) in our servers, workstations, and even home computers hold
vast amounts of critical data. The HDD also has the greatest potential
for failure of any component in a system.
Until now, we could check a system for “soft errors” such as cross-linked files or lost clusters, scan for viruses, and check the surface of the drive's platters for defects, but there was no reliable way to detect an impending hardware failure. Some diagnostics can check the drive after the fact, but by that point our precious data is gone.
SMART, on the other hand, monitors the drive's hardware performance with drive-specific tests on the controller. When a fault is detected, a notification warns that the hardware may fail, so that all data can be backed up immediately and the drive sent for repair or replaced.
SMART compliant drives are the norm,
and have been shipping for several years now. SMART is defined and
endorsed by the Small Form Factor (SFF) Committee, which includes Compaq
Computer Corporation, Hitachi, Ltd., IBM Storage Products Company,
Maxtor Corporation, Quantum Corporation, Seagate Technology, Toshiba
Corporation and Western Digital Corporation among others. In fact, IBM
has shipped nearly 3 million SMART compliant drives, which have
logged over 20 billion hours. SMART is an industry-standard
reliability prediction indicator for both IDE/ATA and SCSI HDDs.
Basics of SMART
SMART is a failure prediction method based on detecting when a device deviates from a defined set of thresholds. The idea is an industry-wide continuation of PFA (Predictive Failure Analysis), invented and implemented by IBM for its mainframe computers. The first company to implement SMART technology on desktop PCs was Compaq Computer, with its DFP (Drive Failure Prediction).
A SMART compliant drive has a series of tests embedded in the controller of the HDD. Data is constantly collected and checked against vendor-specific thresholds. These tests are designed to predict the impending degradation or failure of a drive. For instance, if a drive is designed to spin at 3500 RPM and the manufacturer's threshold is +/- 100 RPM, the drive may stay within threshold for a year, but as the drive ages the spindle speed begins to fluctuate and before long drops to 3300 RPM. Once that happens, an error is logged and a message is sent to the system administrator and/or the user. The drive can now be scheduled for replacement before the condition gets worse and becomes catastrophic.
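The spindle-speed example above can be sketched in a few lines of Python. This is only an illustration of the threshold idea, not the firmware logic of any real drive: the Threshold class, its field names and the sample readings are all assumptions made for this sketch, and real SMART thresholds are vendor specific and usually proprietary.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    """Hypothetical vendor threshold: a nominal value plus an allowed deviation."""
    nominal: float
    tolerance: float

    def is_exceeded(self, reading: float) -> bool:
        # A reading is a fault only when it strays further than the tolerance.
        return abs(reading - self.nominal) > self.tolerance


# Values from the text: designed speed 3500 RPM, threshold +/- 100 RPM.
spindle = Threshold(nominal=3500, tolerance=100)

# Made-up readings as the drive ages; only the last one is out of range.
readings = [3500, 3460, 3410, 3300]
alerts = [rpm for rpm in readings if spindle.is_exceeded(rpm)]

for rpm in alerts:
    print(f"Spindle speed {rpm} RPM is outside threshold: back up and replace the drive")
```

With these sample values only the 3300 RPM reading triggers a warning, which matches the scenario described in the text.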
The various tests and thresholds that govern pass or fail are vendor specific and usually proprietary. The types of failure monitored by SMART include head and servo issues that result in read and seek errors, motor failure or bearing problems that result in spin-up problems, excessive bad sectors, and thermal problems. Faults are categorized as predictable or unpredictable, with the unpredictable ones usually being catastrophic. Unpredictable faults are normally electronics related, or caused by static electricity or handling. Testing and data collection can take place in an on-line or an off-line mode. In on-line mode, data is collected during idle times. In off-line mode, the drive responds to commands sent directly by the host and interrupts any operations in progress.
Intelli-SMART has automatic settings for monitoring the drive in on-line mode, as well as a Test Now button for immediate off-line testing and data collection where supported. To make SMART practical, the system has to be able to alert the user or, on a network, the system administrator to a potential failure. Previously, the only way to use SMART was through a newer BIOS (at boot time) or with a simple desktop application that notifies the user of a failure.
Many of these utilities are manufacturer specific and run under DOS. We go a step further. Intelli-SMART runs natively under Windows and has easy-to-use messaging options: notifications can be sent via MS Mail, Lotus Notes or Netscape MAPI. For online systems, messaging is available over the Internet, allowing alerts to be sent via email. Intelli-SMART also monitors all of the drives in your system or RAID array, whether they are IDE/ATA, SCSI or both. (Note that some RAID controllers will not pass through all of the SMART data, in which case only the volumes will be monitored.)
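To make the alerting idea concrete, here is a minimal Python sketch of how a monitoring tool might compose an email warning for several drives at once. It is not how Intelli-SMART itself is implemented; the build_alert function, the host name, the addresses and the drive descriptions are hypothetical. The message is only built, not sent; delivering it would require a mail server (for example through Python's smtplib).

```python
from email.message import EmailMessage


def build_alert(host, failing_drives, recipient):
    """Compose (but do not send) a warning about drives that reported a fault."""
    msg = EmailMessage()
    msg["From"] = "smart-monitor@example.com"  # hypothetical sender address
    msg["To"] = recipient
    msg["Subject"] = f"SMART warning on {host}: {len(failing_drives)} drive(s) may fail"
    lines = [f"- {name}: {reason}" for name, reason in failing_drives]
    msg.set_content(
        "Back up all data immediately and schedule these drives "
        "for repair or replacement:\n" + "\n".join(lines)
    )
    return msg


# Hypothetical results from a mixed IDE/ATA and SCSI system.
failing = [
    ("Drive 0 (IDE/ATA)", "spin-up time outside threshold"),
    ("Drive 2 (SCSI)", "excessive bad sectors"),
]
alert = build_alert("fileserver01", failing, "admin@example.com")
print(alert)
```

Collecting every failing drive into one message keeps the administrator from receiving a separate email per drive when several drives in the same system or array report faults together.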
LC Technology International License Agreement