On Sun, 3 Feb 2008, Igor Sobrado wrote:
>> I have had 3 e7k100's fail with read errors in the same general
>> area of the disk.
>
> How odd! The only way to get symptoms like the one you describe
> is getting the surface temperature of the disk platters higher
> than the Curie point. It should not happen.
No. Drives of the same model are zoned in the same way. It is not
uncommon for certain areas of the disk to be particularly prone to
errors for a particular model. For example, as the head moves towards
the end of the disk (the inner radius), the sectors are spaced more
closely, and right before the density switch to the next zone, the
drive might be a bit marginal.

> What about the new solid-state drives?
> Are solid-state disks a better alternative on Soekris
> computers than traditional drives?

Traditional flash drives (including CF) have had real problems with
write endurance and write speed. A new breed of flash drives is coming
on the market; they use significant microprocessors and RAM buffers
(backed with supercaps and firehose dump to flash) to get over the
speed problem for bursty workloads, and they use wear leveling to
extend the write endurance. Today, such drives are beginning to push
heavily into the laptop market. Search for recent press releases from
Toshiba for an example. Also, there are boutique vendors such as STec
who ship flash drives in ATA/SATA/SAS/FC form factors that have
fabulously good performance specs (but aren't cheap).

We are at an inflection point, where a transition from disk to flash,
and then from flash to newer solid-state technologies collectively
called "storage-class memory", is occurring. Things are changing. For
now, disks are the undisputed cost/capacity leaders; flash is pushing
in where good speed, good reliability and low power consumption are
important.

For real reliability, I would use CF for now, until the newer
wear-leveling flash drives become affordable.

Hint: I run a consumer-grade 2.5" drive in my Soekris, and I have good
backups and a half dozen spare drives sitting in the drawer. If it
fails, replacement and recovery would only take me a few hours. I
haven't had the patience to set up my 5501 for CF boot, but it would be
a good idea (with the spinning disk being only for throwaway data).
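As an aside on the wear-leveling point above, here is a toy Python
sketch of why remapping writes extends endurance. It is only an
illustration under stated assumptions, not any vendor's algorithm: the
function name max_erase_count, the "always pick the least-worn block"
policy, and the workload (every write hits the same logical block) are
all made up for the example.

```python
def max_erase_count(num_blocks, hot_writes, wear_leveling):
    """Return the erase count of the most-worn physical block.

    Toy model (assumption): every write targets the same "hot" logical
    block, e.g. a log file or a file system metadata block.
    """
    erase_counts = [0] * num_blocks
    for _ in range(hot_writes):
        if wear_leveling:
            # Hypothetical policy: remap the write to the least-worn block.
            target = erase_counts.index(min(erase_counts))
        else:
            # No remapping: the same physical block takes every erase.
            target = 0
        erase_counts[target] += 1
    return max(erase_counts)


if __name__ == "__main__":
    print(max_erase_count(8, 1000, wear_leveling=False))  # 1000
    print(max_erase_count(8, 1000, wear_leveling=True))   # 125
```

In this model the worst-case wear drops by a factor equal to the number
of blocks the controller can spread the writes over. Real controllers
also have to move static data and track the mapping, which the sketch
ignores.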
> > This is good reading...
> > http://research.google.com/archive/disk_failures.pdf
> > There are a few more independent studies done. Do not have the urls at
> > hand.

Several studies came out of NetApp recently; they are also excellent.

> On this report the authors write that "[the figure 4] shows that failures
> do not increase when the average temperature increases. In fact, there
> is a clear trend showing that lower temperatures are associated with
> higher failure rates. Only at very high temperatures is there a slight
> reversal of this trend." For the authors, slight increase means only 1%
> at temperatures higher than 45 C.
>
> The authors end saying that "We can conclude that at moderate temperature
> ranges it is likely that there are other effects which affect failure
> rates much more strongly than temperatures do."

All this is true, for the temperature ranges they studied.

> So, temperature is not a source of disk failures.

WRONG! For the disks that were mounted in Google's data centers, there
was no clear correlation between higher temperatures and failure rates,
but this does NOT extrapolate to the general case. DO NOT USE THE
GOOGLE DATA AS AN EXCUSE TO RUN YOUR DISKS HOT, you'll be sorry.

The storage research community has analyzed the Google paper in much
detail, but most of those discussions are not public. The Google data
is hard to compare with individual deskside computers, the typical
Soekris box, or regular data centers, because many of the Google disks
are already running much cooler than usual (few disks outside the
extremely well-cooled mega-datacenters that the likes of Google and
Livermore use run at temperatures as low as 20 to 25 degrees C). Part
of the Google data is actually explained by running the disks cooler
than ideal; the ideal temperature FOR THE GENERATION OF DISKS THAT
DOMINATE THEIR SAMPLE seems to be 30 to 40 degrees C. If you discuss
this with the experts from Seagate, Hitachi, Maxtor etc. you'll find
that the probable cause is spindle lubrication being a tad too viscous
at low temperatures.

Furthermore, the Google data set is difficult to analyze, because it
contains many generations of disk drives (none or few are really
modern, or 2.5" disks), and the change from generation to generation is
highly correlated with changes in the data center environment (Google's
data centers have become larger and better cooled) and in the way disks
are mounted.

> There are other sources
> of failure like the number of power cycles of the drive and power-on hours
> (in my humble opinion, POH is more important on 2.5" drives than on 3.5"
> ones.)

This is indeed true. And don't forget vibration, which non-enterprise
disks (commonly ATA and SATA disks) really hate too. But don't take the
Google data too literally.

If you are interested in this topic: FAST (the file system and storage
conference, where the Google paper was presented) is coming up in San
Jose at the end of February. I'll be there. The program shows that
NetApp and CMU will present a new paper on disk errors. Expect serious
discussions of this topic in the hallways.

-- 
Ralph Becker-Szendy [EMAIL PROTECTED]
(408)395-1435
735 Sunset Ridge Road; Los Gatos, CA 95033