The Legend of the ‘Punctured Stripe’

Where I work we have a dedicated IIS web server we’ll call IISServ.  IISServ is configured with a RAID 1 array and a single non-RAID disk for extra log storage.  A while back, IISServ was moved down to a new spot in its rack.  Magically, after the move, the computer started reporting I/O and CRC errors on drive C…

I called Dell and told them about the errors, and they had me run a diagnostic on the whole RAID controller.  The diagnostic reported errors on disk 0, the first disk in the RAID 1 array.  Dell shipped me a new disk and I swapped it in, thinking about how easy the whole process was.

A few days later we received more I/O and read errors that were canceling our shadow copy and remote backup process for IISServ.  After I ran some updates, IISServ failed to reboot properly and hung while booting into the OS.  I removed the RAID 1 drive that had not been replaced, and the computer booted normally.  After the computer booted I re-inserted the bad drive.  I called Dell again and they had me run another diagnostic on the RAID array.  This time errors appeared on the second drive.  Dell sent me another good drive, and I hot-swapped it into IISServ.

  After a day or two I was still receiving errors and the system state backup still failed.  I ran a chkdsk.  Chkdsk reported some errors, so I scheduled some downtime to do a chkdsk /r.  When I ran chkdsk again after the chkdsk /r, a few sectors were flagged as ‘bad’ – great news, right?  STILL the error persisted, and I assumed the only remaining problem could be with the RAID controller itself.  The server is under warranty, so back to Dell I went again.

  Dell had me run a reporting tool and send them some log information from the RAID controller.  They scanned the logs and found that both disks 0 and 1 (the two in the RAID 1) were reporting bad sectors in the exact same places.  Dell forwarded me up to their manager, or specialist, or whatever they have, and he explained his version of the problem.  He called it a ‘Punctured Stripe’, which nobody in my office, including me, had ever heard of.  He explained that the error was on the logical layer of the array and not on the physical disks.  He also explained that consistency checks are the preventative maintenance for this.  He went on to say that there was no application or process for repairing this, and that the only way to fix the problem was to RECREATE THE ARRAY from scratch.  That’s right, destroy the entire array and all the data and start over.  I guess I should have run more consistency checks, but he admitted that even then this problem can still occur.

  Searching for “Punctured Stripe” in quotes turned up about 4 results.  One guy was ranting about how people use RAID 1 as their backup solution, which it is not intended to be.  Another result was someone explaining that RAID 1 is only effective at protecting against complete drive hardware failure, and is susceptible to passing corrupt data between drives.  I even found a thread titled ‘RAID1 is useless’, ranting about the same problem we have.  I also went so far as to call someone who does hard drive recovery and ask him if he had ever heard of a ‘Punctured Stripe’.  He said he had never heard of it, but he had heard of corrupt data being replicated on the logical layer of a RAID 1, effectively ruining the entire array and requiring it to be recreated.
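  To make that concrete, here is a minimal sketch of how I picture a bad block getting replicated across a mirror.  It is a toy model and not Dell's or any controller's actual logic: a disk is just a Python list, None stands for an unreadable block, and rebuild_mirror is a name I made up for illustration.

```python
# Toy model of a RAID 1 rebuild. None stands for an unreadable (bad) block.
# All names are hypothetical; this is not any controller's real firmware logic.

def rebuild_mirror(source):
    """Copy every block from the surviving disk onto a fresh replacement.

    Returns the replacement disk and the block numbers that could not be
    read from the source and so ended up flagged bad on the copy as well.
    """
    replacement = []
    punctured = []
    for block_number, data in enumerate(source):
        if data is None:
            # A mirror has no third copy to recover from, so the hole
            # is carried over to the new disk at the same position.
            punctured.append(block_number)
        replacement.append(data)
    return replacement, punctured


# Disk 1 survives the first swap but has one unreadable block at position 2.
disk1 = [b"boot", b"logs", None, b"site"]

# Replace disk 0 and rebuild it from disk 1.
disk0, bad = rebuild_mirror(disk1)
print(bad)        # [2]

# Later, replace disk 1 too and rebuild it from the new disk 0.
disk1_new, bad_again = rebuild_mirror(disk0)
print(bad_again)  # [2]
```

  In this model both physical disks end up brand new, yet the same block stays bad on each of them, which matches what the controller logs showed for us.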

  I am trying to figure out how this happened, how we could have prevented it, and what good RAID 1 is if corrupt data can ruin the array so easily.  From the looks of it, RAID 1 isn’t such a great redundancy solution!

Update 6/14/07 - Found that Dell released a utility for dealing with this problem.

6 Responses to “The Legend of the ‘Punctured Stripe’”


  1. lockness350

    I am getting the same error message on one of my servers. Does anyone have any information on this?

  2. Brian Liston

    This is a strange one. Punctured stripes, as the name suggests, affect stripes and are usually confined to RAID 5 or 50. What’s happening is that the controller is mapping a bad block onto, let’s say, disk 1. If the block is not mapped fast enough, the controller can map the block to, let’s say, disk 2. At first this may seem strange, but when you think about it, it’s necessary: if a disk really fails, the controller needs to act quickly to maintain redundancy. So now we have a phantom bad block on disk 2. This is a common occurrence and, as already suggested, controllers use a consistency check to prevent these sorts of issues. The reason they seem to appear on Dell equipment is that Dell did not automatically enable the consistency check and left it up to the customer (there is a performance hit). Newer Dell controllers use Patrol Read to actively check for problem areas on the disk to prevent “Punctured Stripes”, and this is now enabled by default. There is a fix, but it is more time-consuming than backing up, reformatting and restoring.
    In a RAID 1 situation, I would suggest failing disk 2, reconstructing disk 1 to a RAID 0, running a chkdsk /r, creating a RAID 0 using disk 2 (make sure you initialize the disk), using Ghost to copy the data from disk 1 to disk 2, deleting the configuration on disk 1, formatting disk 1, and then reconstructing the RAID 0 on disk 2 to a RAID 1 using disk 1.

  3. Ryan

    This is correct to a degree…
    First I want to start by saying that a RAID 1 (mirror) does not have a stripe and therefore can’t have a “punctured stripe”.
    You will mainly see this in a RAID 5, where 2 drives have the same bad block and the array therefore can’t rebuild, because it doesn’t have enough information to properly calculate the data or parity to put on the replacement disk.
    The problem you are seeing is similar, though. You have a bad block on your original (non-replaced) drive. When the rebuild takes place, it hits the bad block while trying to copy that “bad” information to the replacement disk, and it flags the same block on the replacement disk as a “bad block”.
    This will result in the problems you have been experiencing.
    As far as preventative maintenance goes, the Dell rep was correct. Dell recommends a consistency check on your RAID arrays every month to ensure a healthy RAID. It won’t help as much with a RAID 1 as it does with a RAID 5 because of the lack of a stripe/parity stripe. It will, though, also check for bad blocks; in the event that it finds a bad block, it will mark it as bad and remap it to an available location, thereby preventing this problem.
    In this case, the damage was already done. Depending on the RAID controller (hopefully Adaptec), “I” would have recommended hooking the drives up to an Adaptec SCSI card/RAID controller and running a “Disk Verify” from within the controller. It will also remap the bad block, allowing the rebuild to take place without problems.
    Hope I was of some help.

  4. gutseb

    This can happen on any RAID card. It is true that RAID 1 arrays are more susceptible to this type of corruption. That is why we back up.

    The only thing you can do to prevent it is keep your drive firmware and controller firmware up to date. Newer firmware usually has better failure detection and can mitigate corruption. However, it does not totally eliminate the possibility.

  5. Rocco

    How can I get the utility that Dell released? The link goes to a Dell DSCE site.

    Any help would be great. I have the same problem, and the data to back up and restore is about 1 TB.

    Thank you

  6. Brian Liston

    Dell use LSI controllers. The “Adaptec verify option” is called a consistency check in these controllers, and can be carried out in the controller BIOS (Ctrl M, during POST). Make sure the controller firmware is up to date (if you are using a Windows OS, make sure you update the driver before you apply the newer firmware).

    By default, during normal operation the controller continually scans the disks for errors. This functionality is called “patrol read”, so you may see entries in your event logs for it. However, if you know you have errors, run the consistency check from within the controller BIOS.
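
The RAID 5 case the commenters describe can be sketched the same way. This is a minimal illustration of why a stripe cannot be rebuilt once two of its members are unreadable, assuming plain XOR parity over a three-disk stripe. The helper names (xor_blocks, reconstruct) and the sample bytes are made up for illustration; real controllers work on much larger blocks and keep their own bad-block tables.

```python
from functools import reduce

# Toy RAID 5 stripe: two data blocks plus one XOR parity block.
# None stands for a member that cannot be read (failed disk or bad block).
# Hypothetical helper names; not any vendor's implementation.

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def reconstruct(stripe, missing):
    """Rebuild the member at index `missing` from the other members.

    Returns None when any other member is also unreadable, because XOR
    parity can only make up for one missing member per stripe.
    """
    others = [blk for i, blk in enumerate(stripe) if i != missing]
    if any(blk is None for blk in others):
        return None
    return xor_blocks(others)

d0, d1 = b"\x0f\xf0", b"\x33\x55"
parity = xor_blocks([d0, d1])

# One member missing: the rebuild recovers the original data.
recovered = reconstruct([None, d1, parity], missing=0)
print(recovered == d0)   # True

# Disk 0 is being rebuilt while disk 1 has a bad block in the same stripe:
# there is nothing left to compute from, so the stripe is punctured.
punctured = reconstruct([None, None, parity], missing=0)
print(punctured)         # None
```

A regular consistency check or patrol read is meant to catch the bad block on the surviving disk while the stripe can still be recomputed, that is, before a second member is lost.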
