
Best way to swap Online (Errors) disk running Software Raid 1 on Server 2003 R2 x64?


  • Best way to swap Online (Errors) disk running Software Raid 1 on Server 2003 R2 x64?

    Over the weekend I am getting my hands on an old file server that is currently running Windows Server 2003 R2 x64. It has two 500GB SATA II hard drives and appears to be running software RAID 1 under Windows Server. The server locked up at the start of this week. After rebooting, it was running slowly because, according to Disk Management, the RAID 1 mirror was "regenerating". After a few days the regeneration completed and it seems to be running better.

    However, according to Disk Management, Disk 0 is currently showing Online (Errors) and Disk 1 is showing Online. It also says Healthy (At Risk) on the partitions. The server is operating and hasn't had any issues or crashes since this happened. Funnily enough, when I check the SMART data for both hard drives it comes back fine. However, when I used HD Tune, it reported CRC errors for Disk 0. Also, under Device Manager, Primary IDE Device 0 is running in PIO mode while Device 1 is running in Ultra DMA Mode 5. I deleted the Primary IDE device and restarted the server, which restored Device 0 to Ultra DMA Mode 5, but after an hour of running it went back to PIO mode. So I purchased a matching replacement hard drive to swap in for the drive that is indicating errors.

    However this is actually the first time I am dealing with an older server OS running Software Raid 1. I plan to do a full backup of the server using Easeus Todo Backup Advanced Server before I touch anything. It has been using this anyway for its backups.

    Now this is where I am unsure what to do, since it's software RAID. Normally with hardware RAID I would just unplug the failed drive, connect the new one, and it would rebuild the array. But with software RAID, do I first have to go into Disk Management and select "Remove Mirror" or "Break Mirrored Volume" before shutting down and swapping in the other drive? I am also concerned that, since the primary drive (Disk 0) is the one that is failing, the server may not boot into Windows from the good second drive (Disk 1) after I break the mirror. I assume that after I break the mirror I have to shut down the server, connect the new replacement drive, then boot it up and rebuild the mirror?

    Also, I had planned to select the "Remove Mirror" option instead of "Break Mirrored Volume". With the Remove Mirror option, would it let me pick the good Disk 1 drive to keep the data and then delete the defective Disk 0? I just want to make sure that the server uses Disk 1 to boot from after a restart until I rebuild the mirror with the new drive!

    The reason I am thinking of removing the mirror first in Disk Management is so that it updates the boot.ini file. Correct me if I am wrong, but I think that if I just disconnected the failing Disk 0 and tried to boot into Windows, it wouldn't work? Below is what the boot.ini file currently looks like on the server:

    [boot loader]
    default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
    [operating systems]
    multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003 Standard x64 Edition" /fastdetect /NoExecute=OptIn

    multi(0)disk(0)rdisk(1)partition(1)\WINDOWS=Boot Mirror C: - secondary plex
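
For readers unfamiliar with the ARC paths in this boot.ini: the rdisk(n) value is the ordinal of the physical disk as the BIOS/controller enumerates it, so the default entry here points at the first enumerated disk and the "secondary plex" entry at the second. The following is only an illustrative Python sketch that parses the file shown above and lists which rdisk each entry targets. The names `parse_arc`, `boot_entries` and `SAMPLE` are made up for this example, and the parser only handles the `multi(...)` form seen in this post. Which physical drive ends up as rdisk(0) after a disk is unplugged depends on how the controller re-enumerates the remaining drive, so treat the output as a reading aid, not a prediction of boot behaviour.

```python
import re

# Matches ARC paths of the form multi(a)disk(b)rdisk(c)partition(d)\DIR
ARC_RE = re.compile(
    r"multi\((\d+)\)disk\((\d+)\)rdisk\((\d+)\)partition\((\d+)\)(\\[^=\s]*)?",
    re.IGNORECASE,
)


def parse_arc(path):
    """Split one ARC path into its numeric fields and system directory."""
    m = ARC_RE.match(path.strip())
    if m is None:
        raise ValueError("not a multi() ARC path: %r" % path)
    return {
        "multi": int(m.group(1)),
        "disk": int(m.group(2)),
        "rdisk": int(m.group(3)),       # physical disk ordinal
        "partition": int(m.group(4)),   # partition numbers start at 1
        "dir": m.group(5) or "",
    }


def boot_entries(boot_ini_text):
    """Return (default_arc, [(arc, description), ...]) from boot.ini text."""
    section = None
    default = None
    entries = []
    for raw in boot_ini_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].lower()
            continue
        # Split on the first "=" only; later "=" belong to switches.
        key, _, value = line.partition("=")
        if section == "boot loader" and key.strip().lower() == "default":
            default = parse_arc(value)
        elif section == "operating systems":
            entries.append((parse_arc(key), value.strip()))
    return default, entries


# The boot.ini quoted in the post above.
SAMPLE = r"""
[boot loader]
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003 Standard x64 Edition" /fastdetect /NoExecute=OptIn

multi(0)disk(0)rdisk(1)partition(1)\WINDOWS=Boot Mirror C: - secondary plex
"""

if __name__ == "__main__":
    default, entries = boot_entries(SAMPLE)
    print("default boots from rdisk", default["rdisk"])
    for arc, description in entries:
        print("rdisk", arc["rdisk"], "partition", arc["partition"], "->", description)
```

Read this way, the second entry is the one to pick from the boot menu if the machine has to start from the second enumerated disk while both drives are still attached.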

    I just want to make sure I take the right steps to do this, as I don't want to turn this project into a nightmare for myself. This is only a temporary fix until they have a budget in 2016 to put towards a replacement server. I hope someone can shed some light!

  • #2
    First, what an excellent post. All the info and set out so it can be easily read. Well done!

    My memory being what it is, and given that the last time I did this was 2003 or 04, I think I broke the array and then added the new HDD (or I just ripped out the failed drive and replaced it with the new one, and the saga that followed). Make sure the drive you are adding is the EXACT same size or larger. I do remember ordering a 40GB HDD to replace my failed drive only to find a new 40GB Quantum was not the same size as the original 40GB Quantum. Can't remember who they bought out, but the rebadged drives were a MB or two smaller than the original ones. I'm now pretty sure I did this on an NT 4.0 Server (yes, and 18 months later the other original drive failed) and not the 2003 version.

    As for the rebuild process, that I really can't remember but it must have been simple coz I managed to complete it. Just created a new RAID making sure I didn't mirror the wrong one. If I have stuffed up the process someone will be along shortly and correct me.

    Might also be worth running FSUTIL once they are synced to make sure the volume is not flagged dirty, though (re)building the RAID may do that for you. Full syntax, if you don't know it, is:
    >fsutil dirty query c:

    After having those drives fail on me I then only purchased Servers that had Hot Swap capability on the Drives. Makes life a whole lot easier.
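
If the `fsutil dirty query` check mentioned above needs to be repeated or logged, its one-line result can be interpreted with a few lines of Python. This is only a sketch: `volume_is_dirty` is a made-up helper name, and the message wording it looks for ("... is NOT Dirty" / "... is Dirty") is assumed from typical English-language output, so it should be checked against what the server actually prints.

```python
def volume_is_dirty(fsutil_output):
    """Interpret the text printed by `fsutil dirty query <drive>`.

    Assumes English output containing either "NOT Dirty" or "Dirty".
    Returns True if the volume is flagged dirty, False if it is clean.
    """
    text = fsutil_output.lower()
    if "not dirty" in text:
        return False
    if "dirty" in text:
        return True
    # Anything else (access denied, unknown drive, other locale) is not guessed at.
    raise ValueError("unrecognised fsutil output: %r" % fsutil_output)


if __name__ == "__main__":
    # Assumed sample lines, not captured from the server in this thread.
    for sample in ("Volume - c: is NOT Dirty", "Volume - c: is Dirty"):
        print(sample, "->", volume_is_dirty(sample))
```

A volume that reports dirty will normally get a disk check scheduled at the next boot, so a clean result after the mirror has resynced is the outcome to look for.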
    Joined: 23rd December 2003
    Departed: 23rd December 2015


    • #3
      If I remember rightly (it's been a long, long time since I did this), I broke the mirror, removed the faulty disk, put the good disk into slot 0 and it came back up successfully.

      Please bear in mind, though, that it's been around 12 years since I did this, so the memory may be a little hazy.


      • #4
        Breaking the mirror will retain the data on both drives but make them independently editable. (you now have two non-mirrored partitions instead of one mirrored partition)
        Removing the mirror removes the drive you select from the mirror, deleting the partition on the removed drive. You are left with one non-mirrored partition.

        I would remove the bad drive from the mirror and then add the new replacement drive as the mirror.

        Network Consultant/Engineer
        Baltimore - Washington area and beyond