DR Options for RAID 5 recovery? 2/3 Failed Disks

  • DR Options for RAID 5 recovery? 2/3 Failed Disks

    Hello everyone...
    I have a pretty dire situation at one of our locations.

    I'm not sure exactly what happened. First the (2-month-old) APC UPS went out, then both servers started reporting RAID failures. Luckily, one server had just 1 bad drive; unluckily, it was an old server that wasn't in use anymore and was only still in place to ensure nothing on it was still needed.

    Unfortunately, the other server, the recently rebuilt...production server...had two drives fail. Hence it is not bootable. I have a couple of questions for the RAID experts...

    1. I do not want to do anything which may compromise the data, so is there any reason to run RAID card diagnostics from the PERC3 BIOS? Is there any chance that this may bring the drive/s back online?

    2. Keeping in mind that the disks need to be in the same slots...I'm going to try and have them reseat the drives tomorrow when I talk to them, any other suggestions to check on the server either physically or logically (i.e. any specifics on the DELL PERC3 card)?

    3. If this were a test environment, I'd be quick to try out this software which was recommended on an old thread here...Raid Reconstructor (http://www.runtime.org/raid.htm), any advice on software such as this? But my real question is here, what clean labs have you used and would recommend? I know they are costly...

    Thanks for any advice,
    ~Kara
    'What we do not make conscious emerges later as fate.' Carl Jung

  • #2
    Re: DR Options for RAID 5 recovery? 2/3 Failed Disks

    In a RAID 5 configuration, data and parity are striped across all of the drives and the array can tolerate the loss of only one of them, so if 2 drives are in fact dead you will most likely lose data. It may be possible to recover some data using a data recovery company, but it's likely to be very expensive. Do you not have current backups of the data? As for your questions:

    1. Running the RAID card diagnostics shouldn't affect the data on the drives, so if it does bring the drives back online you're back in business.

    2. Call Dell technical support before reseating the drives or swapping slots.

    3. I've never used third-party software to recover a failed RAID configuration, so proceed at your own risk. At this point you have little to lose.
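
    To illustrate the parity point above, here is a minimal Python sketch of one stripe on a 3-disk array. It is a simplification, not how the PERC3 actually lays out data: the block contents and the `xor_blocks` helper are made up for the example, and a real RAID 5 rotates the parity block between disks from stripe to stripe.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical 3-disk array: two data blocks plus parity.
d0 = b"INVOICES"
d1 = b"PAYROLL!"
parity = xor_blocks(d0, d1)

# One disk lost: the missing block is the XOR of the two survivors.
recovered_d1 = xor_blocks(d0, parity)
recovered_d0 = xor_blocks(d1, parity)

# Two disks lost: only d0 survives. Any guess for d1 gives a parity block
# that is consistent with d0, so the survivor cannot tell guesses apart.
guess_a = b"PAYROLL!"
guess_b = b"GARBAGE!"
parity_for_a = xor_blocks(d0, guess_a)
parity_for_b = xor_blocks(d0, guess_b)
```

    With one block missing the XOR of the other two gives it back exactly. With two missing there is nothing left to XOR against, which is why a second truly dead drive means data loss.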


    • #3
      Re: DR Options for RAID 5 recovery? 2/3 Failed Disks

      Thanks for your reply. We'll probably end up shipping the drives off for recovery; I was just hoping for some more advice and possible first-hand experience/recommendations with the data recovery companies.

      As for the backups, don't get me started...this is one of those "series of unfortunate events" type of situations, VERY unfortunate events.

      Thanks again,
      ~Kara
      'What we do not make conscious emerges later as fate.' Carl Jung


      • #4
        Re: DR Options for RAID 5 recovery? 2/3 Failed Disks

        there is no need to ship anything for recovery
        what you need to do is perform a re-tagging. any dell server supporter can guide you through that.

        basically re-tag is a procedure where you clear the config from the raid controller and re-create the raid array without initialising. then you force one of the drives offline (the disk that failed first, so it has the older parity records), and boot.
        after the system boots, you can assign the offline disk as hotspare, and it will kick into the degraded raid array and rebuild.
        for that you will need the raid controller logs of course, in order to see which disk failed first.
        otherwise guesswork will take place
        btw, if you guessed the first failed disk incorrectly, and the system does boot, but starts off by running scandisk IMMEDIATELY cancel the scandisk run, and repeat the procedure with the other failed disk
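
        a rough Python sketch of why the choice of disk matters. it assumes a 3-disk stripe and assumes that at least one write reached the array between the first and second failure; the disk contents and the `xor_blocks` helper are invented for the example and do not model the actual controller.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


# Simplified, hypothetical stripe: disk0 = data, disk1 = data, disk2 = parity.
disk0 = b"AAAA"
disk1 = b"BBBB"
disk2 = xor_blocks(disk0, disk1)

# disk1 drops out first. The array keeps running degraded, and a later
# write changes the logical contents of disk1's block. Only the parity
# on disk2 can record that change; the platters of disk1 keep the old data.
stale_disk1 = disk1
current_block1 = b"bbbb"
disk2 = xor_blocks(disk0, current_block1)

# Then disk0 drops out as well (second failure). Its contents are current.

# Right choice: leave the first-failed disk (disk1) offline and rebuild
# its block from disk0 and the parity.
rebuilt_block1 = xor_blocks(disk0, disk2)

# Wrong choice: bring the stale disk1 online and rebuild disk0's block
# from stale data and newer parity. The result is garbage.
rebuilt_block0 = xor_blocks(stale_disk1, disk2)
```

        in this model the right choice returns the newest data and the wrong one silently returns a corrupted block, which is the kind of damage an unexpected scandisk run at boot would be reacting to.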

        to play it safe, DO CALL SUPPORT, and don't do it yourself.
        Last edited by DYasny; 6th March 2011, 19:11.
        Real stupidity always beats Artificial Intelligence (c) Terry Pratchett

        BA (BM), RHCE, MCSE, DCSE, Linux+, Network+


        • #5
          Re: DR Options for RAID 5 recovery? 2/3 Failed Disks

          Hey thanks for the reply!

          I actually JUST got back in my home office from the remote site. I went out there to try and make sure sending them off was the only option...luckily it wasn't.

          Here's what I did:
          Since I was 99% confident there was no physical damage to the disks/raid card, and the issue was directly related to the degraded state of the drives due to multiple power/UPS issues, I was also confident that the data was intact.

          After first silencing the most annoying alarm you'll ever hear...and disabling auto-rebuild, I forced online one of the offline drives. Since the degradation happened so quickly in succession, I was confident that either offline drive would have the most current data.

          A quick attempt to boot proved useless: the Windows kernel executable couldn't be found, but hey, at least I didn't get the 'no system disk' error!! I booted into WinPE and was able to view the partitions/directory structure/files. I mapped a drive, transferred the files to a temporary machine, and we're almost back in business! The rest of the process is going to be done remotely.

          Thanks for the quick replies.
          ~Kara
          'What we do not make conscious emerges later as fate.' Carl Jung


          • #6
            Re: DR Options for RAID 5 recovery? 2/3 Failed Disks

            simply forcing a disk online is very dangerous. make sure the data is indeed intact. also, the action plan I provided would have simply put your server back in action, instead of a reinstall being needed.
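
            one partial check, sketched in Python: compare SHA-256 hashes of the files on the recovered volume against the copies on the temporary machine. the function names and the example paths are made up for illustration. note the limit of this check: matching hashes only show that the copy is faithful to what the array returned, not that the array returned good data, so the files still need to be opened and checked by the applications that use them.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(source: Path, copy: Path) -> list[str]:
    """List relative paths that are missing from the copy or differ in content."""
    problems = []
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        dst = copy / rel
        if not dst.is_file() or file_digest(src) != file_digest(dst):
            problems.append(rel.as_posix())
    return problems


# Hypothetical usage; both paths are placeholders:
# print(compare_trees(Path("X:/shares"), Path("D:/recovered/shares")))
```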

            anyhow, this sort of thing happens not only because of power failures, but mainly because people never keep their servers updated, and never realise hardware updates are as (or even more) important as software.

            bad firmware on the drives and the controllers is known to cause scsi timeouts, punctured arrays and lots of additional pains. so before you put the server back in production, make sure it's up to date
            Last edited by DYasny; 6th March 2011, 19:13.
            Real stupidity always beats Artificial Intelligence (c) Terry Pratchett

            BA (BM), RHCE, MCSE, DCSE, Linux+, Network+


            • #7
              Re: DR Options for RAID 5 recovery? 2/3 Failed Disks

              Great, thanks for the advice.

              I like the thought of getting the server booted up and back in prod without reloading. We actually reloaded an extra server w/ new drives (updated firmware of course) and put server 2008 on it this time. It provided us a good chance to upgrade their location as we are now loading new servers w/ 2008.

              All in all, it was the best emergency experience I've had.
              ~Kara
              'What we do not make conscious emerges later as fate.' Carl Jung


              • #8
                Re: DR Options for RAID 5 recovery? 2/3 Failed Disks

                well, generally with high end brand name servers, and a backup available, emergencies should not be really hard on you

                happy sysadmin day, btw
                Last edited by DYasny; 6th March 2011, 19:13.
                Real stupidity always beats Artificial Intelligence (c) Terry Pratchett

                BA (BM), RHCE, MCSE, DCSE, Linux+, Network+
