Repairing a very unstable server 2003 setup

  • Repairing a very unstable server 2003 setup

    Morning all.

    Long story short, for the last 2 weeks our sister company have been without reliable IT facilities. The servers have to either be rebooted, or have services stopped and restarted, every 20 minutes to keep things running. The 2 DCs are running Windows 2003 Standard, and have never had any updates installed. Until recently they were also never shut down. The previous admin told the new guy "don't shut those servers down - bad things happen" and scarpered. Since then, we've been planning a full overhaul of the entire system up there.

    Anyway, we are now being forced to act quickly and as such are planning to do something about this on Monday night. Everything I've read about the errors in question (Userenv 1030 and 105) points to the problem lying with the servers, rather than with AD.

    There are 2 DCs, both run DNS, both are GCs and the FSMO roles are split between them. My plan is as follows:

    1) transfer all FSMO roles to the healthiest server (now known as server A), or seize them if we can't transfer
    2) remove DNS from the other server (server B) and make the necessary changes to reflect that server A is the only DNS
    3) demote server B to a member server, then remove from the domain
    4) disconnect server B from the network, reinstall Windows and updates
    5) join server B to the domain, promote to a DC and GC, install DNS
    6) allow time for replication and configure systems to point to server B as the sole DNS
    7) transfer (or seize) FSMO roles from server A to server B
    8) remove DNS from server A, demote to a member server and remove from the domain
    9) flatten and reinstall A
    10) join A back to the domain and promote to DC, install DNS, make necessary changes to reflect 2 DNS servers
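
A rough way to sanity-check the ordering of the steps above before trying them for real is to model them. This is only a sketch in Python, not anything that touches AD: the starting split of roles between A and B is an assumption made up for the example, and every function name here is hypothetical. It simply asserts that after each step there is still at least one DC running DNS and that all five FSMO roles have a holder.

```python
# Sketch only: a toy model of the ten-step plan, not a tool that touches AD.
FSMO_ROLES = frozenset(
    {"schema", "domain_naming", "pdc_emulator", "rid_master", "infrastructure"}
)


def initial_state():
    # Assumption: which server holds which roles today is made up for the example.
    return {
        "A": {"dc": True, "dns": True,
              "roles": {"pdc_emulator", "rid_master", "infrastructure"}},
        "B": {"dc": True, "dns": True,
              "roles": {"schema", "domain_naming"}},
    }


def move_roles(servers, src, dst):
    """Transfer (or seize) every FSMO role from src to dst."""
    if not servers[dst]["dc"]:
        raise ValueError(f"{dst} is not a DC and cannot hold FSMO roles")
    servers[dst]["roles"] |= servers[src]["roles"]
    servers[src]["roles"] = set()


def demote(servers, name):
    """Remove DNS and demote; refuse while the server still holds roles."""
    if servers[name]["roles"]:
        raise ValueError(f"{name} still holds {sorted(servers[name]['roles'])}")
    servers[name]["dc"] = False
    servers[name]["dns"] = False


def promote(servers, name):
    """Rejoin after the reinstall, promote to DC/GC and install DNS."""
    servers[name]["dc"] = True
    servers[name]["dns"] = True


def check(servers):
    """The domain must always have a DC with DNS, and every role a holder."""
    dcs = [s for s in servers.values() if s["dc"]]
    held = set()
    for s in dcs:
        held |= s["roles"]
    if not any(s["dns"] for s in dcs):
        raise RuntimeError("no DC is running DNS")
    if held != FSMO_ROLES:
        raise RuntimeError(f"roles without a holder: {sorted(FSMO_ROLES - held)}")


def run_plan(servers):
    # Step labels refer to the numbered plan in the post.
    plan = [
        ("1", move_roles, ("B", "A")),
        ("2-3", demote, ("B",)),
        ("4-5", promote, ("B",)),
        ("7", move_roles, ("A", "B")),
        ("8", demote, ("A",)),
        ("9-10", promote, ("A",)),
    ]
    for step, action, args in plan:
        action(servers, *args)
        check(servers)  # raises if this step would leave the domain headless
    return servers
```

Calling `run_plan(initial_state())` walks the whole sequence; reordering the list (for example demoting B before moving its roles) makes the model raise, which is the point of the exercise.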

    I've run through this in a lab setting (albeit with healthy servers) and it's gone fine. If anybody has any thoughts to offer though, please do so.

    We are quite prepared to face the possibility that AD is corrupt - if that is indeed the case then we will rebuild the domain from scratch rather than trying to repair it.
    Gareth Howells

    BSc (Hons), MBCS, MCP, MCDST, ICCE

    Any advice is given in good faith and without warranty.

    Please give reputation points if somebody has helped you.

    "For by now I could have stretched out my hand and struck you and your people with a plague that would have wiped you off the Earth." (Exodus 9:15) - I could kill you with my thumb.

    "Everything that lives and moves will be food for you." (Genesis 9:3) - For every animal you don't eat, I'm going to eat three.

  • #2
    Re: Repairing a very unstable server 2003 setup

    Hi,

    I presume you have already been through the usual drill: DCDiag, NetDiag, RSoP? Anything reported in their results?
    Also, is the time synced on all the DCs?
    Caesar's cipher - 3

    ZKHQ BRX HYHQWXDOOB GHFLSKHU WKLV BRX ZLOO UHDOLVH LW ZDV D ZDVWH RI WLPH!

    SFX JNRS FC U6 MNGR



    • #3
      Re: Repairing a very unstable server 2003 setup

      Can't run dcdiag on the unhealthier server - it reboots the server. Go figure.

      Time is synced fine. We've been through a long list of suggestions (see the EventID results for those errors) and the last one to try is reinstalling TCP/IP on the servers. So we're going to do that... by reinstalling everything else as well.
      Gareth Howells



      • #4
        Re: Repairing a very unstable server 2003 setup

        One important thought - what can we do to check the progress of replication after we promote the new DC?
        Gareth Howells



        • #5
          Re: Repairing a very unstable server 2003 setup

          And if you run Dcdiag from a remote server?
          Dcdiag /s:<system> ?

          Personally I would first check if there is anything I could fix.
          Marcel
          Technical Consultant
          Netherlands
          http://www.phetios.com
          http://blog.nessus.nl

          MCITP(EA, SA), MCSA/E 2003:Security, CCNA, SNAF, DCUCI, CCSA/E/E+ (R60), VCP4/5, NCDA, NCIE - SAN, NCIE - BR, EMCPE
          "No matter how secure, there is always the human factor."

          "Enjoy life today, tomorrow may never come."
          "If you're going through hell, keep going. ~Winston Churchill"



          • #6
            Re: Repairing a very unstable server 2003 setup

            Normally I would agree with you - I'm always in favour of pulling something to bits to find out what's gone wrong. In this instance though the clock is against us, and all of our research indicates that this is down to a known flaw in the product. But I appreciate your thoughts.

            Originally posted by gforceindustries:
            One important thought - what can we do to check the progress of replication after we promote the new DC?
            Ok, so we can use ReplMon to force a replication and to check the status of it. Presumably that includes all user accounts, machine accounts, groups, group policies and WMI filters, etc etc. DNS replication can be checked in the DNS console.
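
As a text-based alternative to clicking through ReplMon, `repadmin /showrepl` from the Support Tools reports the same per-link replication status, and its `/csv` switch makes the output easy to filter. The Python below is a hedged sketch of that filtering: the column names are assumptions based on what `repadmin /showrepl * /csv` normally prints, so check them against real output from your own build, and `failing_links` is a made-up helper name.

```python
import csv
import io


def failing_links(showrepl_csv):
    """Return (source, destination, naming context, failures) for bad links.

    Assumption: the text comes from `repadmin /showrepl * /csv` and has the
    columns "Source DSA", "Destination DSA", "Naming Context" and
    "Number of Failures". Verify these against your own output.
    """
    bad = []
    for row in csv.DictReader(io.StringIO(showrepl_csv)):
        failures = (row.get("Number of Failures") or "").strip()
        if failures.isdigit() and int(failures) > 0:
            bad.append(
                (
                    row.get("Source DSA"),
                    row.get("Destination DSA"),
                    row.get("Naming Context"),
                    int(failures),
                )
            )
    return bad
```

An empty list after promoting the new DC suggests every inbound link has replicated without error at least once. It says nothing about SYSVOL/FRS, which still needs checking separately.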

            Does anybody have any points to offer, words of caution etc? I've tested the method several times in assorted labs, but of course if a lab goes down, then you deal with it. If you trash your company's AD, even with backups to restore from there's still going to be downtime.
            Gareth Howells



            • #7
              Re: Repairing a very unstable server 2003 setup

              Another thing that might be worth doing, if you can, is bringing another box on site, joining it to the domain and promoting it to a DC with DNS etc. At least then you know you have one "healthy" box that is free from the clutches of the dreaded "previous admin guy".



              • #8
                Re: Repairing a very unstable server 2003 setup

                Afternoon all, thought I'd post back with a report. I have a few minutes to kill before I have to head out, so here's part 1.

                Long story short, we had to abort on Monday night. It turns out the licences weren't legal (see below), and the previous admin took the optical drives from the servers with him when he left. Why he would do that, we have no idea.

                The company has a volume licence, which covers them for a handful of workstations and 2 servers. According to the previous admin's documentation, it covers them for all of the computers and all of the servers. However the quantities in his documentation do not match those shown on eOpen. The licence keys also do not match, nor do they match the keys that were actually used to install the servers - each server has a different serial key, and Googling those keys returned numerous warez sites.

                After some debate, we grudgingly decided to go ahead with the reinstall as we would not be becoming any *less* compliant *, and then to raise the issue with the CEO in the morning (remembering that this was an overnight job) - at that point, we discovered the missing optical drives.

                Why the last admin did this, one can only guess. Our working theory is that he went to the directors with a price, and then found he wasn't able to meet it - rather than go back and explain his mistake, he pulled a fast one. Easy enough to do if you don't care about compliance, since the products in question don't require activation.

                Stay tuned for part 2 later today.

                * For reference, in the UK at least, if you are in the process of becoming compliant and can show supporting evidence, you are considered to be compliant. Your progress will be monitored by FAST, however to all intents and purposes you are not breaking the law. In this instance, we can show that we had no reason to doubt the legitimacy of the licences until we started to document the system, and that we are now taking steps to remedy this situation. FAST have been informed of the current status and will be visiting for an inspection in February. The previous admin has been invited in for a "little chat" with the CEO.
                Last edited by gforceindustries; 20th December 2008, 14:39.
                Gareth Howells



                • #9
                  Re: Repairing a very unstable server 2003 setup

                  How did part 2 go?!



                  • #10
                    Re: Repairing a very unstable server 2003 setup

                    Mankily.

                    The FSMO roles were split between the 2 DCs. We transferred them all to one, demoted the other and hoped that would work.

                    It did... for a bit. Then the errors came back. Lame. The error message in the event log pointed to a problem with the Default Domain Policy. Looking into that revealed a rather wonderful example of the previous admin's muppetry: every setting he had defined was in that one GPO. The one that Microsoft tell you not to modify.

                    We also found that after a reboot, the Exchange server couldn't start the Information Store - it seems it was having a problem finding itself a GC. Not surprising, since the GC it used had just been demoted. For some reason the firewall on that server also trashed itself at some point, and the server no longer had network connectivity. We rectified that with a driver reinstall.

                    The demoted server has since been flattened and reinstalled and is now the sole DC. The other server was also flattened and left as the file server for now. So far, the system is stable.

                    We're still intending to do a full flatten and rebuild in January though - there are enough problems left with the system to justify it. The licensing issues alone would warrant that, and if we're going to buy licences anyway we might as well upgrade the other 3 servers to R2. We put together a shopping list to max out the servers with memory and hard drives, and had that approved, so we'll be off to a good start. At present none of the servers have good memory configurations - they all warn you at boot-up and 2 require you to press a key to continue booting. Not ideal. 2 of them also have degraded RAID 5s.

                    A giggle for you: the previous fileserver (also PDC) had a single partition, the root of which was shared for public access. The BDC (also a fileserver) had 2 hard drives, no RAID. The apps server has impressive lag, even with 8 Xeon cores at 2.6GHz.
                    Last edited by gforceindustries; 22nd December 2008, 17:30.
                    Gareth Howells



                    • #11
                      Re: Repairing a very unstable server 2003 setup

                      Blimey! Someone's going to be busy then!



                      • #12
                        Re: Repairing a very unstable server 2003 setup

                        Man, what a great job fixing that...
                        Marcel


                        • #13
                          Re: Repairing a very unstable server 2003 setup

                          The side of IT I love best is coming into a company, seeing what a mess their last muppet made and doing it according to best practices. His setup consisted of four * servers, like this:

                          1) Dell PowerEdge 2950 - Exchange, EFS **, Fax

                          2) HP ProLiant DL140G2 - DC (GC & domain level FSMO roles), DNS, DHCP, AV console, WSUS, RRaS ***

                          3) Dell PE2950 - App server - Pegasus Opera & ProspectSoft

                          4) PE2950 - DC (GC & forest level FSMO roles), DNS, DHCP, UPS interface, File/Print, Backups

                          * There is a fifth server in the basement, where it has been for 18 months. The documentation states that this is the disaster recovery machine - it's a DC that can just be plugged in to the network and things will work again. Anybody else doubtful?

                          ** Email Forwarding Server. Why would you run your primary and secondary mailservers on the same physical box? Not virtualised, both under the same OS.

                          *** RRaS not used - the ADSL router plugs straight into the network, with its DHCP disabled. Each server has RDC set up to listen on a different port, and the router forwards those ports.

                          Our plan is as follows:

                          1) Exchange & Fax

                          2) BDC (GC with no FSMO roles), WINS, DNS, DHCP, WSUS

                          3) PDC (GC & FSMO roles), WINS, DHCP, DNS

                          4) File/print, backups

                          5) RRaS/ISA, AV console

                          6) New server. We're replacing Opera & ProspectSoft at both sites with Microsoft Dynamics running on SQL Server. If we're paying 50k for software, then we want to spend 5k on proper servers to run it on, so both sites will be getting a new PowerEdge 2900 beast for the job. Neither site has a server grunty enough for this role at present.
                          Last edited by gforceindustries; 23rd December 2008, 10:07.
                          Gareth Howells



                          • #14
                            Re: Repairing a very unstable server 2003 setup

                            There are no PDCs/BDCs anymore in Windows 2000 and up.
                            Marcel


                            • #15
                              Re: Repairing a very unstable server 2003 setup

                              Nope, but there is a PDC Emulator. By PDC, I mean the DC that holds the FSMO roles. In a single domain forest, I see little to be gained by splitting the FSMO roles between multiple servers - if one DC goes down so badly that we can't fix it, we can seize the roles - we'd have to reinstall the DC etc anyway.
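
For anyone wanting to confirm where the roles actually sit after a move, `netdom query fsmo` (Support Tools) lists the five holders. The Python sketch below parses that listing and reports whether a single DC holds them all. It assumes each line is a role label and a host name separated by a run of spaces, which is how the output normally looks, and both function names are hypothetical.

```python
import re


def role_holders(netdom_output):
    """Map each role label to its holder.

    Assumption: each role line is "<label><two or more spaces><host>".
    Status lines such as "The command completed successfully." contain
    no such gap and are skipped.
    """
    holders = {}
    for line in netdom_output.splitlines():
        parts = re.split(r"\s{2,}", line.strip(), maxsplit=1)
        if len(parts) == 2:
            role, holder = parts
            holders[role] = holder.lower()
    return holders


def held_by_one_dc(netdom_output):
    """True when at least one role was parsed and all share one holder."""
    holders = role_holders(netdom_output)
    return len(holders) > 0 and len(set(holders.values())) == 1
```

The parser deliberately ignores what the labels say, since the wording differs between versions of the tool.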
                              Gareth Howells
