2 DCs windows 2003srv R2 SP2, Replication failure, determine which server is failing?

  • 2 DCs windows 2003srv R2 SP2, Replication failure, determine which server is failing?

    Hi all,
    With a new job I've inherited a slightly faulty network, and the last couple of weeks have been spent putting out small fires everywhere. Now to a major problem I've noticed in my weeks here.
    Replication between the 2 DCs is not functioning too well.
    It has not worked since 2007-09-21, so the tombstone lifetime is well past.

    Now I'm trying to figure out which of the 2 DCs I should demote with the /forceremoval switch, and on which I should clean up the metadata with ntdsutil,
    so that I can later re-promote and start all over.

    The 2 DCs:
    Typh (primary DC, I'm told by my new employer)
    Apo (secondary DC)
    Merc (Exchange, linked to Typh, just an FYI)

    How would I determine which of the servers is failing, and whether it's INBOUND or OUTBOUND replication that's failing, so that I can decide which server to demote and which to clean the metadata on?

    Many thanks for any advice.
    And I hope I've given you enough to point me in the right direction.

  • #2

    Hi,
    Before you go down the demotion route have a look at a few things to find out why the replication is failing.
    How are the FSMO roles allocated?
    Is the system time OK on both DCs? Anything logged in the Event Viewer?
    Are you running on a single site?
    Have a look at Replmon to get more info.

    • #3

      When you log in to the workstations, which DC are they authenticating against?
      Run the command set and you'll see what the LOGONSERVER variable is set to.

      You can check the FSMO roles by running the command netdom query fsmo

      After that you can see which DC holds the forest and domain roles, as it seems you're a single site with two DCs.

      Are both DCs GCs? I'm assuming the Exchange server is not a DC.

      How is Exchange linked to Typh? Do you mean for "Recipient Update Services"? You can change where this points to. ESM > Recipients > Recipient Update Services > Then change the server it points to. AFAIK, the DC must be a global catalog server.

      You might be able to boot into Directory Services Restore Mode and do an authoritative restore from the good DC. However, the cause of the failure might not be the server that has been tombstoned since 2007; it's most likely the primary machine. Also, it's MS best practice NOT to restore a DC that holds any of the FSMO roles. Remember them as DRIPS (I forget who gave us that acronym, but THANKS!): Domain Naming Master, RID Master, Infrastructure Master, PDC Emulator, and Schema Master. D and S are the forest roles; R, I, and P are the domain roles.

      Replmon will help, but so will Repadmin.
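
A side note for anyone following along: repadmin /showrepl reports each DC's inbound replication partners, so running it against both DCs shows which direction is failing (one DC's outbound is the other's inbound). Below is a minimal Python sketch that picks the failing links out of the CSV form of that output. It assumes your repadmin build supports /csv and uses the column names I believe that output carries; the sample rows, DC1/DC2 and example.com are made up for illustration, so check everything against real output.

```python
import csv
import io

# Hypothetical sample shaped like `repadmin /showrepl * /csv` output.
# The rows are invented for illustration and are NOT taken from this thread.
SAMPLE = """showrepl_COLUMNS,Destination DSA Site,Destination DSA,Naming Context,Source DSA Site,Source DSA,Transport Type,Number of Failures,Last Failure Time,Last Success Time,Last Failure Status
showrepl_INFO,Default-First-Site-Name,DC1,"DC=example,DC=com",Default-First-Site-Name,DC2,RPC,500,2009-09-17 09:00:00,2007-09-21 10:44:42,8614
showrepl_INFO,Default-First-Site-Name,DC2,"DC=example,DC=com",Default-First-Site-Name,DC1,RPC,0,0,2009-09-17 09:00:00,0
"""


def failing_links(csv_text):
    """Return (destination, source, naming context, status) for each failing inbound link."""
    failures = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # A non-zero failure count means the destination cannot pull from the source.
        if int(row["Number of Failures"]) > 0:
            failures.append((
                row["Destination DSA"],
                row["Source DSA"],
                row["Naming Context"],
                row["Last Failure Status"],
            ))
    return failures


for dest, src, nc, status in failing_links(SAMPLE):
    print(f"{dest} cannot pull {nc} from {src} (last failure status {status})")
```

Reading it per destination DC keeps the inbound/outbound question simple: every failing row is an inbound failure on the "Destination DSA" side.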

      • #4

        In reply to L4ndy (#2):
        * FSMO roles are all on Typh. Checked and double-checked.
        * System time is identical to the second.
        * Single site, yes: just these 2 DCs for 1 domain, an Exchange server and a file server.

        Replmon is running now, but I don't quite know what I'm looking for.
        Some guidelines would help me a lot on this.

        Thanks for your advice.

        • #5

          In reply to stamandster (#3):
          Sorry, I did not see your reply until now.
          Running set on the workstations gives roughly a 50/50 split between "LOGONSERVER=\\TYPH" and "LOGONSERVER=\\APO". Topology-wise, I'm physically closer to Typh, while clients on the other side of the building, where Apo is located, are closer to it and get APO as LOGONSERVER,
          if that makes sense.
          (It's just one /24 subnet in a big building, but only 60 employees.)

          As said in my previous post, FSMO roles are all on Typh.

          Both DCs are GCs yes.

          Exchange is NOT a DC.
          And yes I meant that it's linked to TYPH for "Recipient Update Services".

          Thanks.
          Last edited by noternet; 17th September 2009, 14:18.

          • #6

            Have a look at this guide to Replmon (Google for more): http://www.mcmcse.com/microsoft/guides/replmon.shtml

            Any FRS or NTDS errors logged in the Event Viewer?
            Also try to post the DCdiag and Netdiag output.

            • #7

              In reply to L4ndy (#6):

              Event Viewer Errors on TYPH:
              Warning NTDS Replication, EventID: 2092
              FSMO Role: DC=GNB,DC=com

              Warning NTDS Replication, EventID: 2092
              FSMO Role: CN=Infrastructure,DC=GNB,DC=com

              Warning NTDS Replication, EventID: 2092
              FSMO Role: CN=RID Manager$,CN=System,DC=GNB,DC=com

              Error NTDS Replication, EventID: 1864
              This is the replication status for the following directory partition on the local domain controller.
              Directory partition:
              DC=GNB,DC=com

              Error NTDS Replication, EventID: 2042
              Time of last successful replication:
              2007-09-21 10:44:42

              Will post DCDIAG and NETDIAG results shortly.
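
To put the 2042 event in numbers, here is a rough sketch of how far past the tombstone lifetime this is. The two dates come from this thread (the last successful replication, and the date stamp on post #5); the 60 and 180 day figures are the usual defaults and are an assumption here, since the actual value is stored in the forest's tombstoneLifetime attribute and may have been changed.

```python
from datetime import date

# Dates taken from the thread: last successful replication (event 2042)
# and the "Last edited" stamp on post #5.
last_success = date(2007, 9, 21)
thread_date = date(2009, 9, 17)

# Assumed default tombstone lifetimes in days; the real value for this
# forest is whatever tombstoneLifetime is set to and is not given in the thread.
ASSUMED_TSL_DAYS = (60, 180)

elapsed = (thread_date - last_success).days

for tsl in ASSUMED_TSL_DAYS:
    print(f"{elapsed} days without replication, TSL {tsl} days: exceeded = {elapsed > tsl}")
```

Either way the gap is several times the tombstone lifetime, which is why lingering objects are the first thing to suspect.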

              • #8

                As promised, attached are the DCDIAG and NETDIAG results from both DCs, TYPH and APO:

                APO_dcdiag.txt
                APO_netdiag.txt
                TYPH_dcdiag.txt
                TYPH_netdiag.txt

                • #9

                  Start by installing Replmon from the Windows Server 2003 Support Tools; it should show you the way.
                  http://technet.microsoft.com/en-us/l...54(WS.10).aspx

                  • #10

                    It looks like TYPH has been offline for quite a while (longer than the tombstone lifetime, anyway). That has created lingering objects in AD, and as a result replication has been stopped. You can try Repadmin /removelingeringobjects on both DCs and then retry replication: http://technet.microsoft.com/en-us/library/cc757610(WS.10).aspx

                    Or demote TYPH, clean up AD, and re-promote TYPH.

                    A couple of things to consider afterwards, though:
                    I'd think about increasing the tombstone lifetime (TSL).
                    Double-check the WINS server, as I noticed there was a connectivity error to it.

                    • #11

                      In reply to L4ndy (#10), who suggested Repadmin /removelingeringobjects (http://technet.microsoft.com/en-us/library/cc757610(WS.10).aspx):
                      Will attempt this tomorrow morning. Going home to celebrate my girlfriend's birthday.

                      Should Repadmin /removelingeringobjects be run with any specific arguments?
                      I had a quick look at what plain repadmin gave me, and there are a lot of arguments one can use.
                      Any guidelines would be helpful.

                      Many thanks all for your help!

                      • #12

                        Alrighty gentlemen, an update.

                        Logged in to APO and ran:
                        repadmin /removelingeringobjects TYPH apos_GUID dc=GNB,dc=COM /advisory_mode

                        The TYPH event log got a couple of event ID 1946 entries and a 1942 telling me it had examined and verified 12 objects.

                        If I run it the other way around, i.e.
                        logged in to TYPH and running:
                        repadmin /removelingeringobjects APO typh_GUID dc=GNB,dc=COM /advisory_mode

                        The APO event log just gives me a 1938 saying it has begun verification of lingering objects, and then a 1942 telling me that 0 objects were examined and verified.

                        Should I now go ahead and run this without the /advisory_mode switch, just as I did on APO, or should I run it on BOTH DCs with the destination DC and source DC GUID swapped around?
                        And then edit the registry to allow replication with a divergent and corrupt partner?

                        Or have I completely lost the plot?

                        Again, many thanks for your assistance.
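
The two runs above follow the pattern repadmin /removelingeringobjects <destination DC> <source DC GUID> <naming context> [/advisory_mode], where the source DC is the one treated as the reference copy. Here is a small Python sketch of a helper that assembles that command line. The function name is hypothetical, and apos_GUID is the placeholder from the post, not a real GUID. It defaults to advisory mode so that nothing is deleted unless you explicitly ask for it.

```python
def build_rlo_command(dest_dc, source_dsa_guid, naming_context, advisory=True):
    """Assemble the repadmin /removelingeringobjects argument list.

    dest_dc          DC to be cleaned (lingering objects are removed here)
    source_dsa_guid  GUID of the DC used as the reference copy
    naming_context   directory partition to check, e.g. dc=GNB,dc=COM
    advisory         True = only log what would be removed
    """
    cmd = ["repadmin", "/removelingeringobjects",
           dest_dc, source_dsa_guid, naming_context]
    if advisory:
        cmd.append("/advisory_mode")
    return cmd


# First run from the post: check TYPH against APO, report only.
print(" ".join(build_rlo_command("TYPH", "apos_GUID", "dc=GNB,dc=COM")))

# Same check with removal enabled (no /advisory_mode).
print(" ".join(build_rlo_command("TYPH", "apos_GUID", "dc=GNB,dc=COM", advisory=False)))
```

Keeping the advisory and real runs as the same call with one flag flipped makes it harder to swap the destination and source by accident between the two.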

                        • #13

                          Run it on APO without advisory mode.

                          But you also want to do what's described here
                          http://technet.microsoft.com/en-us/l...8WS.10%29.aspx
                          http://technet.microsoft.com/en-us/l...8WS.10%29.aspx

                          Read up on enabling strict replication consistency.

                          This is the one for the out of date GC
                          http://support.microsoft.com/kb/314282

                          Also a nice read
                          http://blogs.dirteam.com/blogs/jorge...g-objects.aspx

                          • #14

                            All,
                            Thank you for your kind and knowledgeable input on this issue.
                            It is now resolved: replication is functioning properly after the lingering objects were removed and the server was allowed to replicate with a corrupt partner once.

                            Users didn't notice a thing.

                            Again, thank you all for your assistance.
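
For reference, the registry side of this fix lives under HKLM\SYSTEM\CurrentControlSet\Services\NTDS\Parameters. The sketch below only prints the reg add commands for the two values discussed in the thread. The helper name and the ordering of the steps are my assumptions, not something the posters spelled out, so treat it as a checklist to verify against the linked articles and not as a script to run blindly.

```python
NTDS_PARAMS = r"HKLM\SYSTEM\CurrentControlSet\Services\NTDS\Parameters"


def reg_add(value_name, data):
    """Return a `reg add` command that sets a REG_DWORD under NTDS\\Parameters."""
    return f'reg add "{NTDS_PARAMS}" /v "{value_name}" /t REG_DWORD /d {data} /f'


# Assumed order: tighten consistency, open the one-time exception after the
# lingering objects are gone, then close the exception again.
steps = [
    reg_add("Strict Replication Consistency", 1),
    reg_add("Allow Replication With Divergent and Corrupt Partner", 1),
    reg_add("Allow Replication With Divergent and Corrupt Partner", 0),
]

for step in steps:
    print(step)
```

The last step matters: leaving the "divergent and corrupt partner" value enabled would let a long-disconnected DC replicate again in future without anyone noticing.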
