Site Replication Simulation

  • Site Replication Simulation

    Hey all.
    I have a large domain in a single forest with over 30 DCs and over 20 sites.
    I would like to redesign my site links, since I get a lot of SCOM alerts about slow replication taking over 45 minutes. As I understand it, 45 minutes is supposed to be the maximum according to Microsoft if you assume at most 3 hops and a 15-minute inter-site replication interval (all my site links are set to 15 minutes). BTW, I do have "Bridge all site links" enabled on the IP transport, so that should cover the fact that all DCs can communicate with each other. I have no network limitations: all sites are connected to each other over an MPLS line with no firewalls whatsoever, so every DC can be reached within 3 hops. SCOM sometimes alerts me on replication times of up to 56 minutes, mostly on the DomainDnsZones partition and hardly ever, or even never, on the domain data partition.
    First of all, how come I get slow replication on the DNS partition and not on the domain data partition? Is it not the same replication topology?
    But that's beside the point.
    What I am looking for is a tool that simulates replication based on site links and costs, so that if, say, I make a change on one DC, I can see how long it would take to reach every other DC.
    Then I could play with the site links and costs (in simulation mode), run the test again, and see which results best suit my needs.
    I don't mind if the tool has to be purchased.
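
As a rough illustration of the kind of simulation being asked about, here is a minimal Python sketch. It is not an existing tool. The site names, the link intervals and the worst-case model (every hop waits one full replication interval) are assumptions made for illustration, and the sketch ignores bridgehead load, per-partition topologies and the connections the KCC would actually build.

```python
import heapq

def worst_case_latency(links, source):
    """Return {site: minutes} for a change made at `source`.

    `links` maps (site_a, site_b) -> replication interval in minutes.
    Assumption: each hop waits one full interval before replicating.
    """
    # Build an undirected adjacency list from the site links.
    graph = {}
    for (a, b), interval in links.items():
        graph.setdefault(a, []).append((b, interval))
        graph.setdefault(b, []).append((a, interval))

    # Plain Dijkstra over accumulated interval time.
    latency = {source: 0}
    queue = [(0, source)]
    while queue:
        elapsed, site = heapq.heappop(queue)
        if elapsed > latency.get(site, float("inf")):
            continue  # stale queue entry
        for neighbour, interval in graph.get(site, []):
            candidate = elapsed + interval
            if candidate < latency.get(neighbour, float("inf")):
                latency[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return latency

# Hypothetical topology: two hubs, two branches, 15-minute links everywhere.
links = {
    ("HUB-1", "HUB-2"): 15,
    ("HUB-1", "BRANCH-A"): 15,
    ("HUB-2", "BRANCH-B"): 15,
}
print(worst_case_latency(links, "BRANCH-A"))
```

With these made-up links, a change at BRANCH-A needs three hops to reach BRANCH-B, which is where the 3 x 15 = 45 minute figure comes from. Changing the intervals or adding links and rerunning gives a first feel for a redesign, but only under the simplifications stated above.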
    Last edited by Akila; 18th May 2008, 16:16.

  • #2
    Re: Site Replication Simulation

    I have never tried it, but this one seems like it could be useful for you:

    http://www.microsoft.com/downloads/d...displaylang=en

    Regards,
    Kapil Sharma
    ~~~~~~~~~~~~~
    Life is too short, Enjoy It.


    • #3
      Re: Site Replication Simulation

      Originally posted by kapilsharma11 View Post
      I have never tried it, but this one seems like it could be useful for you:

      http://www.microsoft.com/downloads/d...displaylang=en

      Regards,
      That tool is ADMAP. I know it: it is not a simulation program. It only draws your AD topology and creates a .vsd (Visio) file for a nice display.


      • #4
        Re: Site Replication Simulation

        Any tool would be fine, even if it's not free.
        Any ideas?


        • #5
          Re: Site Replication Simulation

          There used to be a tool called Age of Directories from Compaq (I do not remember whether it was free), and I believe it is called ADTV (part of OvO). It had the capability to simulate the topology.

          As for your topology, can you shed some light on the site layout? Do you have datacenters and branch offices, or are all the sites equal?
          Guy Teverovsky
          "Smith & Wesson - the original point and click interface"


          • #6
            Re: Site Replication Simulation

            We have a few data centers and the rest are small sites; the largest small site has about 600 users, give or take.
            Some of the sites have more than one DC, but most of the smaller sites have one DC.
            All the sites are connected using an MPLS cloud from AT&T, meaning all sites are connected to each other through the cloud.
            The only bottleneck is each site's outgoing WAN line to the cloud, which in a way makes reconfiguring site links very flexible.
            "Bridge all site links" is enabled, meaning every site can communicate with every other site if needed.
            All of my site links are configured with a 15-minute replication schedule.
            The only thing I can't understand is how replication can take over 45 minutes if there are no more than 3 hops. The most bizarre thing is that it happens only on the DomainDNSZones partition and not on the domain data partition. Is it not the same replication topology?
            Last edited by Akila; 18th May 2008, 22:00.


            • #7
              Re: Site Replication Simulation

              It all depends on the load on the site bridgeheads. A site bridgehead can replicate (pull) from only a single DC at a time. This means that if you replicate after 15 minutes and the destination DC is busy pulling from some other DC, you will have to wait until it finishes what it is doing. Also take the following into account:

              SITE_A_DC1 <-- SITE_B_DC1 --> SITE_C_DC1

              The following scenario might show where the additional delay comes from:

              In theory, the ISTG in site B should designate a DC to act as the site's bridgehead server. But the thing is that each replication partition has its own topology. In the case of a single domain, there are 3 topologies: domain, enterprise (config+schema) and GC (and if you have application partitions, each will have its own topology). As a rule of thumb, the enterprise topology will try to follow the domain topology where possible. Same for GC.

              But this is not guaranteed. A heavily loaded bridgehead, or a topology containing DCs that do not host the required partition (e.g. DCs without DNS installed, since installing DNS is what enrolls a DC into DomainDNSZones/ForestDNSZones), can result in additional bridgeheads being introduced in the same site (one for each replication partition).
              Suppose this happens and SITE_B_DC1 is the bridgehead for the domain partition while SITE_B_DC2 is the bridgehead for the rest.

              But let's look at the replication of a single partition:

              1) 15 minutes pass and SITE_A_DC1 wants to replicate with SITE_B
              2) SITE_A_DC1 waits till SITE_B_DC1 is done pulling from other DCs
              3) SITE_B_DC1 is done after 1 minute and pulls from SITE_A_DC1 (we are at 16 minutes)
              4) after 15 minutes SITE_C_DC1 wakes up and starts pulling from DCs that do not belong to site B. This takes 2 minutes (we are at 33 minutes)
              5) SITE_C_DC1 pulls domain partition from SITE_B_DC1
              6) Domain partition has been replicated from A to C in 33 minutes
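
The arithmetic in the steps above can be sketched in a few lines of Python. This is only an illustration: the function name and the (schedule wait, busy delay) model are assumptions, and the delay figures are the example numbers from the steps, not measurements.

```python
def convergence_minutes(hops):
    """Sum the schedule wait and the busy-bridgehead delay for each hop.

    `hops` is a list of (schedule_wait, busy_delay) tuples in minutes.
    """
    elapsed = 0
    for schedule_wait, busy_delay in hops:
        elapsed += schedule_wait + busy_delay
    return elapsed

# Hop A -> B: 15-minute schedule plus 1 minute waiting for SITE_B_DC1.
# Hop B -> C: 15-minute schedule plus 2 minutes while SITE_C_DC1 is busy.
two_hops = [(15, 1), (15, 2)]
print(convergence_minutes(two_hops))

# Hypothetical third hop: a few minutes of queuing per hop adds up quickly.
three_hops = [(15, 4), (15, 4), (15, 4)]
print(convergence_minutes(three_hops))
```

The first call reproduces the 33 minutes from the example. The second, with an assumed 4 minutes of queuing per hop, shows how three 15-minute hops can end up past the nominal 45 minutes.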

              Now, in your case, if the DomainDNSZones partition is the one that keeps popping up, I would check whether all DCs are DNS servers and are enrolled into this partition. A DC in the middle that does not have DNS installed and does not hold a copy of DomainDNSZones can cause the replication topology to behave differently from what you would expect.

              Note also that although you are bridging site links, the link costs are still taken into account. The effect is that there is a direct (bridged) link between A and C, but it has twice the cost of A<->B (assuming A<->B and B<->C have the same cost).
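
A minimal sketch of that cost arithmetic, assuming hypothetical link costs of 100 and the usual rule that a bridged route costs the sum of the site links it crosses:

```python
def bridged_cost(link_costs, path):
    """Cost of a transitive (bridged) route: the sum of the link costs along it."""
    total = 0
    for a, b in zip(path, path[1:]):
        # Site links are undirected, so look the pair up in either order.
        cost = link_costs.get((a, b), link_costs.get((b, a)))
        if cost is None:
            raise KeyError(f"no site link between {a} and {b}")
        total += cost
    return total

link_costs = {("A", "B"): 100, ("B", "C"): 100}  # hypothetical costs
print(bridged_cost(link_costs, ["A", "B"]))
print(bridged_cost(link_costs, ["A", "B", "C"]))
```

With equal costs on both links, the bridged A-to-C route comes out at twice the cost of A<->B.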

              One thing you have not mentioned (or maybe I missed it) is how the site links and link costs are configured. Do you have a link for each branch connecting it to a hub (hub and spoke), or do you have a partial/full mesh?
              Last edited by guyt; 19th May 2008, 00:06.
              Guy Teverovsky
              "Smith & Wesson - the original point and click interface"


              • #8
                Re: Site Replication Simulation

                BTW, I have just noticed that you are from Israel, and given the number of sites/DCs I can count 2-3 companies that would fit that size.
                If you have a Premier support contract, you should be undergoing an ADRAP at some point. The PFEs that conduct ADRAPs are VERY knowledgeable. If you get a chance, use the time they are on site to squeeze out more info on how to optimize the topology.

                I do not have enough knowledge about your current setup, but among other things I would look at enabling change notifications on the site links if fast convergence is required. But note that it requires some planning and should not be done ad hoc.

                You might also want to fire up perfmon on your bridgehead servers and check the load during the replication cycle.

                Argg, it's been a while since I put my hands on an AD with more than 10K accounts. Pity those can be counted on one hand in Israel (excluding IDF and such).
                Guy Teverovsky
                "Smith & Wesson - the original point and click interface"


                • #9
                  Re: Site Replication Simulation

                  A few things.
                  First of all, thanks for your reply.
                  I already knew almost everything you posted, except for the fact that every partition has a different replication topology. I thought everything rode the same replication train regardless of its location or partition.
                  That is why I am thinking I should tweak my site link configuration a bit.
                  But you did point out something that might be the cause. All of my SCOM alerts point to the DomainDNSZones partition, and I know for a fact that there is a lot of replication (and even CNFs) for DNS PTR records.
                  I was thinking maybe I should reduce the number of DCs that run the DNS Server service, which are not being used by clients anyway. That way I could cut down the amount of DNS replication between DCs, especially at sites that have more than one DC.
                  At this point most of our DCs run DNS (for historical reasons: previous AD admins misunderstood the DNS/DC concept and installed all the DCs with the DNS service on them).
                  Until not long ago it didn't make much difference whether DNS was installed or not, because the DNS data was located in the domain data partition (Windows 2000 style), meaning it was replicated to every DC regardless of whether it ran the DNS service.
                  Not long ago I moved DNS to the DomainDNSZones partition, which in theory is replicated only to those DCs that have the DNS Server service installed, leaving those without DNS out of the replication of DNS records (that was the point of adding those 2 partitions in Windows Server 2003).

                  1) If I remove unnecessary DNS services from DCs, would that reduce the problem or make it worse?

                  2) Would uninstalling the DNS Server service from a DC be enough to tell the directory that the DC should no longer receive the DomainDNSZones records?
                  (I don't mind reducing the DNS services without looking at my site link configuration, since the links are all bridged anyway. There shouldn't be a situation where a DC with DNS fails to receive DNS records because DNS was removed from its site-linked partner DC.)

                  Regarding ADRAP: we had one, and the engineer found little need to tweak the site links and said the configuration was pretty fine. But then again, SCOM wasn't really implemented at the time.

                  Regarding site link costs: it's not that you haven't asked, I just did not mention them because it is complicated.
                  Overall it is configured as 3 hubs with spokes, and every hub is linked with the others. Since we have an MPLS cloud, I am not sure a full mesh configuration would be such a bad idea. What do you think?

                  Anyway, I hope I did not miss anything.
                  Last edited by Akila; 19th May 2008, 20:15.


                  • #10
                    Re: Site Replication Simulation

                    What is your DFL/FFL?
                    Any chance you are not at W2K3 FFL, and hence LVR is not available?
                    I have seen much larger orgs working with ADI DNS and not getting any CNFs. Any chance you have clients pointing to DNS servers in more than one AD site?
                    (e.g. a client pointing to SITE_A_DC1 and SITE_B_DC1 for DNS resolution)

                    Any chance you have a DHCP service registering clients in DNS while being configured to hand out very short leases to clients?
                    Guy Teverovsky
                    "Smith & Wesson - the original point and click interface"


                    • #11
                      Re: Site Replication Simulation

                      Originally posted by guyt View Post
                      What is your DFL/FFL?
                      Any chance you are not at W2K3 FFL, and hence LVR is not available?
                      I have seen much larger orgs working with ADI DNS and not getting any CNFs. Any chance you have clients pointing to DNS servers in more than one AD site?
                      (e.g. a client pointing to SITE_A_DC1 and SITE_B_DC1 for DNS resolution)

                      Any chance you have a DHCP service registering clients in DNS while being configured to hand out very short leases to clients?
                      All our DCs are Windows Server 2003 SP1 or above, x86/x64.

                      DFL/FFL: Windows Server 2003 functional level.

                      I don't know whether LVR is enabled, since I have no idea whether someone in the past upgraded a DC in place to Windows 2003 or did a clean install (before my time).
                      The Windows Server 2003 Linked Value Replication (LVR) feature helps you recover accidentally deleted Active Directory data, so I have no idea how this is related to my case. Upgrading from the Microsoft Windows 2000 OS to the Microsoft Windows Server 2003 OS doesn't automatically enable LVR.

                      It is possible that clients are pointing to more than one DNS server at different sites, but that is something I would have to look into, since client configuration is outside my scope. Besides, what difference does it make? Once a client reaches the primary DNS it sticks with it; if that is not available, it sticks with the secondary DNS. Either way it should stick with one or the other, not both.
                      Regarding DHCP: that is something I would have to look into, since we don't use MS DHCP; we use the VitalQIP DHCP IP address management software. I would have to check with our comm team (who are in charge of QIP) what configuration they have set for clients. I am going to have to get back to you on that one.
                      But in general I am less concerned about the PTR CNFs. I am trying to find a way to reduce inter-site replication time between DCs on the DomainDNSZones partition.

                      BTW, you never addressed the questions in my previous reply about reducing the number of DCs holding the DNS service.
                      P.S. Also the question about the full mesh, MPLS, etc.
                      Last edited by Akila; 19th May 2008, 20:17.


                      • #12
                        Re: Site Replication Simulation

                        Sorry for the delay with the answers. I was trying to gather more info before jumping the gun.

                        Anyway...

                        LVR is indeed Linked Value Replication, but easier object restores are not its primary purpose.
                        LVR makes replication MUCH more efficient. In the pre-LVR era, each group modification resulted in replication of the whole "member" attribute of the altered group. LVR makes it possible to replicate only the delta changes.
                        The "Linked Value" also means that the feature deals only with linked attributes (member/memberOf, manager/directReports, etc.).

                        LVR is known to reduce replication traffic by tens and, in some cases, hundreds of percent.
                        It is not really relevant to DNS application partitions, as the dnsRecord attribute is not linked, but it is very important for overall replication performance.
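
To make the whole-attribute versus delta difference concrete, here is a deliberately simplified Python model. It is not how the directory stores or ships link values; the function name and the set-based representation are assumptions used only to show the size of the difference.

```python
def replicated_values(old_members, new_members, lvr_enabled):
    """Return the member values that must be sent to a partner after a group change."""
    old, new = set(old_members), set(new_members)
    if lvr_enabled:
        # With LVR only the individual added or removed links are replicated.
        return (new - old) | (old - new)
    # Without LVR the whole multi-valued "member" attribute is replicated.
    return new

# Hypothetical group with 1000 members; one more user is added.
before = [f"user{i}" for i in range(1000)]
after = before + ["user1000"]

print(len(replicated_values(before, after, lvr_enabled=False)))
print(len(replicated_values(before, after, lvr_enabled=True)))
```

In this model, adding one user to a 1000-member group ships 1001 values without LVR and a single value with it.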
                        I will not delve into it much more, but will point you to Tomek's writeup on the topic: http://blogs.dirteam.com/blogs/tomek...his-about.aspx

                        Now for my reason for asking about client configuration. You correctly mentioned:
                        "Once a client reaches the primary DNS it sticks with it; if that is not available, it sticks with the secondary DNS. Either way it should stick with one or the other, not both."

                        But there is something else you should consider: if a client is configured with a primary DNS server that is outside its site while there is a DNS server locally, the record will be created on a DNS/DC outside its site and then replicated back into its site, adding to the replication traffic.

                        As for QIP, I am almost sure it is capable of doing DDNS on behalf of the client, and if you have both the client and QIP trying to register the same record, well... you get the point.

                        Now the site links... I think what I would do is the following:
                        - for each branch site configure 3 site links:
                        BRANCH-X<->HUB-1, BRANCH-X<->HUB-2, BRANCH-X<->HUB-3
                        - disable link bridging
                        - full mesh for the 3 HUB sites:
                        HUB-1<->HUB-2, HUB-1<->HUB-3, HUB-2<->HUB-3
                        - enable change notification on the site links connecting the HUB sites
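
The link layout in the list above can be enumerated with a short Python sketch, for example to feed a script that creates the site links. The function name and the branch names are hypothetical; only the hub names come from the post.

```python
from itertools import combinations

def plan_site_links(hubs, branches):
    """Return (site_a, site_b) pairs: every branch to every hub, hubs fully meshed."""
    links = []
    # Each branch gets one site link to every hub.
    for branch in branches:
        for hub in hubs:
            links.append((branch, hub))
    # The hubs themselves are fully meshed.
    links.extend(combinations(hubs, 2))
    return links

hubs = ["HUB-1", "HUB-2", "HUB-3"]
branches = ["BRANCH-1", "BRANCH-2"]  # hypothetical branch names

for a, b in plan_site_links(hubs, branches):
    print(f"{a} <-> {b}")
```

The number of links grows as 3 per branch plus 3 between the hubs, which stays far smaller than a full mesh across 20-odd sites.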

                        Personally I do not like site link bridging: it makes your topology much less deterministic and less controllable. In your case I think it would be wise to channel all the replication through the HUB sites. This will also reduce the load on the KCC when generating the topology.

                        In addition, I would look at where most of the changes originate:
                        - From my experience, domain partitions have most of their changes originating where IDM provisioning systems are located (MIIS/ILM/TIM/Quest/LDSU/whatever) and where the helpdesk is located. In most cases those are at core/hub sites, but your mileage may vary.
                        - Enterprise partitions are mostly static: the changes there are minor.
                        - GC partition: in the case of a single domain, having all the DCs as GCs is perfectly fine. (The Infrastructure Master can also be made a GC if all DCs are GCs.)
                        - DNS application partition changes are probably statistically distributed between the sites, and the number of changes is directly related to the number of clients in the site. My bet is that you have a single DC with DNS on it at each site. I would keep the DNS there and make sure that:
                        a) the local DC/DNS is configured as primary.
                        b) one of the HUB DC/DNS servers is configured as secondary (you might even load balance the HUB sites)

                        In any case I would avoid a full mesh: it is too easy to get lost in site links unless you have a scripted/automated solution, it is too non-deterministic, and it puts too much unneeded load on the KCC.

                        One last thing that is worth taking a look at is the zone configuration in DNS/AD: how are the zones configured?
                        "Replicate to all DNS servers in the domain"?
                        "Replicate to all DNS servers in the forest"?
                        "Replicate to all DCs"?

                        How about reverse lookup zones?
                        You might want to load balance the zones:
                        - configure the forward zone (I am assuming a single-domain forest) to replicate to DNS servers in the forest (that will move the records to the ForestDNSZones partition)
                        - keep reverse lookup zones with "Replicate to all DNS servers in the domain" (that will keep the data in the DomainDNSZones partition)

                        This way you will spread the update load between 2 partitions, and as each partition has its own replication topology, this will speed up the replication cycle of the partition in question (as it will have fewer changes to replicate).
                        The total is still the same, but less work for the replication thread pulling the data on the bridgehead is a welcome addition.
                        Last edited by guyt; 19th May 2008, 23:11.
                        Guy Teverovsky
                        "Smith & Wesson - the original point and click interface"


                        • #13
                          Re: Site Replication Simulation

                          Originally posted by guyt View Post
                          But there is something else you should consider: if a client is configured with a primary DNS server that is outside its site while there is a DNS server locally, the record will be created on a DNS/DC outside its site and then replicated back into its site, adding to the replication traffic.
                          This is not the case. There is no way that clients are configured with a DNS server outside their site, for other reasons I won't get into now. As a golden rule, all clients use the DNS servers at their own site.

                          Originally posted by guyt View Post
                          As for QIP, I am almost sure it is capable of doing DDNS on behalf of the client and if you have both the client and QIP trying to register the same record, well... you get the point.
                          I will have to look into it, as I said before.

                          Originally posted by guyt View Post
                          Now the site links... I think that what I would do is the following:
                          - for each branch site configure 3 site links:
                          BRANCH-X<->HUB-1, BRANCH-X<->HUB-2, BRANCH-X<->HUB-3
                          - disable link bridging
                          - full mesh for the 3 HUB sites:
                          HUB-1<->HUB-2, HUB-1<->HUB-3, HUB-2<->HUB-3
                          It's pretty much configured that way (3 hubs with spokes, and all 3 hubs are linked with each other).

                          Originally posted by guyt View Post
                          - enable change notification on the site link connecting the HUB sites
                          That will hardly make any difference, since change notification applies to urgent replication only, e.g. account lockouts, password changes, and changes to the RID Master FSMO role holder DC.

                          Originally posted by guyt View Post
                          In addition I would look at the source where most of the changes are being originated:
                          - From my experience, domain partitions have most of their changes originating where IDM provisioning systems are located (MIIS/ILM/TIM/Quest/LDSU/whatever) and where the helpdesk is located. In most cases those are at core/hub sites, but your mileage may vary.
                          That is exactly why I have 3 hubs with spokes: every hub usually makes changes to itself or to its own spokes.

                          Originally posted by guyt View Post
                          - GC partition: in the case of a single domain, having all the DCs as GCs is perfectly fine. (The Infrastructure Master can also be made a GC if all DCs are GCs.)
                          I have a single domain/forest, so the Infrastructure Master has no role to play here even if it's on a GC (which it is, BTW).
                          There are 2 exceptions under which it can be on a GC:
                          1) in a single domain
                          2) if all DCs are GCs.
                          If either one applies, the Infrastructure Master can be on a GC; you do not need both.

                          Originally posted by guyt View Post
                          My bet is that you have a single DC with DNS on it at each site.
                          Actually, as I mentioned before, most of my DCs are DNS servers, hence I have more than one DC/DNS per site. That is why I asked whether removing or reducing the number of DNS servers would make a difference.

                          Originally posted by guyt View Post
                          I would keep the DNS there and make sure that:
                          a) the local DC/DNS is configured as primary.
                          b) one of the HUB DC/DNS servers is configured as secondary (you might even load balance the HUB sites)
                          As I mentioned before, that cannot be done: a client has to use a DNS server located at its own site.
                          Don't worry about the secondary DNS; even if there is only one DC with DNS, there is failover to other DNS services. It's complicated.


                          Originally posted by guyt View Post
                          One last thing that is worth taking a look at is the zone configuration in DNS/AD: how are the zones configured?
                          "Replicate to all DNS servers in the domain"?
                          "Replicate to all DNS servers in the forest"?
                          "Replicate to all DCs"?
                          DomainDNSZones, meaning option "1".

                          Originally posted by guyt View Post
                          How about reverse lookup zones ?
                          You might want to load balance the zones:
                          - configure forward zone (I am assuming single domain forest) to replicate to DNS servers in the forest (that will move the records to ForestDNSZones partition)
                          - keep reverse lookup zones with "Replicate to all DNS servers in the domain" (that will keep the data in the DomainDNSZones partition). This way you will spread the update load between 2 partitions, and as each partition has its own replication topology, this will speed up the replication cycle of the partition in question (as it will have fewer changes to replicate).
                          That is how it is configured now: my forward DNS zones are in the ForestDNSZones partition (with which I hardly have any replication time problems), and my PTR zones (reverse lookup zones) are in the DomainDNSZones partition.
                          Last edited by Akila; 20th May 2008, 12:08.


                          • #14
                            Re: Site Replication Simulation

                            That will hardly make any difference, since change notification applies to urgent replication only, e.g. account lockouts, password changes, and changes to the RID Master FSMO role holder DC.
                            Not quite correct. Enabling change notifications on a site link will do 2 things:

                            - Changes that are subject to urgent replication will be pushed to the outbound replication queue on the bridgehead and replicated almost instantly (depending on the outbound replication queue length). Urgent replication ignores the holdback period.

                            - Regular changes will be queued after the holdback period at the end of the outbound replication queue, regardless of the schedule configured on the site link.

                            If you enable change notifications on a site link, remember that this will kick in only after the site link change has replicated to both sides of the site link.
                            Many organizations enable change notification because it allows urgent replication to cross sites, but this does not mean that only urgent replication is subject to change notifications.

                            As for DNS and DomainDNSZones partition, if you have only reverse lookup zones in DomainDNSZones partition, the CNFs you are seeing should not be disregarded - this is a sign of frequent simultaneous update of the same record on different DCs. This can be caused by things like:
                            - very short DHCP lease
                            - both the clients and QIP are registering PTR records

                            Or maybe QIP is trying to register PTRs on several DCs simultaneously.

                            Think about it: there is a partition that holds only reverse lookup zones data. It is the only one causing CNFs and is replicating slowly. To me this says that there are a lot of changes going on in this partition and not all of the changes are done correctly.
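
One way to see which reverse zones are affected is to count the conflict objects per zone. The Python sketch below works on a plain list of distinguished names, for example from an LDAP export. It assumes that conflict-renamed objects carry "CNF:" followed by a GUID in their first RDN; the sample DNs, GUID-like strings and domain name are made up for illustration.

```python
from collections import Counter

def count_conflicts_by_zone(dns):
    """Count conflict (CNF) objects per zone from a list of DN strings."""
    counts = Counter()
    for dn in dns:
        # Naive split: good enough for this sketch, but it would break on
        # DNs that contain escaped commas.
        parts = dn.split(",")
        if len(parts) > 1 and "CNF:" in parts[0]:
            zone = parts[1].partition("=")[2]
            counts[zone] += 1
    return counts

# Hypothetical export; the GUID-like strings and names are invented.
sample = [
    r"DC=25,DC=1.168.192.in-addr.arpa,CN=MicrosoftDNS,DC=DomainDnsZones,DC=example,DC=com",
    r"DC=25\0ACNF:11111111-aaaa-bbbb-cccc-000000000001,DC=1.168.192.in-addr.arpa,CN=MicrosoftDNS,DC=DomainDnsZones,DC=example,DC=com",
    r"DC=31\0ACNF:11111111-aaaa-bbbb-cccc-000000000002,DC=1.168.192.in-addr.arpa,CN=MicrosoftDNS,DC=DomainDnsZones,DC=example,DC=com",
    r"DC=7\0ACNF:11111111-aaaa-bbbb-cccc-000000000003,DC=2.168.192.in-addr.arpa,CN=MicrosoftDNS,DC=DomainDnsZones,DC=example,DC=com",
]
for zone, count in count_conflicts_by_zone(sample).most_common():
    print(zone, count)
```

Zones that keep accumulating conflicts are the ones where lease times and double registration deserve a closer look.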
                            Guy Teverovsky
                            "Smith & Wesson - the original point and click interface"


                            • #15
                              Re: Site Replication Simulation

                              Originally posted by guyt View Post

                              - Changes that are subject to urgent replication will be pushed to the outbound replication queue on the bridgehead and replicated almost instantly (depending on the outbound replication queue length). Urgent replication ignores the holdback period.
                              Urgent replication replicates instantly only between intra-site partners. Inter-site partners receive the update according to the site link schedule.
                              That is where site link notification comes into play.
                              BTW, urgent replication is never pushed to a replication partner. DCs always, but always, work in pull mode regardless of the urgency of the replication.
                              The only exception where push mode is used is a password change, and even then the push is made only to the PDC Emulator, not to other DCs.

                              After reviewing my book again (Active Directory Troubleshooting, available to Premier customers only; a very good course, BTW), I see that it is possible to set the site link notification value so that it disregards the site link schedule. I assume this is what you meant.

                              Here are the available values:
                              "1" - disregards the site link schedule (like intra-site), replicates with compression
                              "4" - replicates by the site link schedule, but without compression
                              "5" - disregards the site link schedule (like intra-site), replicates without compression

                              That will replicate all objects regardless of whether they are urgent or not. I think that is what you meant.
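
The values listed above are combinations of bit flags in the site link's "options" attribute, which is why "5" behaves like "1" and "4" together. A small Python sketch that decodes a value follows; the constant names are shortened from the NTDSSITELINK_OPT_* names, and the helper function is hypothetical.

```python
# Bit flags of the siteLink "options" attribute
# (names shortened from NTDSSITELINK_OPT_*).
USE_NOTIFY = 0x1           # change notification across the link
TWOWAY_SYNC = 0x2          # two-way sync
DISABLE_COMPRESSION = 0x4  # do not compress inter-site replication traffic

FLAG_NAMES = {
    USE_NOTIFY: "USE_NOTIFY",
    TWOWAY_SYNC: "TWOWAY_SYNC",
    DISABLE_COMPRESSION: "DISABLE_COMPRESSION",
}

def describe_options(value):
    """Return the names of the flags set in a siteLink options value."""
    return [name for flag, name in FLAG_NAMES.items() if value & flag]

for value in (1, 4, 5):
    print(value, describe_options(value))
```

Decoding 5 gives both USE_NOTIFY and DISABLE_COMPRESSION, matching the description of that value above.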
                              Last edited by Akila; 21st May 2008, 18:34.
