Dual-site Bridgehead Recovery, ISTG, service restart ?


  • Dual-site Bridgehead Recovery, ISTG, service restart ?

    My Question: How do I configure the bridgeheads to not need manual intervention? I believe by design they are supposed to keep trying to communicate with each other until they come back up after a remote site power outage.

    In my case I could see Event Log errors 1311, 1566, and 1865 recurring every 15 minutes, even though both bridgehead servers were up and could ping each other. The only fix that works reliably is manually restarting the “Intersite Messaging Service” on one or both of the servers; rebooting the servers does not restore replication.
    _____________________________________________________________
    Here is some more info about my domain, in case it helps come up with an answer:

    I have one W2K3 domain across two sites. We are short on $ and equipment, so the remote office site has only one DC; we will call this DC03. DC03 is configured as both the GC (Global Catalog server) and the bridgehead server for the remote site. Its replication partner is DC01, the bridgehead server at our primary site.

    The good news is that when I run a dcdiag report { dcdiag /s:DC03 /e } it gives me a perfect domain topology with no issues. I ran the dcdiag command against each of the other DCs in turn and all looks perfect. Same with the RepAdmin command: all looks good.
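For anyone who wants to repeat that health check against several DCs in one pass, here is a minimal sketch. It assumes Python is available on an admin workstation with dcdiag on the PATH, and it assumes dcdiag summary lines look like `......................... DC03 failed test Replications`; the function names are made up for illustration.

```python
import re
import subprocess

# Assumption: dcdiag reports each test as
# "......................... <server> passed|failed test <TestName>".
FAILED_PATTERN = re.compile(r"\.+\s+(\S+)\s+failed test\s+(\S+)")


def build_dcdiag_command(server):
    """Return the dcdiag command line used in the post for one server."""
    return ["dcdiag", "/s:" + server, "/e"]


def failed_tests(dcdiag_output):
    """Return (server, test name) pairs for every failed test in the output."""
    return FAILED_PATTERN.findall(dcdiag_output)


def check_servers(servers):
    """Run dcdiag against each server and collect the failed tests per server."""
    report = {}
    for server in servers:
        completed = subprocess.run(
            build_dcdiag_command(server), capture_output=True, text=True
        )
        report[server] = failed_tests(completed.stdout)
    return report
```

Calling `check_servers(["DC01", "DC03"])` would return a dictionary keyed by server name; an empty list for a server means no test reported a failure.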

    Until….

    The remote site is prone to power outages and the occasional network drop, so naturally my AD event logs show replication problems. Once power and LAN connectivity are restored I reboot DC03, but I continue to get errors 1311, 1566, and 1865 every 15 minutes. I can ping my primary site bridgehead from the remote bridgehead and vice versa. You can see the text of the 1311, 1566, and 1865 errors in this other guy's post: http://www.itnewsgroups.net/group/mi...topic8554.aspx I searched the MS KB, Google, and MS Live Search, and I get a lot of hits for Windows 2000 having this issue but not 2003.

    Since DC03 (the secondary site bridgehead) was rebooted after LAN and power were back, I thought I would reboot the primary bridgehead to see if that solved the problem. It didn’t. Then I read a post about manually restarting the “Intersite Messaging Service”. That worked! I think I only restarted the service on the primary site bridgehead, but I don’t recall whether I also restarted it on the remote site bridgehead at about the same time.

    I’ve read: http://technet2.microsoft.com/window....mspx?mfr=true but didn’t see an answer to my question there.

    I let AD auto config as much of the topology as possible from the get-go so I don’t think I have the site topology misconfigured – at least I feel it is unlikely. Thanks for your help.
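Until the root cause is found, the manual workaround described above can be scripted. This is only a sketch under assumptions: it assumes the service's short name is `IsmServ` (confirm with `sc query ismserv` on your own DCs), that `sc.exe` can reach the DCs remotely with your credentials, and that Python is available on the admin machine. The function names and the pause length are made up for illustration.

```python
import subprocess
import time

# Assumption: "IsmServ" is the short name of the Intersite Messaging Service.
SERVICE_NAME = "IsmServ"


def build_restart_commands(server):
    """Return the sc.exe commands that stop and then start the service on one DC."""
    target = "\\\\" + server  # sc.exe expects a UNC-style name such as \\DC01
    return [
        ["sc", target, "stop", SERVICE_NAME],
        ["sc", target, "start", SERVICE_NAME],
    ]


def restart_intersite_messaging(servers, dry_run=True, pause_seconds=10):
    """Print the restart commands (dry run) or execute them for each server."""
    for server in servers:
        for command in build_restart_commands(server):
            if dry_run:
                print(" ".join(command))
                continue
            subprocess.run(command, check=True)
            # sc returns before the service has fully changed state, so wait
            # a little before issuing the next command (pause is a guess).
            time.sleep(pause_seconds)
```

With the default `dry_run=True`, `restart_intersite_messaging(["DC01", "DC03"])` only prints the commands, so you can review them before running anything against a production DC.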

  • #2
    Re: Dual-site Bridgehead Recovery, ISTG, service restart ?

    Did you install SP2 and the latest updates? What about the replmon results? FRS logs?

    Can you please paste the event ID descriptions from your Event Viewer?



    • #3
      Re: Dual-site Bridgehead Recovery, ISTG, service restart ?

      Thanks for taking an interest in my post. The DCs are at SP1, so I will upgrade to SP2 this weekend. (Maybe that's all I need?)

      Next time the power goes out, which I'm sure will be in about a week at the rate the Gulf Coast is getting rain in Texas, I will run the replmon commands and post the results. As for right now, the errors stopped once I manually restarted the “Intersite Messaging Service”.
      Here are the event errors that were occurring every 15 minutes:
      ---------------------------------------------

      Type: Warning
      Source: NTDS KCC
      Event ID: 1566
      Event Time: 7/25/2005 9:55:55 PM
      User: NT AUTHORITY\ANONYMOUS LOGON
      Computer: LSDC
      Description:
      All domain controllers in the following site that can replicate the
      directory partition over this transport are currently unavailable.
      Site:
      CN=Honolulu,CN=Sites,CN=Configuration,DC=mydomain,DC=com
      Directory partition:
      DC=mydomain,DC=com
      Transport:
      CN=IP,CN=Inter-Site Transports,CN=Sites,CN=Configuration,DC=mydomain,DC=com

      ---------------------------------------------

      Type: Error
      Source: NTDS KCC
      Event ID: 1311
      Event Time: 7/25/2005 9:55:55 PM
      User: NT AUTHORITY\ANONYMOUS LOGON
      Computer: LSDC
      Description:
      The Knowledge Consistency Checker (KCC) has detected problems with the
      following directory partition.
      Directory partition:
      DC=mydomain,DC=com
      There is insufficient site connectivity information in Active Directory
      Sites and Services for the KCC to create a spanning tree replication
      topology. Or, one or more domain controllers with this directory partition
      are unable to replicate the directory partition information. This is probably
      due to inaccessible domain controllers.
      User Action
      Use Active Directory Sites and Services to perform one of the following
      actions:
      - Publish sufficient site connectivity information so that the KCC can
      determine a route by which this directory partition can reach this site. This
      is the preferred option.
      - Add a Connection object to a domain controller that contains the directory
      partition in this site from a domain controller that contains the same
      directory partition in another site.
      If neither of the Active Directory Sites and Services tasks correct this
      condition, see previous events logged by the KCC that identify the
      inaccessible domain controllers.

      ---------------------------------------------

      Type: Warning
      Source: NTDS KCC
      Event ID: 1865
      Event Time: 7/25/2005 9:55:55 PM
      User: NT AUTHORITY\ANONYMOUS LOGON
      Computer: LSDC
      Description:
      The Knowledge Consistency Checker (KCC) was unable to form a complete
      spanning tree network topology. As a result, the following list of sites
      cannot be reached from the local site.
      Sites: <Site info>
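To confirm the "every 15 minutes" pattern from a pasted log instead of by eye, something like the following could help. It is a rough sketch that assumes events are pasted in the same `Type:` / `Source:` / `Event ID:` / `Event Time:` layout as above and that timestamps use the US `M/D/YYYY h:mm:ss AM/PM` format; the function names are made up, and the second timestamp in `SAMPLE` is invented purely for illustration.

```python
import re
from collections import defaultdict
from datetime import datetime

# Only these header fields are read; description text is ignored.
FIELD_PATTERN = re.compile(r"^\s*(Type|Source|Event ID|Event Time):\s*(.+?)\s*$")


def parse_events(text):
    """Split pasted event text into dicts, starting a new event at each 'Type:' line."""
    events = []
    current = {}
    for line in text.splitlines():
        match = FIELD_PATTERN.match(line)
        if not match:
            continue
        key, value = match.groups()
        if key == "Type" and current:
            events.append(current)
            current = {}
        current[key] = value
    if current:
        events.append(current)
    return events


def intervals_by_event_id(events):
    """Return, per event ID, the gaps in minutes between consecutive occurrences."""
    times = defaultdict(list)
    for event in events:
        stamp = datetime.strptime(event["Event Time"], "%m/%d/%Y %I:%M:%S %p")
        times[int(event["Event ID"])].append(stamp)
    gaps = {}
    for event_id, stamps in times.items():
        stamps.sort()
        gaps[event_id] = [
            (later - earlier).total_seconds() / 60
            for earlier, later in zip(stamps, stamps[1:])
        ]
    return gaps


# The second occurrence below is a made-up timestamp for illustration only.
SAMPLE = """
Type: Error
Source: NTDS KCC
Event ID: 1311
Event Time: 7/25/2005 9:55:55 PM

Type: Error
Source: NTDS KCC
Event ID: 1311
Event Time: 7/25/2005 10:10:55 PM
"""

if __name__ == "__main__":
    print(intervals_by_event_id(parse_events(SAMPLE)))
```

A steady gap that matches the KCC's regular run would suggest the errors are logged each time the topology is recalculated rather than on each replication attempt.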
