Picture this: It’s Monday morning, and lurking deep in your inbox is an e-mail suggesting that you have a meltdown happening in the heart of one of your business services. You immediately recognize this as one of those e-mails that have the potential to destroy the rest of your day. We all know these e-mails far too well, and today I have the pleasure of cracking another one of these eggs.
Troubleshooting: Deadlock Issue
My issue begins with a nice image illustrating a deadlock in Microsoft Forefront Identity Manager, which in practice should not be occurring, as the agents should never run concurrently. A quick glimpse and I recall that these agents are all orchestrated through the magic of my FIM Integration pack and the runbooks, which were painstakingly configured to run the agents in sequence precisely so that this issue would not occur.
So what is happening? I connect to my Microsoft System Center 2012 – Orchestrator server, navigate to the main runbook that orchestrates this flow, and can instantly see that I do have a problem.
In production, I expect that there should only ever be one instance of this runbook executing, and by design this runbook will loop forever. However, what appears to be happening is that the runbook was not cleanly stopped before the server was restarted, so after each restart the previous instance re-establishes itself and a new instance is created as well. The number of instances quickly grows out of control.
Using Orchestrator Health Checker
At this stage of the process, it’s time to launch the Orchestrator Health Checker. This tool will report back all the active instances of each runbook, so I can quickly determine the scale of the problem.
Double-click on the problem runbook (in this case the one called Monitor: iDManagement) and a new dialogue is presented reporting all the instances that are active on each runbook server. It’ll also offer you the ability to Analyze Orphans and have these killed off.
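To make the idea of that report concrete, here is a minimal sketch of the counting involved. It is not how the Orchestrator Health Checker works internally; the job records, the server names (RBS01, RBS02) and the second runbook name are all made up for illustration, and the field layout is an assumption rather than Orchestrator's schema.

```python
from collections import Counter

# Hypothetical job records: (runbook name, runbook server, status).
# Illustrative data only; this is not Orchestrator's database schema.
jobs = [
    ("Monitor: iDManagement", "RBS01", "Running"),
    ("Monitor: iDManagement", "RBS01", "Running"),
    ("Monitor: iDManagement", "RBS02", "Running"),
    ("Sync: Nightly Export", "RBS01", "Completed"),
]


def active_instances(records):
    """Count running instances per (runbook, runbook server) pair."""
    return Counter(
        (name, server)
        for name, server, status in records
        if status == "Running"
    )


def over_limit(records, expected=1):
    """Return runbooks whose total running instances exceed `expected`."""
    totals = Counter()
    for (name, _server), count in active_instances(records).items():
        totals[name] += count
    return {name: n for name, n in totals.items() if n > expected}


if __name__ == "__main__":
    for (name, server), count in sorted(active_instances(jobs).items()):
        print(f"{name} on {server}: {count} active")
    print("More instances than expected:", over_limit(jobs))
```

With an expected count of one, any runbook that shows up in `over_limit` is a candidate for the kind of runaway I am seeing here.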
Cleaning Up: the Easy Method
The health report returned from my health check is not what I would normally expect to see. Fortunately, the Orchestrator Health Checker has another powerful function that will set about stopping ALL running runbooks, cleaning out any orphans that might be lingering, resetting the logs, and then finally restarting all runbooks that are tagged Monitoring.
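The order of those steps matters, so here is a rough sketch of the sequence against a tiny in-memory model. Everything in it is hypothetical: the `Environment` class, its fields and the `clean_up` function are my own stand-ins for illustration, and none of it calls Orchestrator or reflects how the tool is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    """Hypothetical in-memory stand-in for an Orchestrator environment."""
    running: list = field(default_factory=list)  # names of running instances
    orphans: list = field(default_factory=list)  # instances with no owner
    logs: list = field(default_factory=list)     # accumulated log entries
    tags: dict = field(default_factory=dict)     # runbook name -> tag


def clean_up(env):
    """Apply the four steps in order and return a summary of each."""
    summary = []

    stopped = len(env.running)
    env.running.clear()
    summary.append(f"stopped {stopped} running instance(s)")

    removed = len(env.orphans)
    env.orphans.clear()
    summary.append(f"removed {removed} orphan(s)")

    env.logs.clear()
    summary.append("reset logs")

    # Only runbooks tagged Monitoring are started again, one instance each.
    for name, tag in env.tags.items():
        if tag == "Monitoring":
            env.running.append(name)
    summary.append(f"restarted {len(env.running)} Monitoring runbook(s)")

    return summary


if __name__ == "__main__":
    env = Environment(
        running=["Monitor: iDManagement"] * 3,
        orphans=["Monitor: iDManagement"],
        logs=["entry 1", "entry 2"],
        tags={"Monitor: iDManagement": "Monitoring"},
    )
    for line in clean_up(env):
        print(line)
```

Stopping everything before touching orphans and logs means nothing is still writing while the clean-up runs, and restarting only the tagged runbooks brings the environment back to exactly one instance of each monitor.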
As my environment is in the perfect state to use this facility, I simply need to select the STOP icon from the ribbon, which will launch the wizard.
How long the wizard takes to complete each step will of course depend directly on the number of active instances and the number of orphans found, as well as on how complicated or deep your runbook tree might be. But in any case, this will be far faster than the alternative of doing the job by hand.
Once this is complete, the balance of yin and yang is restored, and Mondays can once more proceed as pain-free as possible.