Off hours Network Alerting/Monitoring


  • Off hours Network Alerting/Monitoring

    I am wondering how other companies that do not have a third shift handle off-hours alerting for their networks. We have a staff of four network engineers, and I am in the process of putting a plan together. Here is my rough draft.

    I was not sure where to put this question but this seemed like a good place.

    I. Groups for SolarWinds alert setup
        a. Hospital circuits
        b. Radiologists
        c. Home Transcriptionists
        d. Internet
        e. Clinics (open and closed) open 24/7
    II. Team Responsibilities
        a. Receive pages from SolarWinds
            1. Switches
            2. Routers
            3. Internet down
            4. LAN-LAN tunnel down
            5. etc.
        b. Look into issue asap
        c. Determine how soon it needs to be resolved
            1. Hospital sites asap URGENT
            2. Clinics before they open if possible
            3. Radiologists asap
            4. Home Transcriptionists (?)
        d. Notify rest of team who is working on issue
        e. Notify 1st level on-call person of issue
        f. Determine cause of issue
        g. Contact circuit provider if circuit down
        h. Stay on it until resolved
    III. Teams
        1. Suggestions: two on rotation, all hands on deck, other?
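
To make the draft easier to discuss, here is a rough Python sketch of the triage step in II.c and the "two on rotation" option in III. It is only an illustration: the group keys, the `Urgency` names, and the engineer names are made up for the example, and the rotation scheme (primary and secondary each shift by one person per week) is just one possible reading of "two on rotation". Groups the draft does not rank, such as Internet, fall through to "undecided".

```python
from enum import Enum


class Urgency(Enum):
    ASAP = "respond immediately"
    BEFORE_OPEN = "resolve before the site opens, if possible"
    UNDECIDED = "policy not decided yet"


# Mapping taken from II.c of the draft. The dictionary keys are
# hypothetical labels, not SolarWinds group names.
URGENCY_BY_GROUP = {
    "hospital": Urgency.ASAP,
    "radiologists": Urgency.ASAP,
    "clinics": Urgency.BEFORE_OPEN,
    "home_transcriptionists": Urgency.UNDECIDED,  # marked "(?)" in the draft
}


def triage(group: str) -> Urgency:
    """Return how quickly an alert for this group should be handled."""
    return URGENCY_BY_GROUP.get(group, Urgency.UNDECIDED)


def on_call_pair(engineers: list[str], week: int) -> tuple[str, str]:
    """Pick a primary and a secondary engineer for the given week number.

    Assumption: both roles advance by one person each week, so last
    week's secondary becomes this week's primary.
    """
    n = len(engineers)
    return engineers[week % n], engineers[(week + 1) % n]


if __name__ == "__main__":
    team = ["Engineer A", "Engineer B", "Engineer C", "Engineer D"]
    for week in range(4):
        print(week, on_call_pair(team, week))
    print(triage("hospital").value)
```

With a team of four, each engineer is primary one week in four and secondary the week before, which may be easier to sustain than all hands on deck for every page.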

  • #2
    Re: Off hours Network Alerting/Monitoring

    Not sure what to make of your list but here's what I do:

    I'm the only network/system engineer on staff so uptime falls 100% on my shoulders.

    There are two locations that I'm responsible for.

    I use Tembria Server Monitor in the main location to monitor all servers for uptime, disk space, services, etc. If there's a failure, Tembria sends me an email alert that gets forwarded to a cell phone and a pager. Tembria also monitors all network and email systems; if one of those fails, Tembria sends an alphanumeric page to a pager via a POTS line (since an email alert won't work if the network or email systems are down).
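
The routing rule described above (email for most failures, a dial-out page when the email path itself may be broken) can be sketched in a few lines of Python. This is not Tembria configuration or any Tembria API; the component labels and channel names are hypothetical and only illustrate the decision.

```python
# Components whose failure could take the email path down with them.
# These labels are hypothetical, chosen for the example.
OUT_OF_BAND_COMPONENTS = {"network", "email"}


def choose_channel(component: str) -> str:
    """Pick an alert channel for a failed component.

    If the failure could prevent email delivery, fall back to an
    out-of-band channel (here, a pager reached over a POTS line).
    """
    if component.lower() in OUT_OF_BAND_COMPONENTS:
        return "pots_pager"
    return "email"


if __name__ == "__main__":
    for failed in ("disk", "network", "Email", "service"):
        print(failed, "->", choose_channel(failed))
```

The point of the split is that the alert must never depend on the thing that just failed.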

    I use Tembria in the secondary location to "monitor the monitor". If there's a failure Tembria sends me an email alert that gets forwarded to a cell phone and a pager.
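
For anyone who wants to "monitor the monitor" without a second copy of a commercial tool, the idea can be approximated with a simple staleness check, sketched here in Python. It assumes the primary monitor records a heartbeat timestamp somewhere the secondary site can read; the function name and the five-minute limit are assumptions for illustration, not anything Tembria provides.

```python
from datetime import datetime, timedelta


def primary_is_stale(
    last_heartbeat: datetime,
    now: datetime,
    max_silence: timedelta = timedelta(minutes=5),  # assumed limit
) -> bool:
    """Return True if the primary monitor has been silent for too long."""
    return now - last_heartbeat > max_silence


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 3, 0, 0)
    recent = now - timedelta(minutes=2)
    old = now - timedelta(minutes=12)
    print(primary_is_stale(recent, now))  # recent heartbeat
    print(primary_is_stale(old, now))     # heartbeat older than the limit
```

When the check returns True, the secondary site would raise its own alert, ideally over a path that does not rely on the main location.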

    I use Dell IT Assistant to monitor all server hardware (we're a Dell shop). If there's a failure, it sends an email alert that gets forwarded to a cell phone and a pager.

    I use external services (MXToolbox and DNSStuff) to monitor my public DNS and MX records; if there's an error, I get an email that's forwarded to a cell phone and a pager.

    I use PRTG and ManageEngine NetFlow Analyzer to monitor network traffic volumes; if there's an error or an over-threshold condition, I get an email that's forwarded to a cell phone and a pager.
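
As a rough illustration of an over-threshold condition, the Python sketch below only fires after several consecutive samples exceed the limit, which helps avoid off-hours pages for a single traffic spike. The requirement for consecutive samples is an assumption made for this example; it is not a description of how PRTG or NetFlow Analyzer evaluate thresholds.

```python
def sustained_breach(samples, threshold, consecutive=3):
    """Return True if `consecutive` samples in a row exceed `threshold`."""
    run = 0
    for value in samples:
        # Extend the current run on a breach, otherwise reset it.
        run = run + 1 if value > threshold else 0
        if run >= consecutive:
            return True
    return False


if __name__ == "__main__":
    # Utilization percentages, made up for the example.
    print(sustained_breach([10, 95, 96, 97], threshold=90))
    print(sustained_breach([95, 10, 95, 10, 95], threshold=90))
```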

    I use SolarWinds, Colasoft Capsa, and Wireshark for spot troubleshooting of network problems.



    • #3
      Re: Off hours Network Alerting/Monitoring


      Joe, do you use anything for server (Windows) event monitoring?
      Tom Jones
      MCT, MCSE (2000:Security & 2003), MCSA:Security & Messaging, MCDBA, MCDST, MCITP(EA, EMA, SA, EDA, ES, CS), MCTS, MCP, Sec+
      PhD, MSc, FIAP, MIITT
      IT Trainer / Consultant
      Ossian Ltd
      Scotland




      • #4
        Re: Off hours Network Alerting/Monitoring

        Yes, Tembria monitors the event logs as well. It monitors everything from the OS up; I just didn't want to mention everything, as the list is long. Dell IT Assistant monitors the hardware and Tembria monitors everything else.
