System Logging


  • System Logging

    Hi folks.

    We are having some serious headaches with a couple of servers out at clients. The problem that we are having is that the servers appear to lock up or stop responding. Unfortunately, these sites are an hour away so the clients just kill the power to the machines and they come back up. There is no monitor on either server so we don't get to see what is on the screen.

    The only event that is logged on both servers is a message saying that the previous system shutdown was unexpected. The time on the message can be anything from 20 minutes to several hours before the server is power cycled. There aren't even any memory or kernel dump files created.

    What we are looking for is some software we can put on the servers to monitor what is going on and keep a log of absolutely everything, so that we have some information to go on. Both servers currently run Windows Server 2003.



    In case it's any help, here are the rest of the details, but we've had three engineers try everything we can think of so far.

    Server 1 was originally an HP ML115 G5. We have had problems with other ML115 servers, so we built a clean ML110 G5 and migrated Active Directory and all the PCs over to that. The problem has still recurred.

    On this server we have tried three different UPS units to rule out power issues. Not a single power event has been logged in relation to this. The lockups have occurred both during backups and outside of them, though they tend to occur out of hours. Backup software is Backup Exec 12.5, updated to the latest service pack.

    So basically we have ruled out all the hardware, the power, done a clean build to rule out software and we're running out of things to check.

    Server 2 was originally an HP ML310 G4 working fine. There were some power issues and the BIOS ended up resetting itself. Unfortunately it was using the onboard RAID, which defaults to being switched off. When we tried to re-enable it, the RAID controller ate the hard drives. We then had to rebuild the server and restore the data, and did this using an Adaptec 1220 RAID card.

    Since then we have had the same mysterious lockups as with server 1. These have not been restricted to out of hours and have no consistency with backups (Backup Exec 11d here).

    Again, the server is protected by a UPS and we have tried several to ensure they are working fine. This server has even been moved to its own ring main.

    So we bit the bullet with this one too and replaced the server at our own cost. Now we have an ML110 G6 installed. The server was imaged using Backup Exec System Recovery but the problem has stayed.

  • #2
    Re: System Logging

    Do you have iLO installed? Via the management interface you can check what the system is doing, even if it is turned off.

    In addition, check the HP System Management Homepage logs. You have installed the HP SMH tools, right?

    -vP



    • #3
      Re: System Logging

      Originally posted by vonPryz:
      Do you have iLO installed? Via the management interface you can check what the system is doing, even if it is turned off.

      In addition, check the HP System Management Homepage logs. You have installed the HP SMH tools, right?

      -vP

      The ML110 G5s don't have iLO.

      I've been corrected on the ML310. Apparently that ran stable, except the RAID wouldn't stay in sync and kept reporting that one of the drives had failed despite it being fine when tested.

      The ML110 G6 might have iLO, but I didn't install the G6 server, so the iLO port is most likely not hooked up. Even on the G6 there isn't much in the way of system logging, even via the management homepage.

      So it really needs some kind of third-party app. I looked into Nagios, but the win32 build says beta and carries a big warning about it locking up servers. Don't want to go making the problem worse!
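
      As a stopgap until a proper tool turns up, a small heartbeat script might at least pin down when each lockup starts. The following is only a rough sketch, assuming a Python interpreter can be installed on the servers; the function names, the log path and the 60-second interval are all made up for illustration. It appends a timestamped line to a file and forces it to disk, so the last complete line should survive the power being pulled.

```python
import os
import time
from datetime import datetime

STAMP = "%Y-%m-%d %H:%M:%S"  # fixed-width timestamp, 19 characters


def write_heartbeat(path, now=None):
    """Append one timestamped line and force it out to the disk."""
    now = now or datetime.now()
    line = now.strftime(STAMP) + " alive\n"
    with open(path, "a", encoding="utf-8") as f:
        f.write(line)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to flush its cache to disk
    return line


def last_heartbeat(path):
    """Return the timestamp of the last complete line, or None."""
    last = None
    with open(path, encoding="utf-8") as f:
        for raw in f:
            if not raw.endswith("\n"):
                continue  # line cut short by the power-off, ignore it
            try:
                last = datetime.strptime(raw[:19], STAMP)
            except ValueError:
                continue  # not a heartbeat line
    return last


def run(path, interval_seconds=60):
    """Write a heartbeat forever; meant to be started at boot."""
    while True:
        write_heartbeat(path)
        time.sleep(interval_seconds)
```

      Started at boot with something like run(r"C:\heartbeat.log") (hypothetical path), the last entry before the next startup gives the approximate time the box stopped responding, which can then be compared with the backup schedule and the time on the unexpected-shutdown event. It only shows when the server hung, not why.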
