Cluster problems

  • Cluster problems

    Dear fellow admins.

    I have issues with my vSphere cluster.

    First, a brief explanation of my setup.

    4 Fujitsu Siemens RX servers, each with 3 NICs.
    IP net for ESX servers and virtual servers: 192.168.10.x
    IP nets for iSCSI: 192.168.11.x and 192.168.12.x

    Each server has:
    2 NICs for iSCSI - IP nets 11.x and 12.x
    1 NIC for everything else - 10.x net

    1 48-port 1000 Mbit switch (ZyXEL layer 3 switch). I am not using VLANs at all. All traffic goes to this switch.

    1 IBM DS3300 iSCSI SAN, dual controller.
    Controller setup:
    Controller A:
    Port 1: 192.168.11.6
    Port 2: 192.168.12.6

    Controller B:
    Port 1: 192.168.11.7
    Port 2: 192.168.12.7
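
    To make the path layout easier to reason about, the controller ports above can be grouped by subnet. This is only a minimal sketch: the /24 mask is an assumption (the post only gives 192.168.11.x and 192.168.12.x), and the port labels "A/1" etc. are made up for illustration.

```python
import ipaddress
from collections import defaultdict

# Controller port addresses as listed in the post; the labels are illustrative.
PORTALS = {
    "A/1": "192.168.11.6",
    "A/2": "192.168.12.6",
    "B/1": "192.168.11.7",
    "B/2": "192.168.12.7",
}


def portals_by_subnet(portals, prefix=24):
    """Group controller ports by the network they sit in (assumes a /24 mask)."""
    groups = defaultdict(list)
    for name, addr in portals.items():
        net = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        groups[str(net)].append(name)
    return {net: sorted(names) for net, names in groups.items()}


if __name__ == "__main__":
    for net, names in sorted(portals_by_subnet(PORTALS).items()):
        print(net, names)
```

    With these addresses each subnet holds one port from each controller, so a host NIC on either iSCSI subnet should be able to reach both controllers.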

    iSCSI is configured with 2 paths to each LUN, active/active (I/O), and 2 inactive paths.

    No internal DNS.
    1 VMware infrastructure server which controls the cluster, IP: 192.168.10.37
    Snippet from the hosts file on the ESX servers:

    192.168.10.20 ESX1.Hosting ESX1
    192.168.10.21 ESX2.Hosting ESX2
    192.168.10.22 ESX3.Hosting ESX3
    192.168.10.23 ESX4.Hosting ESX4
    192.168.10.37 VSS.Hosting VSS
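
    Since there is no internal DNS, every host depends on these entries being complete. A quick way to check that might look like the sketch below; the helper names `parse_hosts` and `missing_names` are hypothetical, and the snippet text is simply copied from the post rather than read from a real /etc/hosts.

```python
# Hosts entries as posted above.
HOSTS_SNIPPET = """\
192.168.10.20 ESX1.Hosting ESX1
192.168.10.21 ESX2.Hosting ESX2
192.168.10.22 ESX3.Hosting ESX3
192.168.10.23 ESX4.Hosting ESX4
192.168.10.37 VSS.Hosting VSS
"""


def parse_hosts(text):
    """Return {lowercased name: ip} for every name on every hosts line."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table[name.lower()] = ip
    return table


def missing_names(text, expected):
    """List the expected names that have no entry in the hosts text."""
    table = parse_hosts(text)
    return [name for name in expected if name.lower() not in table]


if __name__ == "__main__":
    print(missing_names(HOSTS_SNIPPET, ["ESX1", "ESX2", "ESX3", "ESX4", "VSS"]))
```

    The same check would have to be run against the hosts file of each ESX host and of the Virtual Center machine, since each one keeps its own copy.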
    All right, I have attached a picture of my network setup on one server (it is the same on all).

    Now my problems:

    Problem 1:

    Some of the ESX hosts sometimes get disconnected from the cluster and then reconnect afterwards.
    From the event log:
    *Host 192.168.10.21 in datacenter Datacenter is not responding.*
    Then the server gets disconnected, and then:

    Alarm 'Virtual machine cpu usage' on SERVER changed
    from Green to Gray
    info
    12-05-2010 10:19:44

    Alarm 'Host connection failure' on 192.168.10.21 triggered
    an action
    info
    12-05-2010 10:19:44
    Alarm 'Host connection failure' on entity 192.168.10.21
    send SNMP trap
    info
    12-05-2010 10:19:44

    Then normally it reconnects itself again.


    Problem 2:
    Sometimes the virtual servers lose connection with the VMware network.
    Today one virtual server even got shut down and started again. But before that happened, I noticed this when looking at the Tasks and Events of "backup server":

    192.168.10.21 is disconnected (.21 is an ESX server where the backup server is located)
    And then:
    Host is connected
    info
    12-05-2010 06:40:27

    info
    12-05-2010 05:58:19
    This occurred 8 times within 1 hour, and then the backup server shut down.

    Problem 3:
    One of the ESX servers shows this in the event log:

    Alarm 'Cannot connect to storage' on entity 192.168.10.20
    send SNMP trap
    info
    12-05-2010 01:43:13
    Alarm 'Cannot connect to storage' on 192.168.10.20
    changed from Gray to Gray
    info
    12-05-2010 01:43:13
    Alarm 'Cannot connect to storage' on 192.168.10.20
    changed from Gray to Gray
    info
    12-05-2010 01:43:13

    Lost connectivity to storage device naa.600a0b80005aedbb00000ac54b17dfff. Path vmhba33:C7:T0:L3 is down. Affected datastores: "IBM LUN3".
    error
    12-05-2010 01:41:03

    Lost connectivity to storage device naa.600a0b80005aedbb00000f7e4b656403. Path vmhba33:C7:T0:L5 is down. Affected datastores: "IBM LUN5".
    error
    12-05-2010 01:41:03

    Lost access to volume 4b2a36fb-986d63ce-b6ea-000ae48a8ba7 (IBM LUN2) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
    info
    12-05-2010 01:41:04

    Successfully restored access to volume 4b2a36fb-986d63ce-b6ea-000ae48a8ba7 (IBM LUN2) following connectivity issues.
    info
    12-05-2010 01:42:03


    and so on
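
    For reference, the length of one such storage outage can be worked out from the two timestamps on the IBM LUN2 events. A small sketch, assuming the event log prints dates as day-month-year (the post does not say which order it uses; for two events on the same day the result is the same either way):

```python
from datetime import datetime

# Assumption: the event log uses day-month-year, e.g. "12-05-2010 01:41:04".
FMT = "%d-%m-%Y %H:%M:%S"


def outage_seconds(lost, restored):
    """Seconds between a 'Lost access' event and the matching 'restored' event."""
    delta = datetime.strptime(restored, FMT) - datetime.strptime(lost, FMT)
    return delta.total_seconds()


if __name__ == "__main__":
    # Timestamps of the IBM LUN2 events quoted above.
    print(outage_seconds("12-05-2010 01:41:04", "12-05-2010 01:42:03"))  # 59.0
```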


    Please help me guys, I've read a lot of documentation on this before posting here, but frankly, I no longer know what to do. These are serious problems.

    Maybe installing a second NIC in each server and dedicating it to VM traffic, so that service console traffic is separated, would be a good idea?

  • #2
    Re: Cluster problems

    looks like a potential switch problem.

    You're losing both a physical host, and paths to storage.
    The only common component there is the switch.
    (or the NICs)
    Please do show your appreciation to those who assist you by leaving Rep Point https://www.petri.com/forums/core/im.../icon_beer.gif



    • #3
      Re: Cluster problems

      You could be right. The switch is 3 months old, but it could still be the switch, because iSCSI traffic goes through other network cards than the rest of the traffic, but into the same switch.

      So... could it be the internal VMware switch? Or are we sure it's the physical one?

      Is it better (for future purposes) to have 1 NIC for SC, 1 NIC for vMotion, and other NICs for iSCSI?



      • #4
        Re: Cluster problems

        I would suspect it's more likely the hardware switch than the vSwitch, simply because it's happening to both the iSCSI and the general network traffic.

        I would be inclined to have:
        2 NICs for iSCSI, one for each path
        1 NIC for vMotion
        1 NIC for the vSphere Service Console (COS)
        1 NIC for vSwitches to actually host the virtual guests

        (It is generally recommended to keep guests off the COS network.)


        • #5
          Re: Cluster problems

          Hi there,

          First I would check that your DNS settings are correct.
          All ESX hosts must be able to see Virtual Center and each other by name, so the names should be in the hosts file of each ESX host, obviously only if you don't have internal DNS.

          /etc/hosts
          and on Virtual Center c:\windows\...\...\..\hosts (you probably know where it is)
          Also, what OS are you using for Virtual Center?

          Second, but less important, and as already stated by tehcamel: your vMotion should be on a different gigabit network in order to work properly.

          Third, your traffic must be segregated, either via VLANs or via different switches.

          Now questions

          1. What is the reason for having IP addresses from different subnets on the SAN and on the ESX hosts? Why don't you use one subnet for iSCSI?

          2. Are you sure that your SAN is in active/active mode? Can you show a screenshot of a datastore showing the different paths to the switch?
          Also, I'm missing the point of an active/active SAN in terms of resiliency, as you have only one switch for all your traffic.

          Evgeny



          • #6
            Re: Cluster problems

            Yes, I have checked the hosts files; they are posted in my first entry, and AFAIK they are correct.

            Anyhow, I will deploy a DNS server and add 1 extra dual-port NIC in each ESX server to separate vMotion, service console and actual VM traffic.

            What do you mean by saying my traffic must be segregated via different switches? I thought I was fine with different subnets, so that broadcast traffic would not affect iSCSI and the rest. Or is that not enough?

            Answer to your questions:

            1. The reason it is set up like that is the Dell MD3000i setup guide (same SAN as the DS3300). Quick link: http://blog.indiatechcenter.com/?cat=154 (they recommend one switch for each subnet, but I didn't do that because I figured it didn't matter; maybe I was wrong?)

            Maybe I need to configure everything differently, which is also okay, so I'm glad you're willing to help.

            2. For some reason I now can't even see the paths, and refreshing etc. does not help. Very odd. But I've attached a screenshot of the setup. I can't even view the devices on the iSCSI software adapter.


            • #7
              Re: Cluster problems

              OK, now I can see the paths and devices on one server, but the 3 others can't see them. The iSCSI software adapter even lost its name on the three other servers... wtf?

              Anyway, in this screenshot you can see the setup.

              Notice I have 2 SANs (the first one is a server running Windows Unified Data Storage Server, LUN 0 and 1, ID: Microsoft).

              BTW, the IBM SAN has never shown much performance, so I think I really need to do some reconfiguration here.

              Do you also need to see a screenshot from the DS3300 IBM setup manager?


              • #8
                Re: Cluster problems

                Using different subnets won't separate broadcast domains. Use VLANs for that.
                Also, do you have a default gateway configured, and is it reachable all the time?
                Marcel
                Technical Consultant
                Netherlands
                http://www.phetios.com
                http://blog.nessus.nl

                MCITP(EA, SA), MCSA/E 2003:Security, CCNA, SNAF, DCUCI, CCSA/E/E+ (R60), VCP4/5, NCDA, NCIE - SAN, NCIE - BR, EMCPE
                "No matter how secure, there is always the human factor."

                "Enjoy life today, tomorrow may never come."
                "If you're going through hell, keep going. ~Winston Churchill"



                • #9
                  Re: Cluster problems

                  There is a gateway at 192.168.10.1 on the 10.x net, but none on the 11.x or 12.x nets.

                  The default gateway field for the iSCSI VMkernel is greyed out, but it shows 192.168.10.1.
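
                  A quick sanity check of what that gateway value means for each network: a gateway can only be used directly by an interface on the same subnet. A minimal sketch, assuming /24 masks; the host address 192.168.11.20 is made up for illustration, since the ESX iSCSI addresses are not given in this thread.

```python
import ipaddress


def gateway_on_link(interface_cidr, gateway):
    """True if the gateway address lies inside the interface's own subnet."""
    iface = ipaddress.ip_interface(interface_cidr)
    return ipaddress.ip_address(gateway) in iface.network


if __name__ == "__main__":
    # 192.168.10.20 is ESX1 from the hosts file; 192.168.11.20 is hypothetical.
    print(gateway_on_link("192.168.10.20/24", "192.168.10.1"))  # True
    print(gateway_on_link("192.168.11.20/24", "192.168.10.1"))  # False
```

                  So 192.168.10.1 is not on-link for an interface on the 11.x or 12.x net. That should not matter while the SAN ports sit on the same subnets as the iSCSI NICs, because that traffic never needs a gateway.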
