  • Disk Queue Length ?

    Our organization is having some slowness problems, particularly when most users are logging on and off, so mornings and around 3:30. I've been through everything (bandwidth, etc.; we have 10G switches), but I believe I've found the problem on the server that we redirect everyone's desktop, profile, etc. to. On that drive, Resource Monitor has a section for Disk Queue Length, which I've read should be 0-2. Ours averages 5-10 and spikes to 50 during these slow periods. All our servers are VMware, and the data sits on a SAN with SSD drives, so what can I do to resolve this? It's only the drive that holds that data, so we've been considering creating another drive and splitting up the users' profile folders, or do we need another separate server? How can I fix this problem?
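
    For reference, a quick way to log the same counter outside Resource Monitor would be something like this from PowerShell (these are the standard Windows PhysicalDisk counter paths; the sample interval and count are just what I picked):

    # Log the disk queue counters every 5 seconds for 10 minutes
    # (run Get-Counter -ListSet PhysicalDisk to see the exact instance names on your server)
    Get-Counter -Counter @(
        "\PhysicalDisk(*)\Current Disk Queue Length",
        "\PhysicalDisk(*)\Avg. Disk Queue Length"
    ) -SampleInterval 5 -MaxSamples 120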

  • #2
    Re: Disk Queue Length ?

    While spiking to 50 isn't great, it's not bad either.

    SAN arrays are more efficient and perform better (as far as IOPS and throughput) when the queue is 5 - 30, depending on the array. (at least that's the story from my Compellent training)

    What is the CPU and network utilization?
    Have you checked to see if network latency is an issue? Are any of your edge switches being saturated?
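
    If you want actual numbers rather than Task Manager graphs, something like this from PowerShell gives a rough picture while the slowness is happening (standard Windows counters; adjust the interval to taste):

    # Quick look at CPU and NIC utilization during a slow period
    Get-Counter -Counter @(
        "\Processor(_Total)\% Processor Time",
        "\Network Interface(*)\Bytes Total/sec"
    ) -SampleInterval 5 -MaxSamples 60
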
    Regards,
    Jeremy

    Network Consultant/Engineer
    Baltimore - Washington area and beyond
    www.gma-cpa.com

    • #3
      Re: Disk Queue Length ?

      The CPU and network aren't pegged out; we recently moved the SAN from the 1G to the 10G switches. The only thing I can find that's pegged out is this Disk Queue Length. The server houses the users' home drives and folder redirections. I'm wondering if it's just too many users for one server or drive. Do I need to split it up?

      • #4
        Re: Disk Queue Length ?

        I'd hardly call a queue depth of 50 pegged out. If the issue started when you went from 1G to 10G, then you're probably having issues at the switches.

        Is this a hybrid 1G/10G setup? If so, you may have flow control issues on the switches or NICs. Basically, the 10G ports/NICs can overwhelm the 1G ports if flow control isn't working properly.
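
        On the server side you can at least see what the NICs think their flow control setting is; a rough check from PowerShell (the exact property name varies by NIC driver, so treat this as a starting point):

        # List the flow control setting on each adapter, if the driver exposes one
        Get-NetAdapterAdvancedProperty -Name * |
            Where-Object { $_.DisplayName -like "*Flow Control*" } |
            Format-Table Name, DisplayName, DisplayValue
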
        Regards,
        Jeremy

        Network Consultant/Engineer
        Baltimore - Washington area and beyond
        www.gma-cpa.com

        • #5
          Re: Disk Queue Length ?

          Jeremy, I think you could be right, as the hosts for the VM servers are still on 1G and the SAN is on the 10G, linked to the servers via a 1G switch. I'm not a networking expert, but during these slow periods the network team here tells me the statistics on the switches don't show them maxed out, and I don't know if they're right or just not seeing where the problem is. Since we didn't see the problem when the SAN was on 1G, I'm wondering if it's now pushing more traffic than the server can handle, and I'm considering moving it back to 1G, if just to test for a few days.

          Is there something specific I could ask them to check for flow control?

          Also, I don't know what the queue depth numbers were before the move to 10G. For all I know this is normal; it's just the only thing I see now that, from what I've read, is beyond normal.

          • #6
            Re: Disk Queue Length ?

            Queue depth is recommended to be 0-2 per spindle, rather than just 0-2. Quite how that plays out in a SAN/iSCSI world, I'd confess ignorance.

            One thing that can cause high disk queues is insufficient physical RAM: the server starts paging to disk. Monitor your RAM and see if there's anything there.
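
            A quick way to rule that out is to watch the memory counters alongside the disk queue; something like this from PowerShell (standard counters, arbitrary interval):

            # Sustained high Pages/sec with low Available MBytes suggests the box is short on RAM
            Get-Counter -Counter @(
                "\Memory\Available MBytes",
                "\Memory\Pages/sec",
                "\Paging File(_Total)\% Usage"
            ) -SampleInterval 5 -Continuous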

            And just to be sure: if the comms team don't see high throughput/utilization on the SAN when you're seeing high disk queues, what can you see on the SAN itself?

            Maybe you can now throw more actual data at the SAN, but the IOPS aren't keeping up?

            • #7
              Re: Disk Queue Length ?

              Originally posted by jason0923:
              Is there something specific I could ask them to check for flow control?
              Utilization on the switch could show something, but you need to make sure that flow control is enabled on the switch ports and on the iSCSI NICs/HBAs. Also, if the switches are running out-of-date firmware, flow control could be malfunctioning.

              If you move back to 1G and things start performing well again, then you should look at updating firmware, checking configurations, and perhaps replacing switches that can't handle a hybrid 1G/10G setup.
              Regards,
              Jeremy

              Network Consultant/Engineer
              Baltimore - Washington area and beyond
              www.gma-cpa.com

              • #8
                Re: Disk Queue Length ?

                We did move back to the 1G connection and it seems to have reduced the problem. The queue depth is still averaging 5-10, but users aren't noticing the slowness. We may have to wait until we can upgrade the servers to 10G before we put the SAN back on the 10G switches.

                • #9
                  Re: Disk Queue Length ?

                  A few questions/comments that hopefully help.

                  Does your SAN-attached storage array have the ability to auto-tier between SSD and normal spinning disks? Is this feature turned on? Is it turned on for the volume that is experiencing the slowdown?

                  A much more important metric than disk queue length is latency for disk operations (ms per read, ms per write). Just like in a bank, I really don't care how long the line is as long as it is moving quickly. The latency metric is the sum of the wait time and the service time for the operation.

                  Different applications have different tolerance levels for latency. A good rule of thumb is that you want most latencies to be 10 ms or less. Bursting to higher levels is OK, but on average is the number low? For instance, Exchange starts getting cranky at 20 ms or higher, and REALLY starts to behave strangely when latency is over 30 ms.
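
                  On the Windows side those latencies show up as the Avg. Disk sec/Read and sec/Write counters, so a rough way to watch them while users are complaining would be something like this (instance names will differ on your server; 0.010 in the output equals 10 ms):

                  # Per-disk read/write latency, sampled every 5 seconds for 10 minutes
                  Get-Counter -Counter @(
                      "\PhysicalDisk(*)\Avg. Disk sec/Read",
                      "\PhysicalDisk(*)\Avg. Disk sec/Write"
                  ) -SampleInterval 5 -MaxSamples 120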

                  Is your data transport using jumbo frames (i.e. a larger MTU)? Jumbo frame support has to be enabled end to end for it to work. Conceptually "trace" the path: App -> OS -> IP stack -> HBA/NIC driver -> Ethernet switch(es) -> Ethernet router(s) -> up the stack on the other side. The effective frame size will be the smallest of the maximums along that path. The frame size greatly affects the number of frames/packets that a single IO creates; fewer, larger frames mean higher throughput and less strain on the underlying architecture.
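
                  If you do have jumbo frames configured, a simple end-to-end check from the server is a don't-fragment ping sized for a 9000-byte MTU (8972 bytes of payload plus 28 bytes of headers); if any hop in the path is still limited to 1500, the ping will fail:

                  ping -f -l 8972 -n 20 IpAddressStorage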

                  What are the network latencies between the server and the storage? From the server command prompt run a sequence of pings to the storage array:

                  ping -n 1000 -l 1500 IpAddressStorage

                  At the very end the display will look similar to:

                  Ping statistics for 1.2.3.4:
                  Packets: Sent = 1000, Received = 999, Lost = 1 (0.1% loss),
                  Approximate round trip times in milli-seconds:
                  Minimum = 1ms, Maximum = 4ms, Average = 1.02ms

                  If latency is high or many packets are lost, there is an underlying fault in your network. Drill deeper to determine the cause of the latency or packet loss.

                  Good luck.
