    Hi all

    I'm really struggling to sort this one out, and any help would be appreciated. I've trawled Google extensively and can't find anything similar to this, although I'm still looking.

    I have an SBS 2003 server with approximately 40 users attached. We are separated from the internet by a Watchguard Xedge 50 firewall, and we use Messagelabs for our email filtering. The site has a 6 Mbps SDSL connection.

    It has come to my attention that a certain number of messages are not getting through to my users. As we have a lot of communication with the Far East, namely China, Japan and Hong Kong, my first thought was that they were being filtered as spam. I logged into Messagelabs to track and trace the emails, and found that the messages in question were stuck in a queue, their delivery result status described as 'retrying'. As Messagelabs will retry for 7 days before sending a non-delivery report, I've not had any mail bounced back to me yet (3 days to go!)

    I've tested by sending email from my home account, and about one in five emails ends up in this state, stuck in a queue and unable to move. This is what I've tried and observed so far:
    • Restart the firewall. No change.
    • Restart the server. Again no change.
    • I checked the incoming firewall log, and found multiple SMTP connections established from the Messagelabs towers for each email sent.
    • On checking the message logs for the Exchange server, I find multiple instances of each email. Is this normal? The Messagelabs support team informed me that their towers are establishing a connection which then dies out. It then reports back that a duplicate email was detected by my server.
    • Telnetted to the server on port 25 via the LAN and sent multiple emails, creating a fresh telnet connection each time. Complete success. Every message delivered. I've tried to telnet from outside, but the connection is not accepted, even though I've added my IP address to both the SMTP rule on the firewall and the SMTP connector on the server.
    • Have found multiple queues showing in the c:\program files\exchsrvr\mailroot\vsi 1\queue folder. On checking the default virtual SMTP server, I see some 26 sessions from Messagelabs, some held for as long as 800 seconds. If I terminate the sessions, they are re-established swiftly and stay there again. Messagelabs inform me that each session should only be established briefly.
    • I have maximum SMTP logging enabled, but can't find anything in the Event Viewer that could relate to my problem.
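
    For anyone wanting to repeat the telnet test from a script rather than by hand, here is a minimal Python sketch using the standard smtplib module. It opens a fresh connection per message, as in the manual test, and times how long the server takes to return its final reply after the message data. The function name, the "probe.example" HELO name and the addresses are placeholders of mine, not anything taken from the server or from Messagelabs.

```python
import smtplib
import time


def timed_smtp_send(host, port, sender, recipient, body, timeout=30.0):
    """Send one test message over a fresh connection.

    Returns (reply_code, seconds_taken), where reply_code is the server's
    final reply to the message data (250 on success).
    """
    message = (
        f"From: {sender}\r\n"
        f"To: {recipient}\r\n"
        "Subject: SMTP probe\r\n"
        "\r\n"
        f"{body}\r\n"
    )
    start = time.monotonic()
    # local_hostname is a placeholder; it is the name sent with HELO.
    with smtplib.SMTP(host, port, local_hostname="probe.example",
                      timeout=timeout) as conn:
        conn.helo()
        conn.mail(sender)
        conn.rcpt(recipient)
        # data() sends the message plus the closing "." line and
        # returns the server's final reply.
        code, _reply = conn.data(message)
        elapsed = time.monotonic() - start
    return code, elapsed
```

    Calling it in a loop from inside and outside the LAN and comparing the timings should show whether some connections never get their final reply. In that case smtplib raises a timeout or disconnect error instead of returning.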

    I really am going a bit nuts trying to figure this out. Can anyone give me some pointers about where the problem may lie? If I were to take a wild guess, I'd say it could be my firewall, and I'm going to speak to Watchguard in the morning, but does anyone have any other suggestions?

    Many thanks for taking the time to read this.

    Does the Watchguard run the latest software?
    Do you have any AV/AS installed on the Exchange server?

      Yep, the firewall is running the latest available software, and I've tested the problem with AV both running and disabled. It's looking more like a problem at Messagelabs' end, as I've noticed the stuck connections are on the same subnet, and their support team suddenly started taking the problem seriously today after trying to fob me off for most of last week.


        Ah ok, let us know if you get a resolution!

          Still no resolution to this, sadly. Messagelabs have given me the following list of possible issues.

          There could be a number of reasons for the issue which are located below:
          1. Your mailserver is overloaded.
          a.) All of our servers are ganging up on an old mailserver and it can't cope. You may need to set a limit on the number of concurrent connections it will accept. Try to find a happy medium by looking at the bandwidth available, the capability of the mailserver and the number of connections to accept. (50 is usually a good number)
          b.) The mailserver is not quick enough in its reply so we talk at it until it does reply. For a number of reasons a mailserver may not send us the 250 OK message quickly enough; for example, it might be set to do reverse DNS lookups or perform security checks before accepting mails. In these instances we will not wait for it, we just send the mail again.
          c.) Email size limit. Most customers have unlimited size limits and we may have to give them a mail that is simply rather large. Two things could occur here: either the mailserver spends so long processing the mail that we think it has not been delivered and send it again, or the mailserver has other mails to process at that time and can't cope with the extra mail, so it won't finish off the transaction with us and we send it again.
          d.) DoS/spam attack. If for some reason you are the target of a DoS/spam attack then all of your bandwidth may be taken up when we try to deliver mail. Again, the mailserver will be trying to sort out all the connections and may not finish the conversation with us, so we send it again.
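
          To make the mechanism in 1b concrete, here is a minimal Python sketch of how a missing or late 250 OK could turn into duplicates. It is a toy model, not Messagelabs' actual retry logic. The function name and parameters are hypothetical, and it assumes the receiving server stores every copy it receives in full, whether or not its reply reaches the sender.

```python
def simulate_retries(reply_delays, sender_timeout):
    """Model a sender that resends when 250 OK does not arrive in time.

    reply_delays: one entry per delivery attempt, giving the seconds until
        the sender sees 250 OK, or None if the reply never arrives.
    sender_timeout: seconds the sender waits before giving up on an attempt.

    Returns (attempts_made, copies_stored_by_receiver, sender_saw_ok).
    """
    copies = 0
    for attempt, delay in enumerate(reply_delays, start=1):
        # Assumption: the receiver got the whole message and stored it,
        # even if its reply is lost or arrives too late.
        copies += 1
        if delay is not None and delay <= sender_timeout:
            return attempt, copies, True
    return len(reply_delays), copies, False


# Two replies are lost, the third arrives after 2 seconds.
print(simulate_retries([None, None, 2.0], sender_timeout=30))
```

          In this model the receiver ends up holding one copy per attempt, while the sender's queue keeps showing the message as retrying until a reply finally gets through.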

          2. PIX Firewall
          Customers with PIX firewalls with old software versions sometimes get duplicate emails. There is a bug in the Cisco software that resets the connection if the "." and the <CR><LF> at the end of a mail are in two different TCP packets. Because it won't have a proper conversation with us we resend the mail.
          Users with this problem can disable the "SMTP fixup protocol" in the Cisco PIX configuration so that our mail servers can send email directly to your mail servers instead of PIX acting as an intermediate relay. However, the best way to solve the problem is to upgrade the PIX software which fixes the bug. For reference, go to the web site and search for the keyword CSCds90792, which is the Cisco bug number.
          This bug was patched in the PIX software versions 5.2.4 and 5.2.5, but apparently still exists in older and newer versions, including 5.3.1. Cisco should be able to provide a patch for this problem.
          For newer versions of PIX, we recommend disabling "ESMTP Inspect".
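
          As an illustration of the kind of bug described in item 2, here is a small Python sketch. It assumes the terminator in question is the standard SMTP end-of-data sequence (CRLF, a single ".", CRLF). It is not Cisco's code, and both function names are made up for the example. It contrasts a check that inspects each packet on its own with one that carries bytes over between packets.

```python
END_OF_DATA = b"\r\n.\r\n"


def naive_has_terminator(chunks):
    """Flawed check: only finds the terminator if it fits in one chunk."""
    return any(END_OF_DATA in chunk for chunk in chunks)


def buffered_has_terminator(chunks):
    """Keeps the tail of earlier bytes so a split terminator is still found."""
    tail = b""
    for chunk in chunks:
        window = tail + chunk
        if END_OF_DATA in window:
            return True
        # Keep just enough bytes to complete a terminator in the next chunk.
        tail = window[-(len(END_OF_DATA) - 1):]
    return False


# The closing "." and its final CRLF arrive in different packets.
split = [b"Subject: hi\r\n\r\nbody\r\n.", b"\r\n"]
print(naive_has_terminator(split), buffered_has_terminator(split))
```

          A device that behaves like the first function never sees the end of the message when the packets happen to split at that point, which fits the description of a conversation that is not completed properly.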

          3. Vague Email client issues:
          a.) Mutt
          If you use Mutt with qmail, check that you have not taken the sendmail invocation line from the qmail FAQ and put it in the muttrc. If you have, then remove that line from the muttrc.
          b.) Evolution
          Evolution downloads all messages in a batch from the POP3 mailbox and subsequently deletes them from the server, instead of deleting each individual message right after downloading it into the local folder. This can cause possible duplicates.

          4. POP Issues
          A POP server will only set the email status to "downloaded" after mails are downloaded. So if the email retrieving process is interrupted for any reason (for example, the connection dies before the download completes), the email client will restart downloading emails from the very beginning, including those emails that have already been successfully retrieved.

          5. Set-up Issues
          The server has not been set to delete mails from the server after downloading.

          6. The mail is sent multiple times by the sender.
          Please check the message IDs of 2 different e-mails to see if they differ.
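
          The check in item 6 can be done by eye in the message headers, or with a few lines of Python using the standard email module. This is a minimal sketch, the function name is my own, and it expects the full raw source of each message, headers included.

```python
from email import message_from_string


def message_ids_match(raw_a, raw_b):
    """Compare the Message-ID headers of two raw messages.

    Returns True or False, or None if either message has no Message-ID.
    """
    id_a = message_from_string(raw_a).get("Message-ID")
    id_b = message_from_string(raw_b).get("Message-ID")
    if id_a is None or id_b is None:
        return None
    return id_a.strip() == id_b.strip()


first = "Message-ID: <abc123@example.com>\nSubject: Order\n\nBody\n"
second = "Message-ID: <abc123@example.com>\nSubject: Order\n\nBody\n"
print(message_ids_match(first, second))
```

          Identical IDs generally point to one message being delivered more than once somewhere along the route. Different IDs suggest the sender really did send it more than once.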

          I've gone through this list and everything seems to check out ok. Can't even think of what to say now.


            Turns out it was a problem with our SDSL connection, somehow blocking 250 OK replies to just one of the Messagelabs towers. Bizarre.

            Sohei out.