Sembee | February 2007

Routing Groups and SMTP Virtual Server Issues

When you are carrying out many installations and migrations, it is all too easy to presume that a site is set up in the same way as the others, particularly when a specific technique works every time. When you get a site where it doesn't work in the same way, it throws you off a little.

I recently carried out an installation for a client where a number of factors combined to cause problems that caught me out.

Background

The client has two sites, about 150 miles apart, connected by a dedicated 2 Mb line.
500 users, roughly a 60/40 split between the sites.
Three domains: a parent and two child domains, with a child for each site. The root domain does not have any resources in it.
Single Exchange 2003 server, with all users on the same server.
The server wheezed badly; it wasn't configured correctly to begin with.

Requirements

The client's requirements were very simple:

improve performance of email for all users
improve performance of email for the users in the "other" site
provide enough capacity for growth of the company
increase the mailbox limits
carry out tasks to comply with regulatory requirements (the client is in the financial services industry)
simplify management of the Exchange servers

Nothing unusual there.

Deployed Solution

The solution I proposed, and the client went for, was to move to four servers: one back-end and one front-end server in each site.
Both sites would be able to receive email from the internet, so if one site was down the other site would still receive its own email and queue the email for the other.

The initial deployment was for all four servers on the same LAN as the original server. This made the data migration smooth and almost transparent to the user community.

The Problem

When Exchange servers are separated by a limited-bandwidth connection, routing groups give you control over how email is routed out to the internet. As both sites had high-speed internet connections, we wanted outbound traffic to go straight out, rather than over the WAN link to the other server and then out to the internet.
However, on this site, whenever the servers were split into two routing groups, email from the second routing group to servers in the first routing group simply queued, although email to the internet was fine. Moving the servers back into a single routing group allowed email to flow correctly.

The routing groups were failing with a DNS-related error message, which seemed odd as the servers could all talk to each other using IP address, NetBIOS name and fully qualified domain name.

The problem was resolved very quickly once I had worked out what was wrong.

What went wrong?

It was agreed with the client that best practices should be followed for the server naming conventions on the internet.
Furthermore, the client wanted to limit the amount of internal information in the SMTP headers.
I therefore changed the FQDN property on the SMTP virtual server to reflect the server's real name on the internet, and the client arranged for reverse DNS to be configured for the relevant IP addresses.

However, one of my other techniques was not implemented by the client at the time the servers were initially deployed, due to the issues with their internal network.
When I deploy Exchange, I always configure a split DNS system. This allows the external name of the server to resolve internally as well, and resolve to the internal IP address of the server. (More info on split DNS: http://www.amset.info/netadmin/split-dns.asp).

What I had forgotten was that the routing group connector uses the FQDN set on the SMTP virtual server to locate the server at the other end.
Therefore the servers were finding the FQDN of the other server, but it was being resolved to the external IP address instead of the internal one. The firewall was blocking the traffic (as most firewalls do).
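
A quick way to catch this before splitting the routing groups is to check, from inside the network, what the public FQDN actually resolves to. The following is only a rough sketch in Python, not part of the original deployment: it assumes the internal network uses RFC 1918 private address ranges, and the function names and the example host name are made up for illustration.

```python
import ipaddress
import socket

# Assumption: "internal" means one of the RFC 1918 private IPv4 ranges.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_internal(ip_text):
    """Return True if ip_text falls inside one of the RFC 1918 ranges."""
    ip = ipaddress.ip_address(ip_text)
    return any(ip in network for network in RFC1918_NETWORKS)


def system_resolve(name):
    """Look up the IPv4 addresses for name using the system resolver."""
    infos = socket.getaddrinfo(
        name, 25, family=socket.AF_INET, proto=socket.IPPROTO_TCP
    )
    return sorted({info[4][0] for info in infos})


def check_fqdn(fqdn, resolve=system_resolve):
    """Map each address that fqdn resolves to onto True (internal) or False."""
    return {address: is_internal(address) for address in resolve(fqdn)}


# Example (hypothetical host name), run from a machine on the internal network:
#   check_fqdn("mail.example.com")
# Any address mapped to False means internal servers would be sent to the
# outside of the firewall, which is the situation described above.
```

With split DNS in place, every address returned for the server's public name should come back as internal.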

The Solution

There are actually two solutions to this problem. Both were deployed to ensure that it doesn't cause a problem again.

Set up the split DNS system. This allowed the names to be resolved correctly and email to flow.
Change the SMTP virtual server configuration.
If you change the IP address setting on the SMTP virtual server from "All Unassigned" to the specific IP address of the server, that also fixes the problem. The sending server then doesn't have to do a name resolution for the other end of the routing group, as the IP address information is enough.

What did I learn?

If you don't learn from incidents like this, then you don't gain anything, and I take something from every deployment that I do.
Don't presume that the client's network will work like most other networks. This is particularly the case if the network has a history of things not working correctly.

Set up everything that you need before you start.
I have also adjusted my own procedures and will now change the IP address setting from "All Unassigned" to a specific IP address, unless there is a reason not to for that specific client. This setting change shouldn't cause a problem with the vast majority of deployments and will avoid issues like this, particularly if there is a possibility that routing groups may be used in the future.

Downloadable Guides to Deploying Exchange 2007 Now Available

Four guides for deploying Exchange 2007 are now available for download from the Microsoft Download Center.

They are all Word documents, so easily transportable.

Deploying a Standard Exchange Server 2007 Organization: http://go.microsoft.com/fwlink/?LinkId=82170
Deploying a Simple Exchange Server 2007 Organization: http://go.microsoft.com/fwlink/?LinkId=82171
Deploying a Large Exchange Server 2007 Organization: http://go.microsoft.com/fwlink/?LinkId=82172
Deploying a Complex Exchange Server 2007 Organization: http://go.microsoft.com/fwlink/?LinkId=82173
However, I don't think you will be printing them out unless you really hate trees or own stock in your printer supplies company. The "Simple" guide alone is over 470 pages, and the "Complex" guide is almost 700!

The Problem with Backup MX Services and an Alternative

As email becomes more critical to a company, the issue of what happens with email if the server or internet connection fails is often raised.
One solution that is frequently mentioned is backup MX services. This is where a server located elsewhere is listed in your MX records with a higher MX value. The theory is that in the event of your server being unavailable, the backup server will collect your email and then pass it on once your server has returned.
However, a backup MX service will cause you grief outside of the times when it becomes useful, so the disadvantages considerably outweigh the advantages.
If you are having significant outages where a backup service becomes important to your business then you need to review the overall service of the primary server. A backup MX service should be something that is never used - or used once or twice a year at most.

What is the problem with Backup MX Services?

The number one problem with backup MX services is the fact that most spam goes to the higher MX records. Spammers have worked out that most higher MX records will not have the same level of spam protection, so there is a higher chance of their message getting delivered.
The most effective anti-spam methods block the spam at the point of delivery, so if you have the messages going through another server you cannot use any of those methods. The only methods left for dealing with spam are the traditional detection methods - which struggle with image-based spam.
Furthermore, dropping email delivered to non-existent users is almost impossible. Depending on the backup MX service that you use, they may insist on your server accepting all email, then dealing with it once it has been delivered. As most spam is spoofed, that means finding some way of dropping the messages once they have been delivered.

Do you need a backup solution at all?

SMTP email delivery has some robustness built in. Most email servers will attempt to deliver email for 48 hours, so as long as you can get something else in place in that time window you are fine. I work on the theory that if you cannot get an alternative solution in place in 48 hours then you have bigger problems to worry about.

Alternative Server Solutions

The way that I prefer to work is to put something in place in the event of something happening. I can have an alternative email service in place in less than 30 minutes. Obviously that is only setup if the outage is confirmed as being many hours or even days. If it is simply a power failure then I don't tend to bother.
However, the biggest hurdle with alternative server solutions that are put in place after the event is the time it takes for DNS changes to propagate.
To get round the DNS issue, I use a dynamic DNS service.
I have my own domain registered with one of the dynamic DNS service providers. Each client has a host in my domain with that service and that host is pointing to their existing mail server's IP address. You could use one of the dynamic DNS provider's domains if you don't have a spare domain to use.
The host appears in their MX records.
Thus, the MX records for domain.com are:
mail.domain.com cost: 10
client.mydomain.net cost: 20
And the matching host records are:
mail.domain.com type: A IP Address: 123.123.123.123
client.mydomain.net type: A IP Address: 123.123.123.123
Note that both IP addresses are the same at this time.
In the event of a failure, I simply switch the dynamic DNS IP address to the address of the alternative server. That IP address change is live in less than 20 minutes, usually within two or three minutes, as the dynamic DNS services are configured for frequent changes to the IP addresses and make them accessible across the internet very quickly.
As the host is already in the DNS for the client's domain, I don't have to wait for any DNS changes to propagate.
When the primary server is available again, the dynamic DNS host is switched back.
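
To make the mechanics concrete, here is a small Python simulation of what a sending server sees before and during a failover. It is a sketch only: the record values come from the example above, while the standby address 124.124.124.124 and the function name are invented for illustration. No real DNS lookups are made.

```python
def delivery_targets(domain, mx_records, a_records):
    """Return (host, ip) pairs in the order a sender would try them.

    Senders try the MX host with the lowest cost first, then work upwards.
    """
    ordered = sorted(mx_records[domain], key=lambda record: record[0])
    return [(host, a_records[host]) for _cost, host in ordered]


# MX records in the client's own zone. These never change during a failover.
mx_records = {
    "domain.com": [(10, "mail.domain.com"), (20, "client.mydomain.net")],
}

# Normal operation: both host names point at the primary mail server.
normal_a_records = {
    "mail.domain.com": "123.123.123.123",
    "client.mydomain.net": "123.123.123.123",
}
normal = delivery_targets("domain.com", mx_records, normal_a_records)

# Failover: only the dynamic DNS host is repointed (hypothetical standby IP).
failover_a_records = dict(normal_a_records)
failover_a_records["client.mydomain.net"] = "124.124.124.124"
failover = delivery_targets("domain.com", mx_records, failover_a_records)

print(normal)
print(failover)
```

During the outage a sender still tries mail.domain.com first, fails to connect, and then falls through to the second MX host, which now points at the standby server.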

Alternative Server Software

For an alternative server, pretty much anything that supports SMTP can be used.
IIS makes a very good queueing-only server. Someone with some ASP skills could easily knock up a front-end that allows the messages in the queue to be viewed with a standard web browser. Access would need to be very restricted as it would show all messages in the queues.
You could even sign up with a host and create accounts in their email package. As long as the host has a decent web-based front end for the email, the end users can work with their email. When service is resumed, export the email to PST files and import it into the mailboxes.
If you are going to use a host, try to find one that supports IMAP connections. Then restrict the users to web-based access and configure the server to store a copy of the sent items. That will ensure that you can download and import sent and received items into Exchange when the server is available.
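
As an illustration of the queue viewer idea mentioned above, the core of such a tool is just a listing of the headers of each queued message. The sketch below uses Python rather than ASP and is not a finished front-end. It assumes the queued messages are stored as individual .eml files in a single folder, and the folder path shown in the usage comment is hypothetical.

```python
from email import policy
from email.parser import BytesParser
from pathlib import Path


def list_queue(queue_dir):
    """Summarise each .eml file in queue_dir as (filename, from, to, subject)."""
    parser = BytesParser(policy=policy.default)
    rows = []
    for path in sorted(Path(queue_dir).iterdir()):
        # Skip sub-folders and anything that is not a message file.
        if not path.is_file() or path.suffix.lower() != ".eml":
            continue
        with path.open("rb") as handle:
            # headersonly=True avoids parsing large bodies and attachments.
            message = parser.parse(handle, headersonly=True)
        rows.append((
            path.name,
            str(message.get("From", "")),
            str(message.get("To", "")),
            str(message.get("Subject", "")),
        ))
    return rows


# Example usage (hypothetical path to the standby server's queue folder):
#   for row in list_queue(r"D:\standby\queue"):
#       print(row)
```

Anything built on top of this would need the same access restrictions as any other queue viewer, since it exposes the headers of every queued message.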

The Key is Forward Planning

As with many system failure procedures, the key is forward planning. By getting the DNS changes made in advance, you can considerably reduce the time your end users are without email. While it may not be a perfect solution, it will give you access to the email, allowing phone calls to be made and business to operate - if not at optimum levels.
