When you are carrying out many installations and migrations, it is all too easy to presume that a site is set up in the same way as the others, particularly when a specific technique works every time. When you get a site where it doesn't work in the same way, it throws you off a little.
I recently carried out an installation for a client where a number of factors combined to cause problems that caught me out.
Background
The client has two sites, about 150 miles apart, connected by a dedicated 2 Mbps line.
500 users, roughly 60/40 split.
Three domains, parent and two child domains, with a child for each site. The root domain does not have any resources in it.
Single Exchange 2003 server, with all users on the same server.
The server wheezed badly; it wasn't configured correctly to begin with.
Requirements
The client's requirements were very simple:
- improve performance of email for all users
- improve performance of email for the users in the "other" site
- provide enough capacity for growth of the company
- increase the mailbox limits
- carry out tasks to comply with regulatory requirements (the client is in the financial services industry)
- simplify management of the Exchange servers.
Nothing unusual there.
Deployed Solution
The solution I proposed, and the client went for, was to move to four servers: one back-end and one front-end server in each site.
Both sites would be able to receive email from the internet, so that if one site was down, the other would still receive its own email and queue the email for the site that was down.
The initial deployment was for all four servers on the same LAN as the original server. This made the data migration smooth and almost transparent to the user community.
The Problem
When Exchange servers are separated by a limited-bandwidth connection, using routing groups gives you control over how email is routed out to the internet. As both sites had high-speed internet connections, we wanted the traffic to go straight out, rather than over the WAN link to the other server and then out to the internet.
However, on this site, whenever the servers were split into two routing groups, email from the second routing group to servers in the first routing group simply queued, although email to the internet was fine. Moving the servers back into a single routing group allowed email to flow correctly.
The routing groups were failing with a DNS-related error message, which seemed odd as the servers could all talk to each other using IP address, NetBIOS name and fully qualified domain name (FQDN).
Once the cause had been worked out, the problem was resolved very quickly.
What went wrong?
It was agreed with the client that best practices should be followed for the server naming conventions on the internet.
Furthermore the client wanted to limit the amount of internal information in the SMTP headers.
I therefore adjusted the FQDN property on the SMTP virtual server to reflect the server's real name on the internet, and the client arranged for reverse DNS to be configured for the relevant IP addresses.
However, one of my other techniques was not implemented by the client at the time the servers were initially deployed, due to the issues with their internal network.
When I deploy Exchange, I always configure a split DNS system. This allows the external name of the server to resolve internally as well, and resolve to the internal IP address of the server. (More info on split DNS: http://www.amset.info/netadmin/split-dns.asp).
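As a rough illustration of the idea (not how any DNS server is actually configured), split DNS can be pictured as two lookup tables for the same name, one per "view". The host name and both addresses below are hypothetical.

```python
# Toy model of split DNS: one name, two answers, depending on who asks.
# The name and addresses are hypothetical examples, not the client's.
ZONES = {
    "internal": {"mail.example.com": "10.0.1.20"},     # seen from the LAN
    "external": {"mail.example.com": "203.0.113.10"},  # seen from the internet
}


def resolve(name, view):
    """Return the address for `name` in the given view, or None if absent."""
    return ZONES[view].get(name)


print(resolve("mail.example.com", "internal"))
print(resolve("mail.example.com", "external"))
```

Without the internal view, machines on the LAN only ever get the external answer.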
What I had forgotten was that the routing group information uses the FQDN on the SMTP server in the configuration of the routing group connector.
Therefore the servers were finding the FQDN of the other server, but it was being resolved to an external IP address, instead of the internal IP. The firewall was blocking the traffic (as most firewalls do).
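A check along these lines, run from each server before splitting the routing groups, would have shown the problem up front. This is only a sketch: it assumes that "internal" means the RFC 1918 private ranges, and `check_fqdn` and the other function names are made up for illustration.

```python
import ipaddress
import socket

# Assumption: the internal network uses RFC 1918 private address space.
INTERNAL_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_internal(address):
    """True if the IPv4 address falls inside one of the internal ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in INTERNAL_NETS)


def external_addresses(addresses):
    """Return, sorted, the addresses that are not internal."""
    return sorted(a for a in addresses if not is_internal(a))


def check_fqdn(fqdn):
    """Resolve fqdn with the local resolver and list any external answers.

    Raises socket.gaierror if the name does not resolve at all.
    """
    infos = socket.getaddrinfo(
        fqdn, 25, family=socket.AF_INET, proto=socket.IPPROTO_TCP
    )
    return external_addresses({info[4][0] for info in infos})
```

Calling `check_fqdn` with the FQDN set on the other server's SMTP virtual server should return an empty list when split DNS is in place. Any address in the result is one that internal traffic would be sent to, and that the firewall is likely to block.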
The Solution
There are actually two solutions to this problem; both were deployed to ensure that it doesn't cause a problem again.
- Set up the split DNS system. This allowed the names to be resolved correctly and email to flow.
- Change the SMTP virtual server configuration.
If you change the IP address setting on the SMTP virtual server from "All Unassigned" to the specific IP address of the server then that also fixes the problem. The server doing the sending then doesn't have to do a name resolution for the other end of the routing group as the IP address information is enough.
What did I learn?
If you don't learn from incidents like this, then you don't gain anything, and I take something from every deployment that I do.
- Don't presume that the client's network will work like most other networks. This is particularly the case if the network has a history of things not working correctly.
- Set up everything that you need before you start.
- I have also adjusted my own procedures and will now change the IP address setting from "All Unassigned" to a specific IP address, unless there is a reason not to for that specific client. This change shouldn't cause a problem with the vast majority of deployments and will avoid issues like this, particularly if there is a possibility that routing groups may be used in the future.