A while back I posted about troubleshooting a problem where a customer's home-grown application was not working as expected. The app was designed to run in a web site which the user would connect to from Internet Explorer on a client. The HTTP connection was secured by SSL.
The symptom was a 20 second delay the first time you connected to the site, between the client receiving the data and the web site “drawing” on the client. If you closed IE and then reconnected to the web site after that first delayed connection, the subsequent connections would not see a delay.
I went over that issue in detail in this blog post: http://blogs.technet.com/ad/archive/2007/01/03/enough-with-the-delays-already.aspx
Well, we recently saw an issue that, on the face of it, was the exact same thing. Unfortunately, the resolution we ultimately used for the prior case did not resolve the new one. That resolution was to check for permissions that were insufficient to allow access to folders or files located in the local certificate store(s). Access to those certificates is necessary for SSL session negotiation, even if they are not used for that session.
The data gathering method we started with in the last issue was the first thing to do here as well, and it is a good thing we did it.
In the new scenario the IE client connected to an application running in an IIS application pool, which in turn queried Active Directory for some information. So there were at least three computers involved: IE client, IIS server, Active Directory domain controller.
An additional point of data was that the IIS server was a member of an Active Directory domain in a forest we’ll call brownies.com, and the Active Directory which the application was intended to query was in a different forest which we’ll call cookies.net.
In a network trace, we saw that the time delay seemed to be occurring during the traffic below:
No. Time Source Destination Protocol Info
65 15:18:48.771242 10.234.32.51 10.234.32.22 DNS Standard query A SugarDC1.chocolate.brownies.com
68 15:18:48.771242 10.234.32.22 10.234.32.51 DNS Standard query response A 10.248.175.4
144 15:18:54.755617 10.234.32.51 10.234.32.22 DNS Standard query SRV _ldap._tcp.OutofSight._sites.luvme.some.cookies.net
150 15:18:55.755617 10.234.32.51 10.248.175.4 DNS Standard query SRV _ldap._tcp.OutofSight._sites.luvme.some.cookies.net
157 15:18:56.755617 10.234.32.51 10.234.32.22 DNS Standard query SRV _ldap._tcp.OutofSight._sites.luvme.some.cookies.net
168 15:18:58.755617 10.234.32.51 10.234.32.22 DNS Standard query SRV _ldap._tcp.OutofSight._sites.luvme.some.cookies.net
169 15:18:58.755617 10.234.32.51 10.248.175.4 DNS Standard query SRV _ldap._tcp.OutofSight._sites.luvme.some.cookies.net
196 15:19:02.755617 10.234.32.51 10.234.32.22 DNS Standard query SRV _ldap._tcp.OutofSight._sites.luvme.some.cookies.net
197 15:19:02.755617 10.234.32.51 10.248.175.4 DNS Standard query SRV _ldap._tcp.OutofSight._sites.luvme.some.cookies.net
207 15:19:04.396242 10.234.32.51 10.234.32.22 DNS Standard query A OreoDC1.chocolate.brownies.com
208 15:19:04.396242 10.234.32.22 10.234.32.51 DNS Standard query response, No such name
209 15:19:04.396242 10.234.32.51 10.234.32.22 DNS Standard query A OreoDC1.brownies.com
210 15:19:04.396242 10.234.32.22 10.234.32.51 DNS Standard query response A 10.248.191.110
297 15:19:09.771242 10.234.32.22 10.234.32.51 DNS Standard query response, Server failure
309 15:19:11.474367 10.248.175.4 10.234.32.51 DNS Standard query response, Server failure
The above is interesting since we can see the duration of this traffic is 22 seconds (other traces reproducing the problem displayed a similar delay). No significant delay was present in the rest of the traffic this scenario generated.
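As a quick check of that figure, the elapsed time can be computed straight from the timestamps in the excerpt. This is only a minimal sketch: the timestamp strings are copied from the trace above, while the helper name seconds_between is made up for illustration.

```python
from datetime import datetime


def seconds_between(start: str, end: str) -> float:
    """Return the seconds elapsed between two HH:MM:SS.ffffff trace timestamps."""
    fmt = "%H:%M:%S.%f"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


# Timestamps copied from the trace excerpt above.
first_packet = "15:18:48.771242"     # packet 65, first A query
first_srv_query = "15:18:54.755617"  # packet 144, first SRV query
last_packet = "15:19:11.474367"      # packet 309, final Server failure response

total = seconds_between(first_packet, last_packet)
srv_wait = seconds_between(first_srv_query, last_packet)
print(f"whole excerpt: {total:.1f} s, from first SRV query to last failure: {srv_wait:.1f} s")
```

Most of the excerpt is spent between the first SRV query and the last Server failure response, which is where the delay lives.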
But look at what’s happening in the above network trace excerpt. We start with a query for a local name (local meaning in the forest the IIS server resides in, a.k.a. brownies.com), then begin querying for an LDAP specific SRV record to locate an LDAP server which could answer LDAP queries for the luvme.some.cookies.net directory. But there is no reply with a server to go to for this. Eventually the traffic is sent to a domain controller in the luvme.some.cookies.net domain after all, but this is the result of an entry in a hosts file resident on the IIS server.
Why is this a scenario worth mentioning to you all? So that you can know what to expect in similar scenarios. If the server sending an LDAP query is a member of an Active Directory domain then it will send LDAP SRV queries in order to locate an LDAP server to service the requests it needs to make. The server knows of LDAP SRVs since the DC Locator code on it commonly uses similar behavior to locate a domain controller in its own domain for LDAP requests.
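The SRV names queried in the trace follow a fixed pattern: _ldap._tcp, then an optional site name followed by _sites, then the DNS domain name. Here is a minimal Python sketch of that pattern, inferred from the query names shown in the trace. The function name is hypothetical and this is not the DC Locator code itself.

```python
from typing import Optional


def ldap_srv_name(domain: str, site: Optional[str] = None) -> str:
    """Build the LDAP SRV query name for a domain, optionally scoped to a site."""
    labels = ["_ldap", "_tcp"]
    if site:
        # Site-specific form, as seen in the trace: <site>._sites.<domain>
        labels += [site, "_sites"]
    labels.append(domain.strip("."))
    return ".".join(labels)


# Domain and site names taken from the trace excerpt.
print(ldap_srv_name("luvme.some.cookies.net", site="OutofSight"))
```

If the DNS servers the machine uses cannot answer a query for a name built this way, you get the repeated SRV queries seen above.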
And if it cannot locate a domain controller by this method, it will fall back to whatever it has locally cached for that domain name despite the fact that the server in question is not verifiably an LDAP server.
How did we remove this delay? By making sure DNS could resolve the SRV query. This could be done any number of ways (a stub zone or a secondary zone, for example), but we chose to add a forwarder on the DNS server which the IIS server was configured to look to for primary DNS.
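To confirm a change like this, the same SRV query can be issued by hand against the DNS server the IIS server uses. The sketch below only builds an nslookup command line for that check. The helper name is hypothetical, and the query name and server address are simply the ones that appear in the trace above.

```python
from typing import List


def srv_check_command(srv_name: str, dns_server: str) -> List[str]:
    """Build an nslookup command that asks dns_server for an SRV record."""
    return ["nslookup", "-type=SRV", srv_name, dns_server]


cmd = srv_check_command(
    "_ldap._tcp.OutofSight._sites.luvme.some.cookies.net",
    "10.234.32.22",  # the DNS server queried first in the trace (assumed to be the primary)
)
print(" ".join(cmd))
# To actually run it on the IIS server:
#   import subprocess; subprocess.run(cmd, capture_output=True, text=True)
```

A response that lists one or more servers, rather than a Server failure, would suggest the SRV lookup no longer has to time out.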
The final lesson from this issue is to never assume that the same symptom is the result of the same cause. Let the facts bear things out, as we did in the network traces we gathered.
Until next time folks, take care out there!