Today is a rather unusual day. We will leave behind the comfort of the UNIX world; the thoughts of the comparatively well-engineered groovy; the joy of the comparative minimalism of Enterprise Java Beans. Today we take a trip to Redmond to gaze upon the monstrosity that is Microsoft Exchange, Microsoft’s PIM (email, calendar, and contacts) server.
Now, we’re not going to talk about actually setting up or running Exchange. This is something I have no experience with and certainly never will; there are many employment opportunities available that are less depressing and more fulfilling, such as decapitating rabbits with scissors or installing road signals for bicyclists.
Instead, we’re going to be talking about interfacing with third-party Exchange servers, including “Office 365”.
Nobody knows what anything is called
Terminology in the Exchange ecosystem is an utter catastrophe. Various things
named by marketers have been encoded into programmatic interfaces over time,
all the while marketers continue renaming those things, resulting in a
stratification of different names for the same concept. For example, the login
process for Microsoft-hosted accounts involves going through web pages on a
very large number of domains, some under live.com, some under microsoft.com,
and some under other domains entirely.
In some cases, very different concepts have been “unified” under one name by
marketing. The most egregious example is “Office 365”, which simultaneously
refers to three radically different products: the free email accounts under
hotmail.com and friends; the web version of Office without email; and
business plans with both the Office products and Exchange hosted in Azure.
As a result, communicating about these things within one’s own team requires setting out specific terminology to use. For example, in my team, we refer to the authentication system used by the free email domains under the older name “Windows Live”; the two “Office 365” offerings that include Exchange in some form are “Hotmail” and “Exchange Online”.
Of course, customers don’t know these conventions, so communicating with customers about how your product works with their Exchange account is extremely challenging. It is not uncommon to receive a customer question where every other noun might as well be replaced with smurf and then either try to guess which smurf they’re talking about or somehow explain to them that these things Microsoft has been saying are the same aren’t.
The confusion even extends into Microsoft themselves. Technical documentation often uses older names for things, simply because those names have been baked into APIs. Even the marketing department itself is completely clueless about what its own products are named. The web app distributed with Exchange has traditionally been called the OWA, or Outlook Web Access (though many people think it means Outlook Web App, which is probably a better name). In their brilliance, marketing named a native iOS app for accessing Exchange “OWA for Devices”, despite nothing but “Outlook” from the base name being applicable. Later, they proceeded to rename (or try to, it hasn’t stuck) the OWA to OotW, or Outlook on the Web. OWA for Devices still stands with that name.
There are no fewer than four protocols for accessing data in Exchange, none of which are universally applicable.
MAPI is the “core” protocol: essentially a COM interface for accessing a glorified two-level key-value store. Apparently there is a way to expose this COM protocol over HTTP, but it is not generally available, which is likely for the best. Despite this, MAPI is really the sanest of the quartet.
Despite MAPI not being usable, knowing about how it works is important, as all the other protocols are incomplete abstractions that allow some direct MAPI functionality as an escape hatch. At a high level, MAPI defines a directory tree; each directory contains a set of items, and each item a set of properties identified by 16-bit integers. In true Microsoft fashion, directories are over-engineered to allow splicing alternate implementations in, and entries are addressed by huge, baroque ids.
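As a mental model, the data shape described above can be sketched in a few lines. Everything here is invented for illustration (this is nothing like the actual COM interface), though the property id shown does, to the best of my knowledge, correspond to the real PidTagSubject tag.

```python
# Toy model of the MAPI data shape: a tree of folders, each holding
# items, each item a bag of properties keyed by a 16-bit integer.
PR_SUBJECT = 0x0037  # PidTagSubject's property id

class Folder:
    def __init__(self, name):
        self.name = name
        self.subfolders = {}
        self.items = []

    def add_item(self, props):
        # props: dict mapping 16-bit property id -> value
        assert all(0 <= pid <= 0xFFFF for pid in props)
        self.items.append(dict(props))
        return self.items[-1]

root = Folder("root")
inbox = root.subfolders.setdefault("Inbox", Folder("Inbox"))
inbox.add_item({PR_SUBJECT: "Hello"})
```

The real thing then wraps this simple shape in pluggable store providers and enormous opaque entry ids, per the paragraph above.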
“Office 365” provides a “REST API”. As of the last time I tried to use it, it is only available for Exchange Online and not Hotmail.
The REST API is quite peculiar. It seemingly uses an undocumented, custom query
language for requesting data. These queries must be passed in the URL string,
which gets awkward since the language contains mandatory spaces and other
characters that need to be URL-encoded. Additionally, by default the API
returns only a small and arbitrary subset of data; for example, contacts
include the first but not the second mobile phone number. Additional properties
can be requested by modifying the query to specify MAPI properties by id.
Obtaining all properties results in an extremely long query string, which,
written the obvious way, produces a URL longer than Microsoft’s server is
willing to accept. Some code-golfing can get it under the limit by using (if I
recall correctly) a tab character instead of a space, since the former can be put
in the URL directly and the server accepts it, whereas the space must become
%20 and thus counts as 3 characters toward the length limit.
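The arithmetic behind that golfing is easy to sketch. The property ids and the query syntax below are made up; only the encoding math is the point.

```python
# 96 hypothetical property ids, formatted as hex literals.
props = [f"0x{pid:04X}" for pid in range(0x3A00, 0x3A60)]

# Space must be percent-encoded, so each separator costs 8 characters...
query_spaces = " or ".join(props).replace(" ", "%20")
# ...whereas a raw tab (if the server tolerates it) costs only 1 each.
query_tabs = "\tor\t".join(props)

saved = len(query_spaces) - len(query_tabs)  # 95 separators * 4 chars
```

With a separator per property, the savings scale linearly with the number of properties requested, which is what drags the URL back under the server's length limit.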
In any case, the REST API is utterly unusable, not only due to its limitations, but because it is not even available for the majority of users.
The next is EAS, or Exchange ActiveSync. EAS is more interesting for reasons that have nothing to do with the protocol’s mode of operation.
Firstly, there are claims that Microsoft has some kind of submarine patent on EAS and wants implementors (even of clients) to pay royalties, despite there being nothing apparent in the stack that both Microsoft owns and is patentable, even by the lax standards of the late 90’s and early 00’s. This has not prevented the existence of unlicensed open source implementations, such as GNOME’s Evolution, which has a wonderful diatribe on this topic.
The other interesting thing is the choice of data format. EAS came about in the days when nearly the entire world shared the delusion that XML was the bee’s knees, so obviously everything should be sent in XML rather than a custom binary format. But EAS is intended for use by the very bandwidth-constrained mobile devices of the early 00’s, a firm contradiction to using what is quite likely the least space-efficient serious serialisation format ever conceived.
The natural solution was to make a custom binary format which is homomorphic to XML. And thus WBXML, or WAP [Wireless Application Protocol] Binary XML, was born. WBXML has a number of features not found in vanilla XML, such as being able to switch between a couple dozen legacy character encodings (in addition to UTF-8 and UTF-16) mid-stream.
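To give a feel for the format, here is a toy serialisation of `<card>hello</card>` in the WBXML style. The global token values and header fields below come from the WBXML 1.3 spec; the tag code 0x05 for "card" is an arbitrary choice of mine, since tag meanings live in out-of-band code pages.

```python
STR_I, END = 0x03, 0x01  # WBXML global tokens: inline string, element end

def encode_simple(tag_code: int, text: str) -> bytes:
    header = bytes([
        0x03,  # WBXML version 1.3
        0x01,  # public identifier: "unknown"
        0x6A,  # charset: 106 = UTF-8 (IANA MIBenum)
        0x00,  # string table length: empty
    ])
    body = bytes([tag_code | 0x40])                    # tag, content bit set
    body += bytes([STR_I]) + text.encode() + b"\x00"   # inline string content
    body += bytes([END])                               # close the element
    return header + body

doc = encode_simple(0x05, "hello")
```

Note that the mid-stream charset switching mentioned above, among other features, is not modelled here; this is the bare minimum skeleton.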
If you’re willing to look past the (probably marginal) legal risk and can get a WBXML parser/generator set up, you’ll find that it’s not the most convenient protocol. Instead of making the data store directly addressable (e.g., have operations like “delete this item” or “update this field of this item”), all mutations are done through a “sync” protocol, in which the client sends a delta of its (presumed) local replica and the server determines how to apply those changes and resolve conflicts, then sends the results back to the client.
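The "everything is a sync" model can be sketched as follows. All names and shapes here are invented; the point is that the client submits a batch of deltas and the server, not the client, decides what actually happens.

```python
# A toy server-side sync: apply a client's batch of changes against the
# store and report back what was done. Conflict resolution is entirely
# the server's call (here: deleting an already-gone item just succeeds).

def server_sync(store: dict, client_changes: list) -> list:
    results = []
    for change in client_changes:
        if change["op"] == "add":
            new_id = max(store, default=0) + 1
            store[new_id] = change["data"]
            results.append({"op": "add", "server_id": new_id})
        elif change["op"] == "delete":
            store.pop(change["server_id"], None)
            results.append({"op": "delete", "server_id": change["server_id"]})
    return results

store = {1: {"subject": "old"}}
acks = server_sync(store, [{"op": "add", "data": {"subject": "new"}},
                           {"op": "delete", "server_id": 1}])
```

A client that merely wants to flip one field on one item still has to phrase it as a delta against its presumed local replica, which is why the protocol feels so indirect.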
On the less dim side, EAS is the most widely available of the four protocols.
The final contender is EWS, or Exchange Web Services. This is a SOAP API; in other words, lots of XML and envelopes. Despite that, it’s largely well-designed enough to be palatable. It presents a fairly straightforward, directly-addressable item store.
It does have its quirks though. Like the REST API, it omits some fields in its model, though the set omitted is much smaller. Additionally, some fields are present in the EWS model but are seemingly not connected to anything in the back-end. For example, the yomi (Japanese name pronunciation) fields are recognised by EWS, but are never returned in responses and are thrown away on write. Like the REST API, there is a mechanism to request MAPI fields by id to get around this problem; unlike the REST API, one doesn’t need to fight an undocumented query language or URL length limits to do this.
Another oddity is the way item updates are specified. Each item update is a list of changes. A change indicates a field by name or id, plus a full item with exactly that one field set. This makes setting up update requests unnecessarily difficult and bloats the request syntax quite a bit.
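For concreteness, an update in an EWS UpdateItem request looks roughly like this (heavily abridged: the SOAP envelope and XML namespaces are omitted, and the item id and change key are placeholders). Note how each change names the field and then repeats a whole skeletal item carrying just that one field:

```xml
<UpdateItem MessageDisposition="SaveOnly" ConflictResolution="AutoResolve">
  <ItemChanges>
    <ItemChange>
      <ItemId Id="AAMkAD..." ChangeKey="CQAAABY..."/>
      <Updates>
        <SetItemField>
          <FieldURI FieldURI="item:Subject"/>
          <Message>
            <Subject>Updated subject</Subject>
          </Message>
        </SetItemField>
        <!-- A second field means repeating the whole dance. -->
        <SetItemField>
          <FieldURI FieldURI="item:Sensitivity"/>
          <Message>
            <Sensitivity>Private</Sensitivity>
          </Message>
        </SetItemField>
      </Updates>
    </ItemChange>
  </ItemChanges>
</UpdateItem>
```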
EWS is available fairly widely, though not as widely as EAS.
The security story for accessing Exchange is a complete trainwreck on multiple fronts.
First is the obvious question of how to provide authentication to Exchange so you can access the user’s data.
Exchange can be integrated with Active Directory. If you are a member of that Active Directory domain, using the AD infrastructure is likely the best way to go. But this presumes both the existence of AD and that the administrator of that domain has explicitly configured access for your system. When making an application that acts on behalf of lowly users, this isn’t an option, so we can mostly ignore the Active Directory option.
Another reported option is this WS-Security thing, which I’ll touch on a bit later. It’s ultimately irrelevant here because it requires manual setup on both ends.
The remaining option which is both user-friendly and generally applicable is
(sigh) user name and password authentication, which works everywhere. Remember
the Active Directory integration? For AD-based Exchange instances, this
combination is the same as what the user uses for Active Directory in general;
i.e., this approach has them produce credentials that grant access to literally
everything they control. A depressing number of users are willing to do this,
including one from a particular high-profile
.gov domain which fortunately
does not allow any external access at all.
There are two generally supported ways to pass these credentials to the server.
The first is HTTP
Basic authentication, where you just hand over the email
address and password in cleartext. This isn’t a huge issue if HTTPS is used
correctly (foreshadowing!) since the TLS layer takes care of encryption and
validating that the server is the intended recipient.
The other is NTLM. NTLM is
Digest’s little brother that was dropped on its
head as a baby. The NTLM authentication flow involves a three-round-trip
process in which the client and server exchange a challenge, and then the
password is used to answer that challenge without sending the password itself
over the wire. The hash function used in this process has a colourful history;
particularly, the original version made passwords case-insensitive and
truncated them to 14 characters.
Ok, so you have this HTTP authentication mechanism that’s really slow since it
requires re-sending a request three times to go through the authentication
dance. How do you fix it? Issue the client some sort of token they can reuse?
Provide some extra data so the client can preemptively answer further
challenges? The designers of NTLM decided to “fix” it by making the HTTP
connection stateful. Unlike every sane HTTP authentication mechanism, once
NTLM completes on a particular TCP connection, all further HTTP requests on the
same connection implicitly use that authentication and do not include any
Authentication header of their own.
Very, very few HTTP client implementations support this well. Apache HTTP Client supports it, but requires manual bookkeeping to be able to handle multiple authentication contexts.
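The bookkeeping this connection-statefulness forces on a client can be sketched like so. All names here are invented; the shape mirrors the per-connection state an implementation like Apache HTTP Client has to track.

```python
# Once NTLM completes on a TCP connection, that connection *is* the
# authentication. A pool must therefore remember which identity each
# connection was authenticated as, and never hand a connection to a
# request for a different identity.

class NtlmAwarePool:
    def __init__(self):
        self._auth_state = {}  # connection id -> authenticated identity

    def mark_authenticated(self, conn_id: int, identity: str):
        self._auth_state[conn_id] = identity

    def can_reuse(self, conn_id: int, wanted_identity: str) -> bool:
        # Anything else must re-handshake or take a fresh connection.
        return self._auth_state.get(conn_id) == wanted_identity

pool = NtlmAwarePool()
pool.mark_authenticated(1, "CORP\\alice")
```

With Basic, none of this exists: every request carries its own Authorization header and connections are interchangeable.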
Another complication of NTLM is that it requires Active Directory users to provide their AD user name and domain rather than their email address. Since many AD domains are not named the same as the organisation’s Internet domain, and some organisations don’t even use the same user name as found in the corresponding part of the email address, one has no choice but to require the user to provide this information themselves. Explaining these concepts to users is extremely challenging.
NTLM might still seem like an improvement since at least the user’s password is
not divulged to the server. Sometimes this is true, but in many cases the POX
Autodiscovery service used with Exchange only supports Basic authentication.
This means that such AD-based Exchange setups cause clients to divulge not only
the email address and password, but also the AD user name and domain name, an
even more useful combination than just one pair or the other.
There’s one glimmer of hope:
Office 365 Exchange Online supports OAuth. By
redirecting users to a web flow, you can get a token to use with Exchange
Online without ever needing to learn the user’s password.
Microsoft’s documentation on OAuth points out that it’s been extended and implies that you need to understand these extensions to work with it. Reading that documentation, you learn about their JWT-format access tokens and how standard 3-legged OAuth is extended to a 5-legged system to allow the credential validator to be separate from the access grantor. After finally understanding all that, you realise that none of it affects you and you can treat it like normal 3-legged OAuth.
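As a sketch, kicking off that ordinary 3-legged flow looks like the following. The authorize endpoint and the EWS scope shown are the documented ones as of this writing; the client id, redirect URI, and state value are placeholders.

```python
from urllib.parse import urlencode

AUTHORIZE = "https://login.microsoftonline.com/common/oauth2/v2.0/authorize"

params = {
    "client_id": "00000000-0000-0000-0000-000000000000",  # placeholder
    "response_type": "code",        # plain authorization-code flow
    "redirect_uri": "https://app.example.com/oauth/callback",
    "scope": "https://outlook.office365.com/EWS.AccessAsUser.All",
    "state": "opaque-csrf-token",
}
login_url = AUTHORIZE + "?" + urlencode(params)
# Send the user to login_url; later, exchange the returned code for
# tokens at the corresponding /token endpoint.
```

Everything JWT- and 5-legged-related happens behind these two endpoints without the client's involvement, which is the point of the paragraph above.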
But not all is so rosy. OAuth is built on Azure Active Directory, which allows customers to federate authentication to their own system. But this is exactly what the 5-legged extension is supposed to solve, right? It should just work, right? Ha!, no. The web flow takes the user to what is essentially a 500 error page that seems to blame the application for not already understanding someone else’s authentication setup.
Of particular interest is the Windows Live authentication system, which is similarly federated into Azure AD. Here, the error page the user gets at least explains in an obtuse way that authentication cannot work because your Azure AD application has not granted Windows Live permission to generate user tokens for its own use. Weird, but it sounds like it’s just a matter of adding that permission in Azure AD. Except there’s no clear way to do this. Even customer support for Azure AD has absolutely no idea how this can be achieved. Thus the Windows Live users (i.e., hotmail.com et al) are relegated to user name and password authentication, leaving OAuth exclusively to business accounts with Exchange Online.
That’s about it for authentication. How about securing the communications themselves? EWS mandates using HTTPS for everything, so it seems everything on this front should be peachy. It’s not.
To call the HTTPS situation “dire” is an understatement. Some people argue that
using a self-signed certificate for Exchange is OK for intranet-only
deployments. This is probably true, but if that’s what you’re going for, don’t
expose your intranet-only deployment to the Internet! Others argue that that
WS-Security thing makes HTTPS entirely redundant, so there’s no reason to
bother with it. But nobody uses WS-Security; most couldn’t even if they wanted
to. And even if it were used, it doesn’t replace HTTPS, as we’ll see later.
Finally, there’s a bunch of low-cost hosting providers that do awful things
like CNAMEing one domain to another when the target doesn’t have a
certificate for the former.
Though not a majority, a large portion of Exchange instances have misconfigured SSL. Serving certificates valid for one domain from another. Self-signed certificates. Certificates which expired long ago. In frighteningly many cases, all three at once. The sheer incompetence required to use self-signed certificates but be unable to keep them up-to-date and add all the needed domains is astounding.
Another unsettling trend — not common, but I’ve never seen it anywhere outside an Exchange context — is running vanilla HTTP on port 443. I’m not sure if any client actually supports such a configuration.
To cope with this, quite a few Exchange clients entirely disable certificate checking. Unfortunately, this perpetuates the problem, as people setting up servers with minimal effort find that their awful setup “works”, and so there is one more example the clients need to keep supporting.
Microsoft does not appear to be the origin of a lot of these misconceptions, but it is not blameless, in that it has continued to produce software that by default works with these idiotically broken setups. Even the official EWS SDK for Java unconditionally disables almost all the certificate checks. (I recommend not venturing into other parts of that repository without proper eye protection to prevent you from gouging your own eyes out with your fingernails; it is quite possibly the worst Java code ever written.)
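For contrast, doing this right takes no effort at all on a modern stack; in Python, for instance, the secure behaviour is literally the default:

```python
import ssl

# The stdlib defaults: hostname checking and chain validation both on.
ctx = ssl.create_default_context()
assert ctx.check_hostname                    # hostname must match cert
assert ctx.verify_mode == ssl.CERT_REQUIRED  # chain must validate

# What the SDKs criticised above effectively ship (do not do this):
# ssl._create_unverified_context() turns all of the above off.
```

There is no version of "it was hard to get right" that excuses disabling this unconditionally.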
What’s this “WS-Security” thing that keeps coming up? WS-Security is what came about when someone thought “SSL is great and all, but it doesn’t have enough XML”. WS-Security is essentially a system similar to SSL/TLS but it is based on XML documents and only works with XML documents. It’s an order of magnitude slower than TLS and highly error-prone. It’s also completely broken, with many configurations vulnerable to types of padding oracle attacks; i.e., an attacker can decrypt a document by iteratively corrupting it and seeing what the server’s response is.
The challenge is not in the destination, but in the journey
In order to talk to the Exchange server, you need to know where it is. Many
people would probably expect a rule like “The exchange server for
firstname.lastname@example.org is found at
exchange.example.com.”, which is dead simple
for administrators to set up and clients to implement.
Unfortunately, “simple” is frowned upon, especially “dead simple”. What Microsoft has given us more strongly resembles a Vogon bureaucratic process than a network protocol. It is called “Autodiscovery”.
Autodiscovery isn’t unique to EWS. EAS and a number of other Exchange-centric protocols use the system to discover server URLs.
Microsoft has this to say about Autodiscovery: “The Exchange Autodiscover service provides an easy way for your client application to configure itself with minimal user input.” Whoever wrote that either didn’t write the rest of the page or is deluded beyond hope.
Officially, the first step in Autodiscovery is to ask Active Directory where Exchange is. If you’re not part of Active Directory, tough luck, fall through to the horror that awaits and talk to the Autodiscovery servers.
There are actually two Autodiscovery services in play. The first is POX Autodiscovery, often called just “POX” since that acronym (“Plain Old XML”, as opposed to SOAP) is used virtually nowhere else. POX was originally specified along with Exchange ActiveSync. Thankfully, it does not use WBXML. Strangely, the official documentation on POX on one page describes it as “schemaless”, then a few pages later presents an exhaustive XML schema for the protocol.
The second is SOAP Autodiscovery, or “SAD” for short. SAD is in every way exactly like POX except for the request/response formats. It is unclear why it exists at all.
Both POX and SAD work like this: You submit an authenticated request essentially asking, “Here’s the user’s email address, where’s their EWS server?” The response can be a number of things:
Success, including an EWS server URL, possibly multiple. Go on to test these to see which one(s) work.
Success, but without indicating any EWS server. This usually means the site doesn’t have EWS set up.
An HTTP error.
An HTTP redirect. This includes the non-standard 451 Redirect status code. Resubmit the autodiscovery request to that URL.
An HTTP 200, but with a response body indicating an error in one of multiple layers of nested envelopes.
An HTTP 200, but with a response body indicating to redirect to another URL. Resubmit the autodiscovery request to that URL.
An HTTP 200, but with a response body redirecting you to another email address. Throw away everything you’ve done and restart autodiscovery with that email address. What email address or user name to use for continued authentication is unspecified.
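The dispatch loop those cases imply looks roughly like this. The response shapes and the `fetch` callable are invented stand-ins for the real POX/SAD request and parsing machinery.

```python
def autodiscover(email, url, fetch, depth=0):
    if depth > 10:
        raise RuntimeError("autodiscovery redirect loop")
    resp = fetch(url, email)
    kind = resp["kind"]
    if kind in ("http_redirect", "redirect_url"):
        # HTTP-level redirect (incl. the non-standard 451), or a 200
        # whose body says to resubmit elsewhere: same handling.
        return autodiscover(email, resp["target"], fetch, depth + 1)
    if kind == "redirect_addr":
        # Throw everything away and restart with the new address.
        new_email = resp["target"]
        new_url = ("https://autodiscover." + new_email.split("@")[1]
                   + "/autodiscover/autodiscover.xml")
        return autodiscover(new_email, new_url, fetch, depth + 1)
    if kind == "error":  # HTTP error, or an error nested in envelopes
        raise RuntimeError(resp.get("message", "autodiscovery failed"))
    return resp["ews_urls"]  # possibly empty: site has no EWS

# A canned exchange: one body-level redirect, then success.
responses = {
    "https://a.example.com/ad":
        {"kind": "redirect_url", "target": "https://b.example.com/ad"},
    "https://b.example.com/ad":
        {"kind": "success",
         "ews_urls": ["https://b.example.com/EWS/Exchange.asmx"]},
}
urls = autodiscover("u@example.com", "https://a.example.com/ad",
                    lambda url, email: responses[url])
```

Note what the sketch glosses over: which credentials to use after a `redirect_addr`, which the protocol leaves unspecified.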
This all sounds really awkward, but doable. “But wait,” you might ask, “where are the autodiscovery servers?” A good question. Before the above process, you must first autodiscover the autodiscovery services.
The first phase of meta-autodiscovery is to take the user’s email address and
derive some URLs from the domain part. For example, for user@example.com,
you’d derive a list like the following (order as suggested by Microsoft):

https://example.com/autodiscover/autodiscover.svc
https://example.com/autodiscover/autodiscover.xml
https://autodiscover.example.com/autodiscover/autodiscover.svc
https://autodiscover.example.com/autodiscover/autodiscover.xml

I’m not sure why this is the suggested order. An overwhelming majority of sites support POX, but only some support SAD, so there is not much reason to check SAD first.
A bigger issue is that the first pair of URLs are on the domain’s general
website. Very few sites actually run Exchange directly on their top-level
domain, so these requests will generally receive strange responses from the WWW
server, such as a 405 Method Not Allowed or a 200 OK with an HTML response
body. While handling these cases is mandatory regardless, it makes more sense
to try the specialised autodiscover.example.com domain first.
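A candidate-list builder reflecting that advice might look like this. The path conventions (autodiscover.xml for POX, autodiscover.svc for SAD) are the documented ones; treat the ordering as my preference rather than Microsoft's.

```python
def candidate_urls(email: str) -> list:
    """Derive autodiscovery candidate URLs from an email address,
    trying the dedicated autodiscover host first and POX before SAD."""
    domain = email.rsplit("@", 1)[1]
    hosts = ["autodiscover." + domain, domain]  # specialised host first
    return ["https://%s/autodiscover/autodiscover.%s" % (host, ext)
            for host in hosts
            for ext in ("xml", "svc")]          # POX before SAD

urls = candidate_urls("user@example.com")
```

Either ordering must still cope with every URL failing in a different creative way, per the next paragraph.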
Working one’s way through the list is more complicated than it sounds, because
POX and SAD are both allowed to return spurious credential errors. Even if
they weren’t, it’s possible to get a 403 status from the steps where
the WWW server is called. This makes determining whether a failure is due to a
problem with the site or because the user mistyped their password extremely
difficult.
The meta-autodiscovery process so far looks like a slightly more complicated version of our “dead simple” idea of discovering Exchange proper. It also takes place entirely over HTTPS, so it’s sound from a security perspective, provided the SSL setup isn’t broken and the client in use isn’t broken by default. Obviously, neither assumption can be counted on.
If all four steps so far fail, you proceed to derive another URL from the
domain part of the email address and send an HTTP GET to it:

http://autodiscover.example.com/autodiscover/autodiscover.xml

This endpoint is supposed to return some kind of HTTP redirect to a POX server.
Note that this is nothing like a standards-compliant HTTP redirect, since one
takes the target URL and makes a POST request to it with a body.
If that fails as well, there’s also a step to get a POX URL from a particular SRV record in DNS.
The lack of an https in the above URL is not an accident: the
request is performed over cleartext HTTP. The DNS step is also insecure unless
the domain in question has DNSSEC and the client validates it. This utterly
defeats the HTTPS used so far. By intercepting one of these redirects, an
attacker can cause a client to send the user’s credentials to an arbitrary
destination. The destination may very well have a valid SSL certificate, but
this guarantees nothing, because there is no way to ensure that it’s the
destination the site actually intended.
In some cases, you can make an educated guess as to whether it’s safe. For example, if the destination is on a subdomain of the user’s email domain, it’s likely under the control of the same organisation; if it isn’t, the site has already been compromised anyway.
Otherwise, the best practice recommended by Microsoft is to ask the user if they’re sure they want to accept the redirect. This already sounds like a hard thing for most users to answer properly, but a lot of third-party Exchange hosting providers, such as Rackspace, choose domain names for Exchange that have nothing apparent to do with the hosting provider. There unfortunately doesn’t seem to be any other option, so users end up being asked things like “Hey, we got a redirect to mex09.emailsrvr.com, is it OK to continue?”
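The subdomain heuristic from above is simple enough to state in code; the function name is mine.

```python
def redirect_looks_safe(email: str, target_host: str) -> bool:
    """Accept a redirect target silently only if it is the user's email
    domain or a subdomain of it; otherwise the user must be asked."""
    domain = email.rsplit("@", 1)[1].lower()
    host = target_host.lower()
    return host == domain or host.endswith("." + domain)

redirect_looks_safe("u@example.com", "exchange.example.com")  # fine
redirect_looks_safe("u@example.com", "mex09.emailsrvr.com")   # ask the user
```

Note that this merely reduces how often the hopeless question is asked; it does nothing for the legitimate third-party-hosting case.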
This is all pretty complicated even in theory. In practice, it’s even worse.
The autodiscovery system for the Hotmail family of domains is at varying times down, has invalid SSL certificates, or serves responses that don’t conform to the POX autodiscovery schema. An implementation must recognise such email addresses and jump to a preset EWS URL.
A certain domain hosting provider is known to add a DNS SRV record for a different Exchange hosting provider even for customers not using the latter. This particular Exchange hosting provider’s autodiscovery servers are a custom implementation written in PHP which only return EAS URLs. Handling this requires some extra logic to detect when a site has conflicting autodiscovery information, so that if the user mistypes their password, you don’t fall through to the end, see that EWS is not configured, and tell the user that it’s futile.
There are a lot of other special cases, which form a balancing act between helping users who are utterly confused and supporting users whose bizarre setups simply make them look utterly confused.
The end result is that a robust autodiscovery client implementation is ridiculously complex. It’s hard to adequately express this, but perhaps I can illustrate by the fact that the cyclomatic complexity of the autodiscovery client I authored is greater than that of all of the rest of the Exchange-specific contact synchronisation code it supports, combined.
Tying it all together
Supporting Microsoft Exchange is an utter nightmare. The technical details are so vast that most users are overwhelmed if they ever get past the veneer of marketing. Even that veneer is confusing because it outright lies that different things are the same. Administrators have too many options in server configuration, making it hard for them to know that their setup is “right” and making clients jump through all too many hoops to support everyone. Microsoft Support is utterly useless, as most questions either get defeated by marketing ambiguity or span two distinct systems, a situation their departmental fragmentation cannot cope with.
There’s probably not much to take away from all this, since most people already have an “avoid integrating with Exchange if at all possible” mindset. It would be nice if administrators (especially of hosting providers) paid more attention to getting SSL right, but this won’t happen because clients continue supporting them. It would be nice if people writing clients would make them stricter to flush out awful setups, but this won’t happen because doing so shrinks the market for such clients. It would be nice if Exchange simply ceased to be relevant, but this won’t happen for quite some time.
So many unfulfilled wishes. So much confusion. So much suffering. I envy those who live in the ignorant bliss of not having ever implemented an Exchange integration.