Sunday, January 11, 2009

Reply to All and Message Recall cause massive Mail FAIL

There is an interesting story reported on Slashdot about a mail storm debacle that brought the US State Department's MS Exchange infrastructure (branded OpenNet eMail) to its knees. The problem was not simply sending the message to many thousands of people; any mail system could handle that. The problem was that a significant minority of the recipients chose to "Reply to all", requesting that their name be removed from the mailing list. Every one of those replies went back to the entire list, so the number of messages flowing through the system multiplied, and this was further exacerbated by having a receipt requested. If that weren't bad enough, some of the users then started trying to recall their messages.
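To put rough numbers on why "Reply to all" hurts so much, here is a back-of-the-envelope sketch. It is a hypothetical model, not measured data from the State Department incident: it assumes every message goes to the whole list, that a requested receipt produces one extra message per delivery, and that each recall is one more message to the whole list. The function name `storm_messages` and its parameters are purely illustrative.

```python
def storm_messages(recipients: int, reply_all_senders: int,
                   receipts: bool = False, recallers: int = 0) -> int:
    """Rough count of message deliveries in a reply-all storm.

    Hypothetical simplifying assumptions (illustration only):
    - the original and every reply-all go to the whole list of `recipients`;
    - if `receipts` is True, each of those deliveries generates one receipt;
    - each recaller sends one recall message to the whole list.
    """
    # The original announcement plus one full-list copy per reply-all.
    mail = recipients + reply_all_senders * recipients
    if receipts:
        mail *= 2  # one receipt back for every delivered message
    # Recall requests are extra messages to everyone on the list.
    return mail + recallers * recipients


# A 10,000-person list where 1% reply to all, with receipts and 50 recalls.
print(storm_messages(10_000, 100, receipts=True, recallers=50))
```

The key point is that the load grows with the number of repliers multiplied by the size of the list. If a fixed percentage of people reply, making the list ten times bigger makes the storm roughly a hundred times bigger.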

What many folks don't realise is that Message Recall is rather inaccurately named: the message is not recalled, it is deleted from the recipient's mail file. And how does the system know to delete the message? Well, another message is sent to all the recipients with that instruction, further doubling the workload. In Lotus Domino these recall messages are processed by the mail router, so it is very efficient. In an Exchange environment, however, the requests are processed by the Outlook mail client, so the request will obviously fail if the recipient is using another mail client or an old version of Outlook. This is compounded by the fact that the recall request gets delivered to the users, highlighting that the sender is trying to recall the message (which will of course draw additional interest to the original message).
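The difference between router-side and client-side recall can be sketched as a toy model. This is a hypothetical simplification of the behaviour described above, not a real Domino or Exchange API: `Mailbox`, `apply_recall` and the `processed_by_router` flag are invented for illustration, and real products have further conditions this ignores.

```python
from dataclasses import dataclass, field


@dataclass
class Mailbox:
    """Toy recipient mailbox (hypothetical model, not a real mail API)."""
    client_understands_recall: bool
    inbox: list[str] = field(default_factory=list)


def apply_recall(box: Mailbox, original: str, processed_by_router: bool) -> bool:
    """Process a recall request for `original`; return True if it was deleted.

    processed_by_router=True models the server acting on the recall (the
    Domino case described above); False models the Exchange case, where the
    recall is delivered as a message and the client has to act on it.
    """
    if not processed_by_router:
        # The recall request itself lands in the recipient's inbox.
        box.inbox.append(f"Recall: {original}")
        if not box.client_understands_recall:
            # Other or old client: original stays, plus an eye-catching notice.
            return False
    if original in box.inbox:
        box.inbox.remove(original)
        return True
    return False


old_client = Mailbox(client_understands_recall=False, inbox=["Reply-all warning"])
apply_recall(old_client, "Reply-all warning", processed_by_router=False)
print(old_client.inbox)  # the original plus a notice drawing attention to it
```

In the client-side case the worst outcome is exactly the one described above: the recipient ends up with both the original and a message announcing that the sender wants it gone.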

Here is an example I received a few weeks ago.


My surprise about this issue is not that a load of under-trained users/politicians caused the problem through the inappropriate use of "Reply to All". It is that the US State Department doesn't have a more efficient method of making an announcement to all its staff than a blunderbuss mailing. Of course there are many ways to mitigate the problems that "Reply to All" can potentially cause. Putting the mailing list in the BCC rather than the CC field will help a lot (and be more discreet), because a "Reply to All" from a BCC recipient only goes to the sender and any visible To/CC addresses. Alternatively you can restrict access to mail groups, set a maximum number of recipients per message, or have users prompted/warned if they try to send to too many recipients. But these are really attempts to fix technology when the fault lies in the culture.
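The recipient-limit and warning mitigations amount to a simple check before a message leaves the client or enters the router. The sketch below is a hypothetical policy, not a feature of Domino, Exchange or Outlook: the function name, the thresholds (`warn_at=50`, `max_recipients=500`) and the returned strings are illustrative assumptions, and group expansion is assumed to have already happened.

```python
from typing import Sequence


def check_outgoing(to: Sequence[str], cc: Sequence[str], bcc: Sequence[str],
                   warn_at: int = 50, max_recipients: int = 500) -> str:
    """Decide what to do with an outgoing message (hypothetical policy).

    Recipients are individual addresses (mail groups already expanded).
    Thresholds are illustrative, not vendor defaults.
    """
    visible = len(to) + len(cc)  # addresses a "Reply to All" would hit
    total = visible + len(bcc)
    if total > max_recipients:
        return "reject: too many recipients"
    if visible > warn_at:
        # A big visible list is what turns one reply-all into a storm.
        return "warn: consider moving the list to BCC"
    if total > warn_at:
        return "warn: large mailing"
    return "send"


staff = [f"user{i}@example.org" for i in range(100)]
print(check_outgoing(to=staff, cc=[], bcc=[]))
```

Checking the visible (To/CC) count separately from the total reflects the point about BCC: a large BCC list is still a large mailing, but it cannot be amplified by "Reply to All".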

This is made all the more ironic because the blanket email sent round to everyone was apparently a warning about using "Reply to All".

So what is the alternative?

Well, firstly we need to move away from the 1980s notion that email is the only way to collaborate and communicate with people. It would be far more efficient to post the communication on the corporate intranet as a news story. If users need to be able to comment on the communication, then it might be more appropriate to post it on a corporate blog. If your organisation doesn't have a single blog that all users access regularly, then it might be appropriate to email the communication to business unit leaders and ask them to post it in their local Team Rooms/Quickr Places/SharePoint Sites/Public Folders etc. If you need to ensure that users are informed about the policy change immediately, then it would be better to send a broadcast message using a real-time collaboration tool such as Sametime. Better still, use a combination of these capabilities so that users get to view the communication using their preferred client, for example an RSS feed or a news reader.

As my frolleague John Wylie points out, we should not blame the users. That is the typical IT department response ("The system would work fine if it weren't for the users"). Users should not be put in a position where their actions can cause such a catastrophe, and the way to do that is to collaborate with them more intelligently.

Further reading for Domino message recall here
Further reading for Exchange/Outlook message recall here
Mail Fail Image from Crunchgear

2 comments:

wild bill said...

Another issue is with Exchange itself. Since it's based on the venerable Jet database engine (ESE, a relative of the Jet engine used by MS Access), throughput and performance have always been issues.

Example: a huge consumer electronics firm in the Netherlands. In 1997 we had to design an incoming SMTP hub that would cope with 4m messages an hour, which we did with a single Domino server. Okay, it was a HUGE Domino server, but it worked.

Roll on 10 years: this company is moving to Exchange, and they just can't believe the number of messages that are flowing through the system, and can't believe that a single Domino server could cope with 4m messages an hour.

Much gnashing of teeth, contractual sulks, massive downtime.

10 years on, and Domino still outperforms Exchange on raw message throughput.

---* Bill

Ian said...

Haha. You said "US State Department" and "efficient" in the same sentence! :)