Saturday, September 20, 2014

Cloud computing and the tragedy of the commons

It recently occurred to me that Cloud Computing is extremely exposed to an interesting manifestation of the tragedy of the commons.

Whazzat ?

The tragedy of the commons describes a situation where several parties share a limited resource and, by acting according to their self-interest, actually behave contrary to the whole group's best interest. Traditional examples of the tragedy of the commons include people littering in the street, and more generally any misuse of a public good.

When you subscribe to an IAAS, PAAS or SAAS service, you will inevitably share a limited resource, which is the time the company in charge of the service can spend on your specific needs.

And it happens all the time

To take a concrete example, a customer recently complained that a support ticket we had filed against a third-party PAAS was not progressing fast enough. When we asked the support team for an update, they simply answered "Our engineers are working on it, in the meanwhile here's a workaround [...]".

Indeed we had already set up the workaround, and it was satisfying enough, so the ticket resolution was not urgent for us. But my customer found their answer "absolutely unacceptable", even after admitting that they were not impacted since we had set up the workaround. He simply could not accept that we had to wait, and wanted the issue dealt with immediately.

Cloud Computing and the tragedy of the commons

Since the provider company has a limited number of engineers to dispatch on problems (not to mention that engineers are not interchangeable), they usually try to deal with high-impact tickets first. These are usually :
  • Tickets that severely prevent the service from working
  • Tickets that impact a lot of customers
  • Tickets that, when solved, could help sign an interesting contract

Our issue was none of those, and indeed if, as customers, we had a severe blocking issue, I would have preferred they fix it rather than some small issue for which a workaround exists.

Nonetheless, if you remain focused too hard on the problem at hand, and forget the big picture, it is easy to get carried away and demand that you be serviced as you think you're entitled to. Even though that's useless, not efficient, and whatever else.

This issue with software support was here way before Cloud Computing : lots of people use Microsoft Office and if you contact the Office support it makes sense to expect use cases more grave than yours to be fixed first by Microsoft.

But Cloud Computing makes that issue way worse, for two reasons :
  1. There are more shared resources than before : Sure, when you installed Office, you shared this piece of software with millions of other users around the world. But you installed MS Office on workstations that you owned and managed. And the macros you built on MS Office were ones you developed and managed as well. Nowadays all this is more and more externalized, and becomes a common resource that you must share.
  2. It is harder to work around the issue by yourself : because you often do not master the systems on which the faulty service is built, it is extremely hard to come up with a dirty, hopefully temporary fix. As a result a lot more issues need to go through the support process. 

I do not believe there is any solution for this. Even more, I do not think we should be looking for a solution. The whole point of As-A-Service resources is that you get a better service for a cheaper price. It is obvious that you cannot expect it to be bespoke as well.

Sunday, September 14, 2014

Europe Assistance's poor service

I always regarded the insurance packages provided with credit cards or flight tickets as pure scams that nobody would ever get to work.

However I own a Visa Premier debit card since it was the cheapest my bank was offering at the time, and I was forced to use their medical assistance service once.

While I eventually got reimbursed, the whole process took so much time, patience, and resistance to incompetence that I feel the need to write it down somewhere.

At least in France, Visa Premier's medical assistance service is handled by Europe Assistance. If you are a customer of Europe Assistance or the owner of a Visa card, this is a warning. I hope you will not experience the same service as I did. If by any chance someone at Europe Assistance stumbles on this post, please do something to improve your customer service. You are dealing with people who need help (your company name should be a hint) and you cannot have such a subpar communication process.

Agents do not read the text of filed requests

Getting reimbursed for my fiancée's medical expenses took us 46 days, during which I had several contacts over email and phone with Europe Assistance's agents.

All of them were very polite, but of limited professionalism. While I understand that on the phone agents do not have the time to review all the information of a ticket, it should be obvious that when they're using asynchronous communication such as emails they should check the request history first.

In our case, it was clear that the agents did not bother to check first. While most of the people they deal with are covered by French healthcare, this is not the case for my fiancée. I had to explain this several times to agents who expected me to produce a proof of reimbursement from the French healthcare. Almost every one of my early interactions started with "you need to provide a proof of reimbursement", and then I had to explain that it was neither possible nor required.

The same thing happened with the documents that we were asked to submit, namely a copy of our plane ticket and a copy of my fiancée's passport. I sent both and then called to check that they were received. The conversation went :

Agent - Yes we received your plane ticket, we will start the verification process.
Me - OK... wait, you have received the plane ticket AND the passport right ?
Agent - No only the plane ticket, you need to re-send the passport

Tell me Europe Assistance, how hard is it to show on your agents' computer a small box explaining what the next step is and what is expected from the customer ?

Europe Assistance does not understand what asynchronous communication means

I mentioned earlier that I emailed them, then called them. That's how you must work with them. Remember 1995 when you called your mom to tell her to check her emails ? Europe Assistance is stuck in that year...

There is absolutely no way to confirm that an email was received... In 2014 it seems unbelievable to me that their ticketing system does not automatically confirm reception, but even worse : I sent more than 10 emails ending with something like "please acknowledge reception of this mail". I know that the emails arrived, because I received an answer a few days later, but never, not even once, did an agent bother to reply with a one-liner such as "thank you for your message, we will process it in the next few days".

Seriously, when you use asynchronous communication, please provide peace of mind to your customers by acknowledging their messages.

You might wonder why I am so fixated on getting my emails acknowledged, but here's the reason...

Emails with big attachments are "lost"

Europe Assistance's mail servers have a ridiculous limit on attachment size. I do not remember the exact limit but nothing more than a couple megabytes could be sent. When you know that they ask for the full copy of a passport (yes, all pages, including the empty ones), imagine the consequences. I had to send 40 mails, one per page, to make sure everything reaches this seemingly impermeable wall.

All email services impose a limit on attachment size, but it is customary when an email is too big to warn the sender by replying with a warning.

Europe Assistance seems to have never heard of this common sense solution. They simply drop the email in some black hole, and nobody ever hears of it, until you call them only to learn that they never received it.

How is that decent customer service ? How hard would it be to help users send their documents ? I am a software engineer and when something does not work (like an email that never reaches its destination) I can guess the reason and it is easy enough for me to resize an image, but is that the case for their whole audience ?

They do not push the process forward, you have to call them for that

This really drives me crazy. Their support system is here for one thing and one thing only : follow the process that they themselves defined.

During my phone exchanges with their agents, it happened twice that the process was stalled for no apparent reason.

Me - "We have completed the process, sent all the documents, but we are still waiting for the payment"
Agent - "Ah yes your request is here, let me process it and get back to you"

Wait, we have been waiting for two weeks while our request was just sitting on your (virtual) desk and you were not processing it ? What if I hadn't called ?

They are unable to process a complaint

At some point, I complained on Twitter, curious to know if there would be a response, not expecting much from a company that still does not master email.
So I was amazed to receive an answer a few hours later :

After some complications because their community manager does not know that he needs to follow someone so that the person can DM, I explained my issue, hoping that it would get escalated and that things would get smoother.

I then got the dreaded answer "I checked your file, and you just need to provide the reimbursement proof from the French healthcare service". Aaaah thank you so much Europ Assistance, I see that you reviewed my situation carefully, especially my messages explaining that my fiancée was not registered under French healthcare.

I will never sign up for Europe Assistance again

I am lucky that my credit card's insurance was provided by Europe Assistance, because now I know that I will never decide to use their services for me or anyone who is dear to me. I seriously hope that they will improve their service, because my experience with them did not look professional at all.

Wednesday, September 10, 2014

Why you should not try to deal with dates manually

When you're writing a program, you will always have to deal with dates and times at one point or another.

It could be because you want to set up a scheduled task, or because you want a feature that gives the user a report of their activity over the last month.

Date handling can be very tricky, and a lot of developers still try to handle this manually. If you read The Daily WTF, you'll notice that probably a third of the poor coding examples have something to do with dates.

Microsoft's Azure platform experienced an outage on February 29, 2012, due to incorrect handling of leap years.

This article is an attempt to list the gotchas you will encounter working with dates. There are two types of tricky aspects in handling dates : the date system itself, with its time zones, leap years and daylight saving time; and the technical aspect, that is how our computer systems manage dates.

Timezones and daylight saving time

Aaah timezones... certainly easy to compute, right ? Just a signed int to indicate how many hours from UTC to store in the database.

But wait, did you know that some timezones have 30-minute offsets as well ? Check out Iran (UTC+3:30) or India (UTC+5:30).

Countries also change timezones from time to time, usually for economic or political reasons, that is to get closer to a partner country or to show distance from a neighbour that is a bit too invasive. Samoa switched timezone 3 years ago to get closer to Australia.

What about daylight saving time ? Did you know that some countries change the DST dates at the last minute ? For example, Morocco has already done this quite a few times, to prepare for the religious month of Ramadan. At that time, all our clocks on Windows were 1 hour late... If you run a calendar app and provide notification services, your Moroccan users probably received the notifications one hour too late during those days.
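To make the point concrete, here is a minimal sketch using the java.time API that ships with Java 8. The class and method names (ZoneOffsets, offsetAt) are made up for this example. The idea is to store a zone identifier and ask the tz database for the offset at a given instant, rather than storing a number of hours.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

class ZoneOffsets {
    // Looks up the UTC offset that the tz database gives for a zone at a given instant.
    static ZoneOffset offsetAt(String zoneId, Instant instant) {
        return ZoneId.of(zoneId).getRules().getOffset(instant);
    }

    public static void main(String[] args) {
        Instant winter = Instant.parse("2014-01-15T12:00:00Z");
        Instant summer = Instant.parse("2014-07-15T12:00:00Z");

        // Not a whole number of hours.
        System.out.println(offsetAt("Asia/Kolkata", winter)); // +05:30
        // Same zone, two different offsets depending on the date.
        System.out.println(offsetAt("Europe/Paris", winter)); // +01:00
        System.out.println(offsetAt("Europe/Paris", summer)); // +02:00
    }
}
```

The answers are only as good as the zone data installed with the runtime, so a last-minute DST change still requires an update of that data.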

Leap years

Enough about country-specific aspects, what about universal aspects of our calendar, such as leap years ? In those years, the year counts 366 days instead of the usual 365. Here's how to know if you're in a leap year according to Wikipedia :
if (year is not divisible by 4) then (it is a common year) 
elseif (year is not divisible by 100) then (it is a leap year) 
elseif (year is not divisible by 400) then (it is a common year) 
else (it is a leap year)
If Microsoft made the mistake, we can expect others to make it as well.
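For illustration, the rule above translates to Java as follows. This is only a sketch: the class name LeapYears is invented for the example, and in real code you would rather call java.time.Year.isLeap (available since Java 8), which the example uses as a cross-check.

```java
import java.time.Year;

class LeapYears {
    // Direct translation of the rule quoted above.
    static boolean isLeapYear(int year) {
        if (year % 4 != 0) {
            return false;          // common year
        } else if (year % 100 != 0) {
            return true;           // leap year
        } else {
            return year % 400 == 0; // century years are leap only if divisible by 400
        }
    }

    public static void main(String[] args) {
        int[] years = {1900, 2000, 2012, 2013, 2014};
        for (int year : years) {
            // Print our answer next to the JDK's answer; the two should always agree.
            System.out.println(year + " -> " + isLeapYear(year)
                    + " (JDK says " + Year.isLeap(year) + ")");
        }
    }
}
```

The century years are the easy ones to get wrong : 1900 was not a leap year, 2000 was.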

Now, let us say you have circumvented the problem by using a reliable date library (more on that later) and good unit testing. You're not done yet because our computer systems make date management even trickier...

Trusting the user's clock

If you're developing a rich web application, a mobile application or an old school desktop application, you will have to deal with the user's clock.

That clock can be improperly set, and your time-sensitive operations might fail because of this. Let us say you have a JavaScript application that can ask a server for data at a certain time. To stay in the calendar example, let us imagine that the client code retrieves a list of events for a given start and end date.

If your client's clock is wrong, you will end up requesting the list of events for yesterday when the user wants the list for tomorrow.

One very time-sensitive type of operation is authentication. A lot of authentication protocols use timestamps (OAuth, Kerberos for example) and the authentication will fail if the client's clock is set wrong.
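One possible mitigation, sketched here in Java with the java.time API, is to compare the local clock with a time reported by the server and to work with a corrected clock afterwards. Everything named here (ClockSkew, correctedClock, the serverNow value) is hypothetical, and the sketch ignores network latency, so treat it as an illustration of the idea rather than a complete solution.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;

class ClockSkew {
    // Returns a clock that follows the local clock but is shifted to agree with the server.
    static Clock correctedClock(Clock localClock, Instant serverNow) {
        Duration skew = Duration.between(localClock.instant(), serverNow);
        return Clock.offset(localClock, skew);
    }

    public static void main(String[] args) {
        // A client whose clock is one day behind (fixed here to keep the example deterministic).
        Clock wrongClock = Clock.fixed(Instant.parse("2014-09-09T08:00:00Z"), ZoneOffset.UTC);
        // Hypothetical value reported by the server, for example in an API response.
        Instant serverNow = Instant.parse("2014-09-10T08:00:00Z");

        Clock corrected = correctedClock(wrongClock, serverNow);
        System.out.println(corrected.instant()); // 2014-09-10T08:00:00Z
    }
}
```

Passing a Clock around instead of calling the system time directly also makes time-sensitive code much easier to unit test.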

Parsing and printing dates

Have you ever written an application that neither parses a date nor prints one somewhere (screen, name of a file, a web-service) ?

Try as much as possible to avoid parsing dates that are locale-dependent. Something that is written "Tuesday, September 3rd" might be written "Mardi 3 Septembre" on another machine.

Also, if you're displaying dates in a user interface, take into account that the length of month names depends on the language : July and Juillet are the same month but do not have the same length. Beware of text that does not fit and overflows or becomes partly hidden.

When parsing/printing a date, do not forget the time zone, otherwise you'll be off by several hours.
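A quick way to see how much month names vary is to ask the JDK for them. The following sketch uses the Java 8 java.time API; the class and method names (MonthNames, monthName) are invented for the example.

```java
import java.time.Month;
import java.time.format.TextStyle;
import java.util.Locale;

class MonthNames {
    // Full name of a month as the JDK's locale data spells it.
    static String monthName(Month month, Locale locale) {
        return month.getDisplayName(TextStyle.FULL, locale);
    }

    public static void main(String[] args) {
        Locale[] locales = {Locale.ENGLISH, Locale.FRENCH, Locale.GERMAN};
        for (Locale locale : locales) {
            String name = monthName(Month.JULY, locale);
            // Print the name and its length to see how much room a label needs.
            System.out.println(locale + ": " + name + " (" + name.length() + " characters)");
        }
    }
}
```

Running a loop like this over all twelve months and all the locales you support gives a rough idea of the widest label your interface has to accommodate.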

My recommendations :
  • Use ms or ns since epoch for technical purposes (web-services, storing in a file etc)
  • When you still want the date to be readable by the user, use the ISO 8601 format, which is easier to parse and sorts chronologically when sorted alphabetically
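Both recommendations can be sketched in a few lines of Java 8. The class and method names below (Timestamps, toEpochMillis, toIso8601, sortedCopy) are made up for the example, and the sketch assumes all timestamps are expressed in UTC with the same precision, which is what makes the alphabetical sort match the chronological one.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class Timestamps {
    // Technical representation: milliseconds since the epoch, independent of any time zone.
    static long toEpochMillis(String iso8601) {
        return Instant.parse(iso8601).toEpochMilli();
    }

    // Readable representation: ISO 8601 in UTC.
    static String toIso8601(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    // Plain string sort; for same-format UTC timestamps this is also chronological order.
    static List<String> sortedCopy(List<String> timestamps) {
        List<String> copy = new ArrayList<>(timestamps);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        long millis = toEpochMillis("2014-09-10T08:30:00Z");
        System.out.println(millis);            // 1410337800000
        System.out.println(toIso8601(millis)); // 2014-09-10T08:30:00Z

        System.out.println(sortedCopy(Arrays.asList(
                "2014-09-20T10:00:00Z", "2013-12-31T23:59:59Z", "2014-09-10T08:30:00Z")));
    }
}
```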

Who will save us ?

In Java, the java.util.Date and java.util.Calendar API is good enough for most simple uses. The Joda-Time library gives easier access to most date and time-related operations.

The "Why Joda Time" section of the Joda-Time website summarizes the advantages of this library as follows :

  • Easy to Use. Calendar makes accessing 'normal' dates difficult, due to the lack of simple methods. Joda-Time has straightforward field accessors such as getYear() or getDayOfWeek().
  • Easy to Extend. The JDK supports multiple calendar systems via subclasses of Calendar. This is clunky, and in practice it is very difficult to write another calendar system. Joda-Time supports multiple calendar systems via a pluggable system based on the Chronology class.
  • Comprehensive Feature Set. The library is intended to provide all the functionality that is required for date-time calculations. It already provides out-of-the-box features, such as support for oddball date formats, which are difficult to replicate with the JDK.
  • Up-to-date Time Zone calculations. The time zone implementation is based on the public tz database, which is updated several times a year. New Joda-Time releases incorporate all changes made to this database. Should the changes be needed earlier, manually updating the zone data is easy.
  • Calendar support. The library currently provides 8 calendar systems. More will be added in the future.
  • Easy interoperability. The library internally uses a millisecond instant which is identical to the JDK and similar to other common time representations. This makes interoperability easy, and Joda-Time comes with out-of-the-box JDK interoperability.
  • Better Performance Characteristics. Calendar has strange performance characteristics as it recalculates fields at unexpected moments. Joda-Time does only the minimal calculation for the field that is being accessed.
  • Good Test Coverage. Joda-Time has a comprehensive set of developer tests, providing assurance of the library's quality.
  • Complete Documentation. There is a full User Guide which provides an overview and covers common usage scenarios. The javadoc is extremely detailed and covers the rest of the API.
  • Maturity. The library has been under active development since 2002. Although it continues to be improved with the addition of new features and bug-fixes, it is a mature and reliable code base. A number of related projects are now available.
  • Open Source. Joda-Time is licenced under the business friendly Apache License Version 2.0.

Note that since Java 8, a new Date and Time API has been introduced in the JDK, and its creation involved the author of the Joda Time library.
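As a small taste of that newer API, here is a sketch of the "last month's report" use case mentioned at the start of this article, written with java.time. The class and method names (LastMonthReport, previousMonthRange) are invented for the example.

```java
import java.time.LocalDate;
import java.time.YearMonth;

class LastMonthReport {
    // First and last day of the calendar month preceding the given date.
    static LocalDate[] previousMonthRange(LocalDate today) {
        YearMonth previous = YearMonth.from(today).minusMonths(1);
        return new LocalDate[] {previous.atDay(1), previous.atEndOfMonth()};
    }

    public static void main(String[] args) {
        // March 31st is a classic trap: there is no February 31st to go back to.
        LocalDate[] range = previousMonthRange(LocalDate.of(2014, 3, 31));
        System.out.println(range[0] + " to " + range[1]); // 2014-02-01 to 2014-02-28

        // Adding a year to a leap day yields a valid date instead of February 29, 2013.
        System.out.println(LocalDate.of(2012, 2, 29).plusYears(1)); // 2013-02-28
    }
}
```

The library takes care of month lengths and leap days, which is exactly the kind of arithmetic that goes wrong when done by hand.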