Wednesday, September 10, 2014

Why you should not try to deal with dates manually

When you're writing a program, you will almost always have to deal with dates and times at one point or another.

It could be because you want to set up a scheduled task, or because you want a feature that reports to users their activity over the last month.

Date handling can be very tricky, and a lot of developers still try to handle it manually. If you read The Daily WTF, you'll notice that probably a third of the poor coding examples have something to do with dates.

Microsoft's Azure platform experienced an outage on February 29, 2012, due to incorrect handling of the leap day.

This article is an attempt to list the gotchas you will encounter when working with dates. There are two kinds of tricky aspects in handling dates: the date system itself, with its timezones, leap years and daylight saving time; and the technical aspect, that is, how our computer systems manage dates.

Timezones and daylight saving time

Aaah, timezones... certainly easy to compute, right? Just a signed int stored in the database to indicate how many hours from UTC.

But wait, did you know that some timezones have a 30-minute offset as well? Check out Iran (UTC+3:30 in standard time) or India (UTC+5:30).
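
To make the point concrete, here is a minimal Java sketch, assuming Java 8 or later and its java.time API. The class name OffsetDemo and the method offsetSecondsAt are made up for this illustration. It asks the time zone rules for the offset that applies at a given instant and returns it in seconds, so the half hour is not lost.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical helper class, for illustration only.
class OffsetDemo {

    // Returns the UTC offset, in seconds, that applies in the given zone at the given instant.
    static int offsetSecondsAt(String zoneId, Instant instant) {
        ZoneOffset offset = ZoneId.of(zoneId).getRules().getOffset(instant);
        return offset.getTotalSeconds();
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        // India is 5 hours and 30 minutes ahead of UTC: not a whole number of hours.
        System.out.println(ZoneId.of("Asia/Kolkata").getRules().getOffset(now)); // +05:30
        // Seconds (or minutes) keep the half hour that an integer number of hours would drop.
        System.out.println(offsetSecondsAt("Asia/Kolkata", now) / 3600.0); // 5.5
    }
}
```

Even a minute-precise offset is only valid for one instant. Storing the zone identifier (such as "Asia/Kolkata") rather than the offset leaves it to the library to apply whatever rule is in force at a given date.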

Countries also change timezones from time to time, usually for economic or political reasons, that is, to get closer to a partner country or to distance themselves from a somewhat too invasive neighbour. Samoa switched timezones 3 years ago, at the end of 2011, to get closer to Australia.

What about daylight saving time? Did you know that some countries change their DST dates at the last minute? For example, Morocco has already done this several times, to prepare for the religious month of Ramadan. At that time, all our clocks on Windows were one hour late... If you run a calendar app and provide notification services, your Moroccan users probably received their notifications one hour too late during those days.
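
Even when the rules are known well in advance, a DST switch means that a calendar day is not always 24 hours long. The following Java sketch (assuming Java 8 or later; the class DstDemo and its method are hypothetical names) uses the switch to summer time in France on 30 March 2014 to show the difference between "one day later" and "24 hours later".

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical demo class, for illustration only.
class DstDemo {
    static final ZoneId PARIS = ZoneId.of("Europe/Paris");

    // Noon on the day before clocks moved forward in France (30 March 2014).
    static ZonedDateTime dayBeforeSwitch() {
        return ZonedDateTime.of(LocalDateTime.of(2014, 3, 29, 12, 0), PARIS);
    }

    public static void main(String[] args) {
        ZonedDateTime start = dayBeforeSwitch();
        ZonedDateTime nextDay = start.plusDays(1);     // same wall-clock time, next calendar day
        ZonedDateTime in24Hours = start.plusHours(24); // exactly 24 elapsed hours

        System.out.println(nextDay);   // noon on 30 March
        System.out.println(in24Hours); // 13:00 on 30 March
        // Only 23 hours elapse between noon on the 29th and noon on the 30th.
        System.out.println(Duration.between(start, nextDay).toHours());
    }
}
```

The result depends on the time zone data shipped with the runtime, which is exactly why a last-minute change like Morocco's causes trouble: until the data is updated, the library keeps applying the old rule.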

Leap years

Enough about country-specific aspects; what about universal aspects of our calendar, such as leap years? A leap year has 366 days instead of the usual 365. Here's how to know if you're in a leap year, according to Wikipedia:
if (year is not divisible by 4) then (it is a common year) 
elseif (year is not divisible by 100) then (it is a leap year) 
elseif (year is not divisible by 400) then (it is a common year) 
else (it is a leap year)
If Microsoft made the mistake, we can expect others to do it as well.
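
The rule quoted above translates almost line for line into Java. This is only a sketch for the Gregorian calendar; the class name LeapYear is made up for the example, and the comparison with java.time.Year.isLeap assumes Java 8 or later. In real code, prefer the library method over a hand-written one.

```java
import java.time.Year;

// Hypothetical demo class, for illustration only.
class LeapYear {

    // Direct translation of the rule quoted above (Gregorian calendar).
    static boolean isLeapYear(int year) {
        if (year % 4 != 0) return false;   // not divisible by 4: common year
        if (year % 100 != 0) return true;  // divisible by 4 but not by 100: leap year
        return year % 400 == 0;            // century years are leap only if divisible by 400
    }

    public static void main(String[] args) {
        int[] samples = {1900, 2000, 2012, 2013, 2100};
        for (int year : samples) {
            // Year.isLeap is the JDK's own implementation; both columns should agree.
            System.out.println(year + " -> " + isLeapYear(year) + " / " + Year.isLeap(year));
        }
    }
}
```

The century years are the interesting test cases: 1900 and 2100 are common years, while 2000 is a leap year.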

Now, let us say you have circumvented the problem by using a reliable date library (more on that later) and good unit testing. You're not done yet because our computer systems make date management even trickier...

Trusting the user's clock

If you're developing a rich web application, a mobile application or an old school desktop application, you will have to deal with the user's clock.

That clock can be improperly set, and your time-sensitive operations might fail because of this. Let us say you have a JavaScript application that asks a server for data at a certain time. To stay with the calendar example, let us imagine that the client code retrieves a list of events for a given start and end date.

If your client's clock is wrong, you could end up requesting the list of events for yesterday when the user wants the list for tomorrow.

One very time-sensitive type of operation is authentication. A lot of authentication protocols use timestamps (OAuth and Kerberos, for example), and authentication will fail if the client's clock is set wrong.
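
One possible mitigation, sketched below in Java (Java 8 or later), is to measure the difference between the local clock and a time reported by the server, then work with a corrected clock. This is an assumption-laden sketch, not a protocol: the class SkewCorrectedClock and its method are hypothetical, the server time is assumed to come from a source you trust, and network latency is ignored.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper class, for illustration only.
class SkewCorrectedClock {

    // Builds a clock that shifts the local clock by the difference observed against the server.
    // serverNow is assumed to come from a trusted server response (hypothetical source).
    static Clock correctedClock(Clock localClock, Instant serverNow) {
        Duration skew = Duration.between(localClock.instant(), serverNow);
        return Clock.offset(localClock, skew);
    }

    public static void main(String[] args) {
        Clock local = Clock.systemUTC();
        // Pretend the server reports a time 90 seconds ahead of the local clock.
        Instant serverNow = local.instant().plusSeconds(90);
        Clock corrected = correctedClock(local, serverNow);
        System.out.println("local:     " + local.instant());
        System.out.println("corrected: " + corrected.instant());
    }
}
```

Passing a Clock around, instead of calling the system time directly, also makes time-dependent code much easier to unit test with Clock.fixed.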

Parsing and printing dates

Have you ever written an application that neither parses a date nor prints one somewhere (on screen, in a file name, in a web service)?

Try as much as possible to avoid parsing dates that are locale-dependent. Something that is written "Tuesday, September 3rd" might be written "Mardi 3 Septembre" on another machine.
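
Here is a small Java sketch of the problem, assuming Java 8 or later. The class LocaleDemo, its methods and the pattern are chosen for this illustration only. The same date is formatted with English and French names, and the French text is then rejected by a parser that expects English.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

// Hypothetical demo class, for illustration only.
class LocaleDemo {
    static final String PATTERN = "EEEE d MMMM yyyy";

    // Formats the date with the day and month names of the given locale.
    static String format(LocalDate date, Locale locale) {
        return date.format(DateTimeFormatter.ofPattern(PATTERN, locale));
    }

    // Returns true if the text can be parsed with the given locale's day and month names.
    static boolean canParse(String text, Locale locale) {
        try {
            LocalDate.parse(text, DateTimeFormatter.ofPattern(PATTERN, locale));
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2013, 9, 3); // a Tuesday
        String french = format(date, Locale.FRENCH);
        System.out.println(format(date, Locale.ENGLISH));
        System.out.println(french);
        System.out.println(canParse(french, Locale.FRENCH));  // true
        System.out.println(canParse(french, Locale.ENGLISH)); // false
    }
}
```

The exact spelling of the names depends on the locale data shipped with the runtime, which is one more reason not to exchange such strings between machines.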

Also, if you're displaying dates in a user interface, take into account that the length of month names depends on the language: "July" and "juillet" are the same month but do not have the same length. Beware of text that does not fit and overflows or becomes partly hidden.

When parsing or printing a date, do not forget the time zone, otherwise you may be off by several hours.

My recommendations:
  • Use milliseconds or nanoseconds since the epoch for technical purposes (web services, storing in a file, etc.)
  • When you still want the date to be readable by the user, use the ISO 8601 format, which is easier to parse and sorts chronologically when sorted alphabetically
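
Both recommendations can be sketched in a few lines of Java (Java 8 or later). The class IsoDemo and its methods are hypothetical names for this example; java.time.Instant prints and parses the ISO 8601 format in UTC.

```java
import java.time.Instant;
import java.util.Arrays;

// Hypothetical demo class, for illustration only.
class IsoDemo {

    // Converts epoch milliseconds to an ISO 8601 string in UTC.
    static String toIso(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    // Parses an ISO 8601 instant back to epoch milliseconds.
    static long toEpochMillis(String iso) {
        return Instant.parse(iso).toEpochMilli();
    }

    public static void main(String[] args) {
        long millis = 1410307200000L;
        String iso = toIso(millis);
        System.out.println(iso);                // 2014-09-10T00:00:00Z
        System.out.println(toEpochMillis(iso)); // 1410307200000

        // Plain string sorting puts these timestamps in chronological order.
        String[] stamps = {"2014-09-10T08:30:00Z", "2013-12-31T23:59:59Z", "2014-01-01T00:00:00Z"};
        Arrays.sort(stamps);
        System.out.println(Arrays.toString(stamps));
    }
}
```

The sorting property only holds when all strings use the same offset (UTC here) and the same precision, so it is worth picking one form and sticking to it.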


Who will save us?

In Java, the java.util.Date and java.util.Calendar APIs are good enough for most simple uses. The Joda-Time library gives easier access to most date and time-related operations.

The "Why Joda Time" section of the Joda-Time website summarizes the advantages of this library as follows:

  • Easy to Use. Calendar makes accessing 'normal' dates difficult, due to the lack of simple methods. Joda-Time has straightforward field accessors such as getYear() or getDayOfWeek().
  • Easy to Extend. The JDK supports multiple calendar systems via subclasses of Calendar. This is clunky, and in practice it is very difficult to write another calendar system. Joda-Time supports multiple calendar systems via a pluggable system based on the Chronology class.
  • Comprehensive Feature Set. The library is intended to provide all the functionality that is required for date-time calculations. It already provides out-of-the-box features, such as support for oddball date formats, which are difficult to replicate with the JDK.
  • Up-to-date Time Zone calculations. The time zone implementation is based on the public tz database, which is updated several times a year. New Joda-Time releases incorporate all changes made to this database. Should the changes be needed earlier, manually updating the zone data is easy.
  • Calendar support. The library currently provides 8 calendar systems. More will be added in the future.
  • Easy interoperability. The library internally uses a millisecond instant which is identical to the JDK and similar to other common time representations. This makes interoperability easy, and Joda-Time comes with out-of-the-box JDK interoperability.
  • Better Performance Characteristics. Calendar has strange performance characteristics as it recalculates fields at unexpected moments. Joda-Time does only the minimal calculation for the field that is being accessed.
  • Good Test Coverage. Joda-Time has a comprehensive set of developer tests, providing assurance of the library's quality.
  • Complete Documentation. There is a full User Guide which provides an overview and covers common usage scenarios. The javadoc is extremely detailed and covers the rest of the API.
  • Maturity. The library has been under active development since 2002. Although it continues to be improved with the addition of new features and bug-fixes, it is a mature and reliable code base. A number of related projects are now available.
  • Open Source. Joda-Time is licenced under the business friendly Apache License Version 2.0.

Note that Java 8 introduced a new Date and Time API (java.time) in the JDK, and the author of the Joda-Time library was involved in its creation.
