Wednesday, April 04, 2007

How I spend my days

I'm sorry to report that I did not have fun in the sun today: there was no sunshine. 3°C, cloudy with intermittent rain and moderately high winds. Bah.

Instead, I've had an absolutely fascinating day at work on the database.

The fun started when I noticed that an address book exported from Outlook 6 differs from an address book exported from Outlook 5. Microsoft changed the rules yet again, the swine! The difference is simple but crucial: Outlook 5 used the tab character to separate fields in the exported data. This was a very sensible way to structure the data, because one cannot normally enter a tab character into a text stream: the computer understands it as the command "jump to the next field please." Outlook 6 has thrown that out, to Hell with sensible: it uses the semicolon ; to separate fields. This is an absurd change to have made, because the world is full of semicolons; they occur all the time; there can be several in a single sentence.

How does the importing software know the difference between a content semicolon in the middle of a sentence and a separator semicolon that marks the division between two fields (e.g. First name;Last name)? As a test I exported a few addresses containing random semicolons ("Ud;ge") from Outlook 6 and opened the file in Excel. To my surprise it worked perfectly: Excel was able to tell the difference between content and separator.

Obviously there must be a difference in the file. I opened it in a text editor, and discovered how it works: Microsoft solved the problem that they had just created by inventing a second kind of separator: the pair of characters ;" coming immediately together marks the beginning of a block of text that should be left untreated until the pair "; coming immediately together marks the end of the block. As my experiment with Excel shows, it works fine with normal content.
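For the technically curious, the marker check can be sketched in a few lines of Python. This is an illustration only, not the database's actual code: the function name split_fields is made up, and it assumes that the start of a line counts as following a separator and the end of a line counts as preceding one, so that the first and last fields can be quoted too.

```python
def split_fields(line, sep=";", quote='"'):
    """Split one exported line into fields.

    A field that opens with a quote is copied verbatim up to the next
    quote that is immediately followed by a separator (or that is the
    last character of the line). Everything else is split on sep.
    """
    fields = []
    i, n = 0, len(line)
    while i <= n:
        if i < n and line[i] == quote:
            # Quoted block: look for the closing quote+separator pair.
            end = line.find(quote + sep, i + 1)
            if end == -1:
                # No pair found: accept a closing quote at end of line.
                if line.endswith(quote) and n - 1 > i:
                    end = n - 1
                else:
                    raise ValueError("unterminated quoted field")
            fields.append(line[i + 1:end])
            i = end + 2  # step past the closing quote and the separator
        else:
            # Plain field: runs up to the next separator or end of line.
            end = line.find(sep, i)
            if end == -1:
                end = n
            fields.append(line[i:end])
            i = end + 1
    return fields
```

With this sketch, split_fields('Smith;"Ud;ge"') yields the two fields Smith and Ud;ge: the semicolon inside the quoted block is left alone.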

But what about abnormal content, e.g. He said "Never"; they walked out? The computer would take the "; after Never for an end-of-block marker and assume that what follows ("they walked out") is the next field. It would then arrive at the end of the line with a field left over and nowhere to put it, and the import would crash and burn. (Excel does indeed fail when importing such data.)

I thought for a long while about how to solve this problem, then decided not to solve it: why should I do Microsoft's homework for them? I implemented the check for ;" and "; pairs, and left it at that. If it crashes, it crashes. I shall tell any complaining users to try the file in Excel and call me back if-and-only-if it works correctly there. But I shall rewrite my database's export routines to solve this issue correctly.
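One conventional way to make an export unambiguous is to double any quote character that occurs inside a quoted field, so that a quote inside the content can no longer be mistaken for the end of the block. The sketch below uses Python's standard csv module with a semicolon delimiter to show the idea. It is not the database's actual export routine, and the names export_rows and import_rows are invented for the example.

```python
import csv
import io


def export_rows(rows, delimiter=";"):
    """Write rows as delimited text, doubling embedded quotes."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=delimiter,
        quotechar='"',
        quoting=csv.QUOTE_MINIMAL,  # quote only fields that need it
        lineterminator="\n",
    )
    writer.writerows(rows)
    return buffer.getvalue()


def import_rows(text, delimiter=";"):
    """Read the text back; doubled quotes become single quotes again."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar='"')
    return list(reader)


# The awkward sentence from above, as the second field of one record.
rows = [["Jones", 'He said "Never"; they walked out']]
exported = export_rows(rows)
round_tripped = import_rows(exported)
```

Here exported holds the line Jones;"He said ""Never""; they walked out" and round_tripped comes back equal to rows. The catch is that the importing program must follow the same doubling convention, which is exactly what cannot be assumed about somebody else's export.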

Memo to Bill Gates: Microsoft seems to be running out of common sense; my e-mail address is at the top left; my rates are very reasonable.



Blogger JoeinVegas said...

You know Microsoft makes changes just to make changes? And to keep you buying the latest version of THEIR stuff, making it hard for competition.

April 6, 2007 at 5:26:00 PM GMT+2  
