Wednesday, November 21, 2007

On the road again

Rolling home in the afternoon on a cool and misty day, under a seamless, colour-suppressing light overcast. There are some patches of snow on the ground, in the shade of trees and buildings and along the thicker limbs of particularly stable trees. Munich was almost bare; only a few shreds and patches remained. The rivers are flowing fast and high, presumably with the runoff from this recent snowfall.

It is cold, though: ponds along the way are frozen over.

Spent the morning with Georgette and colleagues (though not my unnamed partner, who is in Spain at a trade fair), discussing the state of the database and a few future improvements.

In geekish news, the purpose of yesterday's conference was to present the forthcoming version of the database toolkit, which is going to be very exciting when it gets formally released next spring. (Non-geeks are encouraged to stop reading at this point.)

It is already very stable: although what we saw was officially beta-test software, it didn't crash once during the demo. The engine has been substantially rewritten, the indexing has been separated out of the datafile and completely rewritten using newer techniques, and the application now runs natively multi-threaded on multiple-core processors; together these changes bring astonishing performance improvements.

The lead developer spent a quarter-hour demonstrating how to optimize databases using the new indexing. A text search in 100 million records using the old (B-Tree) indexing took 34 seconds; after recreating the index as a Cluster and unloading the cache, the same search took 0.6 seconds, roughly 57 times faster. (Sound of 80 jaws clattering on the floor.) And that on a normal MacBook, not even the Pro model! So you can imagine what a server with a fast disk array and 16 GB of memory would be able to do.

The best improvement in indexing is that plain-text fields are now fully indexed and searchable: the database can generate indexes on every single word in a 2 MB text field. This allows clever, user-pleasing tricks like live, as-you-type searching. To demonstrate this, they imported some 10 gigabytes of data, three million records' worth, from the various Wikipedias into text fields, then searched for "macintosh". The database engine filtered the records in real time: type "m", 3m records; type "a" to give "ma", 2m records; type "c" to give "mac", 1m records; type "i" to give "maci", 600k records; and so on. Very impressive.
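For the curious, the general idea behind this kind of as-you-type filtering can be sketched in a few lines of Python. This is only a toy illustration and certainly not how the toolkit does it internally: I am assuming that the search matches the beginnings of words, and the names `build_word_index` and `search_prefix` as well as the four sample records are invented for the example.

```python
import bisect
import re
from collections import defaultdict


def build_word_index(records):
    """Map every lowercased word to the set of record ids containing it.

    Also returns the words in sorted order, so that all words sharing
    a prefix sit next to each other and can be found by binary search.
    """
    index = defaultdict(set)
    for record_id, text in records.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(record_id)
    return index, sorted(index)


def search_prefix(index, sorted_words, prefix):
    """Return the ids of records containing a word that starts with prefix."""
    prefix = prefix.lower()
    matches = set()
    # Binary search for the first word that is >= the prefix.
    start = bisect.bisect_left(sorted_words, prefix)
    for word in sorted_words[start:]:
        if not word.startswith(prefix):
            break  # left the contiguous run of matching words
        matches |= index[word]
    return matches


# Invented sample data, standing in for the imported text fields.
records = {
    1: "The Macintosh was introduced in 1984.",
    2: "A mackerel is a fish.",
    3: "Munich is a city in Bavaria.",
    4: "Machine code runs on the processor.",
}

index, sorted_words = build_word_index(records)

# Simulate typing "maci" one letter at a time.
for typed in ("m", "ma", "mac", "maci"):
    print(typed, sorted(search_prefix(index, sorted_words, typed)))
```

Each extra letter can only shrink the set of matching records, which is the narrowing effect we watched in the demo. A real engine would keep its index on disk and be far cleverer about it, of course.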

But possibly the biggest news is that this version finally has a built-in SQL engine; it's no longer a plug-in. SQL searches now run as fast as native-language queries. Watching this part demonstrated took me back twenty years to my first computer experiences, using SQL databases on mainframes and large UNIX systems. The language hasn't changed a bit (ha).

Heady stuff. It's going to be such fun to get my hands on this software.

Twenty-one down, nine to go.


3 Comments:

Blogger JoeinVegas said...

But a Mac server? come on . . .

November 21, 2007 at 5:28:00 p.m. GMT+1  
Blogger Udge said...

Sure, why not? They exist and are fast, stable and very high value-for-money; but the database engine runs on Windows too.

November 21, 2007 at 7:56:00 p.m. GMT+1  
Anonymous Anonymous said...

I go crazy for your descriptions of the weather. I still remember the one that you wrote from Canada a month ago.

November 24, 2007 at 8:20:00 p.m. GMT+1  
