Movable Type and Unicode

Running Movable Type natively in Unicode was not as difficult as I thought but it still required a number of patches to the code.

I have been trying to get Movable Type to run Unicode natively for a while. When Movable Type was upgraded to version 3.3, I saw my chance. This new version has a lot of the needed code for encoding and decoding etc. and made my job much easier than before.

If you remember my previous travails, the DBD::mysql module lacked UTF-8 support. Almost immediately after my changes, the developer release of DBD::mysql finally included a UTF-8 patch, but that was too late for me. Besides, I am going to wait for it to be included in a regular release, since DBD::mysql is somewhat complicated.

What I did was to set the UTF-8 flag for everything coming out of the database using a wrapper around the DBI module. I used Pavel Kudinov’s code for that, which is given below.

# re-implementation by Pavel Kudinov
# originally from:
package UTF8DBI    ; use base 'DBI'    ;
package UTF8DBI::db; use base 'DBI::db';
package UTF8DBI::st; use base 'DBI::st';
# Recursively set the UTF-8 flag on whatever is in $_ (scalar, array ref
# or hash ref) and return it.
sub _utf8_() {
    use Encode;
    if    (ref $_ eq 'ARRAY'){ &_utf8_() foreach        @$_  }
    elsif (ref $_ eq 'HASH' ){ &_utf8_() foreach values %$_  }
    else                     {         Encode::_utf8_on($_) };
    $_;
}
# Each fetch method passes the parent class's result through _utf8_.
sub fetch             { return _utf8_ for shift->SUPER::fetch            (@_)  };
sub fetchrow_arrayref { return _utf8_ for shift->SUPER::fetchrow_arrayref(@_)  };
sub fetchrow_hashref  { return _utf8_ for shift->SUPER::fetchrow_hashref (@_)  };
sub fetchall_arrayref { return _utf8_ for shift->SUPER::fetchall_arrayref(@_)  };
sub fetchall_hashref  { return _utf8_ for shift->SUPER::fetchall_hashref (@_)  };
sub fetchcol_arrayref { return _utf8_ for shift->SUPER::fetchcol_arrayref(@_)  };
sub fetchrow_array    {                 @{shift->       fetchrow_arrayref(@_)} };
1;
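
To make the flag-setting concrete, here is a minimal sketch, separate from the wrapper itself, of what Encode::_utf8_on does to a single value. The sample bytes are my own assumption, not data from a Movable Type database. Note that _utf8_on only flips Perl's internal flag and does no validation, so this approach is only safe if the database really does hold well-formed UTF-8.

```perl
use strict;
use warnings;
use Encode ();

# Assumed sample: the two UTF-8 bytes for U+00E9 (e with acute accent),
# as a driver without Unicode support would hand them back.
my $bytes = "\xC3\xA9";

# Proper decoding validates the bytes and returns a character string.
my $decoded = Encode::decode('UTF-8', $bytes);

# _utf8_on just marks the existing bytes as UTF-8; nothing is checked.
my $flagged = $bytes;
Encode::_utf8_on($flagged);

printf "bytes:   length %d\n", length $bytes;
printf "decoded: length %d\n", length $decoded;
printf "flagged: length %d\n", length $flagged;
print  "flagged and decoded strings are equal\n" if $flagged eq $decoded;
```

For valid input the flagged string and the decoded string are the same one-character string; the difference only shows up with malformed bytes, which decode would catch and _utf8_on would not.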

With the UTF8DBI wrapper in place, I replaced the calls to the DBI module with calls to the UTF8DBI module, as shown in the patches below.

--- lib/MT/ObjectDriver/	2006-09-06 19:27:17.000000000 -0700
+++ lib/MT/ObjectDriver/	2006-09-06 19:23:09.000000000 -0700
@@ -7,7 +7,7 @@
package MT::ObjectDriver::DBI;
use strict;
-use DBI;
+use UTF8DBI;
use MT::Util qw( offset_time_list );
use MT::ObjectDriver;
--- lib/MT/ObjectDriver/DBI/	2006-09-06 19:26:55.000000000 -0700
+++ lib/MT/ObjectDriver/DBI/	2006-09-06 19:24:20.000000000 -0700
@@ -93,10 +93,10 @@
$dsn .= ';hostname=' . $cfg->DBHost if $cfg->DBHost;
$dsn .= ';mysql_socket=' . $cfg->DBSocket if $cfg->DBSocket;
$dsn .= ';port=' . $cfg->DBPort if $cfg->DBPort;
-    $driver->{dbh} = DBI->connect($dsn, $cfg->DBUser, $cfg->DBPassword,
+    $driver->{dbh} = UTF8DBI->connect($dsn, $cfg->DBUser, $cfg->DBPassword,
{ RaiseError => 0, PrintError => 0 })
or return $driver->error(MT->translate("Connection error: [_1]",
-             $DBI::errstr));
+             $UTF8DBI::errstr));

However, that didn’t fix all the problems. The Perl CGI module was still working in Latin-1 mode. I could have wrapped it in a UTF8CGI module, but newer versions of the CGI module support Unicode, so I just upgraded the version of CGI bundled with Movable Type. I still needed to tell the CGI module that the character set in use was UTF-8. I could either do that every time the CGI module was called or set the default character set to UTF-8. Since this copy of the CGI module lives in the Movable Type extlib folder, I decided to modify its default character set.

--- extlib/	2006-09-15 10:39:30.000000000 -0700
+++ extlib/	2006-09-15 10:39:59.000000000 -0700
@@ -517,8 +517,8 @@
$fh = to_filehandle($initializer) if $initializer;
-    # set charset to the safe ISO-8859-1
-    $self->charset('ISO-8859-1');
+    # set charset to utf-8
+    $self->charset('utf-8');

I also set the utf8 mode for writing the files to disk.

--- lib/MT/FileMgr/	2006-09-27 06:56:39.000000000 -0700
+++ lib/MT/FileMgr/	2006-09-27 06:57:36.000000000 -0700
@@ -75,6 +75,9 @@
binmode($from) if $fmgr->is_handle($from);
+    else {
+        binmode(FH, ":utf8");
+    }
## Lock file unless NoLocking specified.
flock FH, LOCK_EX unless $fmgr->{cfg}->NoLocking;
seek FH, 0, 0;
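
As a rough illustration of what the :utf8 layer does on output, here is a small standalone sketch. It is not Movable Type code; the temporary file and the sample string are assumptions for the example. Characters go in, UTF-8 bytes land on disk.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Assumed sample text: "caf" + U+00E9 + space + U+263A, six characters.
my $text = "caf\x{e9} \x{263A}";

# Write it to a scratch file through the :utf8 output layer.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode($fh, ':utf8');
print {$fh} $text;
close $fh or die "close failed: $!";

# The file holds more bytes than the string has characters,
# because the non-ASCII characters take two and three bytes each.
my $size = -s $path;
printf "%d characters written as %d bytes\n", length $text, $size;
```

Without the binmode call, Perl would warn about a wide character in print for a string like this one.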

These changes caused problems with file uploads through the Movable Type interface. I expected this since I have run into this problem with PHP and mbstring as well. The following patch fixed this issue.

--- lib/MT/App/	2006-10-08 21:17:11.000000000 -0700
+++ lib/MT/App/	2006-10-08 21:17:37.000000000 -0700
@@ -8334,6 +8334,7 @@
$app->validate_magic() or return;
my $q = $app->param;
+    $q->charset('iso-8859-1');
my($fh, $no_upload);
if ($ENV{MOD_PERL}) {
my $up = $q->upload('file');

Then it was time to comment out the liberally sprinkled code to switch off the utf8 flag in Movable Type.

--- lib/MT/I18N/	2006-09-16 20:22:22.000000000 -0700
+++ lib/MT/I18N/	2006-09-16 20:23:26.000000000 -0700
@@ -292,7 +292,7 @@
$text = $class->_conv_to_utf8($text, $enc) if $enc ne 'utf-8';
$text = substr($text, $startpos, $length);
-    Encode::_utf8_off($text);
+#    Encode::_utf8_off($text);
$text = $class->_conv_from_utf8($text, $enc) if $enc ne 'utf-8';
@@ -322,7 +322,7 @@
-    Encode::_utf8_off($text) if $to eq 'utf-8';
+#    Encode::_utf8_off($text) if $to eq 'utf-8';

Finally, I had to make changes to the MTHash plugin that I use to force comment previews. The Digest::SHA1 module only accepts bytes, so the Unicode strings had to be encoded as UTF-8 bytes before being passed to any functions in the module. Here is my patch:

--- lib/MT/App/	2006-09-16 21:01:21.000000000 -0700
+++ lib/MT/App/	2006-09-16 21:03:08.000000000 -0700
@@ -266,9 +266,10 @@
require Digest::SHA1;
my $sha1 = Digest::SHA1->new;
-     $sha1->add($q->param('text') . $q->param('entry_id') . $app->remote_ip
-                . $q->param('author') . $q->param('email') . $q->param('url')
-                . $q->param('convert_breaks'));
+     my $octets = Encode::encode_utf8($q->param('text') . $q->param('entry_id') . $app->remote_ip
+                                      . $q->param('author') . $q->param('email') . $q->param('url')
+                                      . $q->param('convert_breaks'));
+     $sha1->add($octets);
my $salt_file = MT::ConfigMgr->instance->PluginPath .'/salt.txt';
my $FH;
open($FH, $salt_file) or die "cannot open file <$salt_file> ($!)";
--- plugins/	2006-09-16 20:29:22.000000000 -0700
+++ plugins/	2006-09-16 20:57:22.000000000 -0700
@@ -32,7 +32,8 @@
or return $ctx->error($ctx->errstr);
my $sha1 = Digest::SHA1->new;
-  $sha1->add($content);
+  my $octets = Encode::encode_utf8($content);
+  $sha1->add($octets);
my $salt_file = MT::ConfigMgr->instance->PluginPath .'/salt.txt';
open(FH, $salt_file) or die "cannot open file <$salt_file> ($!)";
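
The encode-before-hashing step can be tried on its own. The sketch below uses the core Digest::SHA module in place of Digest::SHA1, purely so it runs without extra installs (my substitution, not what the plugin uses), and the sample string is made up.

```perl
use strict;
use warnings;
use Encode ();
use Digest::SHA qw(sha1_hex);   # stand-in for Digest::SHA1 in this sketch

# Hypothetical comment text containing characters outside Latin-1.
my $content = "Comment with a smiley \x{263A} and Urdu \x{0627}\x{0628}";

# Digest functions operate on bytes, so turn the character string
# into UTF-8 octets first and hash those.
my $octets = Encode::encode_utf8($content);
my $digest = sha1_hex($octets);

print "SHA-1 of the UTF-8 octets: $digest\n";
```

Hashing the octets rather than the character string also means the digest is the same no matter how Perl happens to store the string internally.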

One thing that I still need to do is to fix the Serializer and Un-serializer used by Movable Type plugins.

Happy Thanksgiving

Happy Turkey Day.

Happy Thanksgiving, everyone!

We are off to a friend’s to enjoy turkey (and more). I made dessert: lemon tart and Coeur à la Crème. I might post the recipes later.

The Syrian Bride

This is a good movie about a marriage across the ceasefire line between the Israeli-controlled Golan Heights and Syria. Family estrangement, conservatism, and the clash between Israeli and Syrian bureaucratic requirements are all depicted well. I rate the movie 8/10.

The Syrian Bride is about the travails of a Druze family in the Golan Heights. Of course, they are in a way stateless. So when a girl in the family is to be married to a Syrian guy, crossing the border becomes an issue and red tape rears its head.

There are a number of other threads and subplots about other members of the family. There is the estranged older brother of the bride who was excommunicated from the clan because he married a Russian. There are the tensions between the aspirations of the older sister and her husband’s fear of societal reaction.

It is a good movie, worth watching. I would rate it 8/10.

Jonathan Strange & Mr Norrell

It is a book about magic and magical it is. Quite long but fun to read, Susanna Clarke has come up with a winner for her first novel, Jonathan Strange and Mr Norrell.

Jonathan Strange & Mr Norrell is a fantasy novel. It is about the quest of two English magicians, Strange and Norrell, to bring magic back to England.

It is a thick novel, about a thousand pages in the mass market paperback edition, but it is very enjoyable. I finished it in a few days as fast as I could.

This being Susanna Clarke’s first book, I look forward to reading more from her.

I recommend it highly.

UPDATE: Razib asks for more. So here is a whole seminar on the book.

Parents’ Wedding Anniversary

Today my Dad and Mom have been married for 39 years. Happy wedding anniversary, Mom and Dad.

Today is an important day in my life, since it is the reason I was born. Today is my parents’ wedding anniversary. They have been married for 39 years. Many congratulations, Dad and Mom!

For a long time in our childhood we believed that our parents had married on November 9, because that is what Dad and Mom had told us. It was only when we were a bit older that I dug out their marriage certificate. It showed that the actual date was November 10, and our parents had not even remembered it.

Awards and Carnivals

Go nominate blogs for the Brass Crescent awards. Also, contribute to the Carnival of Islam in the West. Include the real-time Carnival of Brass in your blog sidebar and read Muslims in the West blogs together in your feed reader.

Since the end of the year is near, it is again time for the Brass Crescent Awards for the Islamicate blogosphere.

Brass Crescent Awards Nominations

Right now they are taking nominations for a bunch of categories until November 17. So go and nominate your favorite weblogs. Do note that a blogger does not need to be a Muslim to be nominated (and it is definitely not required of those nominating or voting).

In defining the Islamsphere, we are not relying solely on adherence to the faith, but an affinity for parts of the diverse cultural fabric that Islam embraces and is embraced by worldwide.

Abu Sahajj, the owner of the Wa Salaam blog, started the Carnival of Islam in the West a few months ago. You can read the first and second editions of the carnival. Now it is time for the 3rd edition, being hosted by Travellers on the Path of Knowledge. Do submit if you have any blog posts relating to the experience of Muslims in the West in these areas:

  • Religion and Worship
  • Marriage and Family
  • Education and Life
  • Business and Careers
  • News and Politics
  • The State of the Ummah

While on the subject of carnivals, Aziz has created a sort of “real-time” carnival called the Carnival of Brass where you can submit blog posts and media stories and there is an output feed which you can include on your blog which includes the items posted to the carnival. It updates at about one new item per day. I have included both the media and blog ones on my sidebar. If you want to include Carnival of Brass in your blog sidebar, please read Aziz’s FAQ.

Abu Sahajj also has an Islam in the West feed which aggregates quite a few Muslim blogs in the West. Just subscribe to the feed using your favorite feed aggregator and you can read a lot of Muslim blogs.

UPDATE: This month’s edition of the Carnival of Islam in the West is now up.

Vote Early, Vote Often

The midterm elections are today. Do vote, and vote against the Republicans as they have become arrogant. Also, it is good to throw out the governing party once every few elections.

Today is election day here. So do vote. I won’t be voting out of respect for the law, and my non-fellow citizens (as Harry put it), but you should.

I think it is time to vote against the Republicans this time. Vote against any Republican standing for any office (even dogcatcher). There are several reasons for this. You will probably have heard the more important ones, like the US torture policy, the Iraq war, incompetence in the war on terror as well as in economics, and so on. But there is another important reason: arrogance of power. The Republicans have drunk from the fountain of power and it has gone to their heads. This happens from time to time in a democracy and whenever it does, there is only one thing to do: Throw them out. I am a strong believer in voting against incumbents every few election cycles just for the heck of it. It keeps the politicians on their toes and does not let one party stay in power for too long, which are both good things in my opinion.

As for predictions, mine is that the Democrats will take over the House by gaining about 24 seats but will fail in the Senate. Since Lieberman and Sanders are most likely going to caucus with the Democrats, this will result in a 50-50 Senate with Cheney casting the tie-breaking vote.

I have been following the polls, predictions and other election news at the following sites:

I hope to congratulate Captain Arrrgh tonight for getting rid of both his Senator and his Representative (both Republicans).

Apocalypse Now

Apocalypse Now is now a part of the culture. It is a movie about a Colonel gone bad (and mad) in the Vietnam War. Francis Ford Coppola did some great work here. I rate it 8/10.

Apocalypse Now is so famous that it doesn’t really need a review from me.

Now there is a new version out on DVD called “The Complete Dossier” that includes both the original theatrical cut and the longer Redux version. I think parts of the Redux make the movie better, but they also increase the length by quite a bit and make it a bit slow.

I had seen parts of Apocalypse Now now and then on TV and was thoroughly familiar with all the famous quotes. But I wanted to watch it completely and properly.

It took us a while to watch it, since one has to be careful with a two-year-old around. We had to wait for her to sleep before we could watch it.

Overall, it is a haunting movie about the Vietnam War. I would rate it 8/10.