Commenting Problems

Comments containing non-ASCII characters are not working right now. We hope to have a fix soon. The temporary fix is to revert a recent upgrade of the MTValidate plugin.

We are having some problems with comments right now. You cannot write any Urdu in the comments or even use single or double quotation marks (smart or otherwise). Basically, you cannot use any non-ASCII characters in comments.

This seems to be related to the recent upgrade of this weblog to Movable Type 3.34 and MTValidate 0.5. It is probably a result of this blog being the only Movable Type blog running native Unicode.

We are working on a fix.

UPDATE: I have temporarily fixed it by reverting to MTValidate 0.4. So comment away!

Movable Type and Unicode

Running Movable Type natively in Unicode was not as difficult as I thought, but it still required a number of patches to the code.

I have been trying to get Movable Type to run Unicode natively for a while. When Movable Type was upgraded to version 3.3, I saw my chance. The new version already includes much of the encoding and decoding code that is needed, which made my job much easier than before.

If you remember my previous travails, the DBD::mysql module lacked UTF-8 support. Almost immediately after my changes, a developer release of DBD::mysql finally included a UTF-8 patch, but that came too late for me. Besides, I am going to wait for the patch to land in a regular release, since DBD::mysql is somewhat complicated.

What I did was to set the UTF-8 flag for everything coming out of the database using a wrapper around the DBI module. I used Pavel Kudinov’s code for that, which is given below.

# UTF8DBI.pm re-implementation by Pavel Kudinov http://search.cpan.org/~kudinov/
# originally from: http://dysphoria.net/code/perl-utf8/
package UTF8DBI    ; use base DBI    ;
package UTF8DBI::db; use base DBI::db;
package UTF8DBI::st; use base DBI::st;
sub _utf8_() {
use Encode;
if    (ref $_ eq 'ARRAY'){ &_utf8_() foreach        @$_  }
elsif (ref $_ eq 'HASH' ){ &_utf8_() foreach values %$_  }
else                     {         Encode::_utf8_on($_) };
$_;
};
sub fetch             { return _utf8_ for shift->SUPER::fetch            (@_)  };
sub fetchrow_arrayref { return _utf8_ for shift->SUPER::fetchrow_arrayref(@_)  };
sub fetchrow_hashref  { return _utf8_ for shift->SUPER::fetchrow_hashref (@_)  };
sub fetchall_arrayref { return _utf8_ for shift->SUPER::fetchall_arrayref(@_)  };
sub fetchall_hashref  { return _utf8_ for shift->SUPER::fetchall_hashref (@_)  };
sub fetchcol_arrayref { return _utf8_ for shift->SUPER::fetchcol_arrayref(@_)  };
sub fetchrow_array    {                 @{shift->       fetchrow_arrayref(@_)} };
1;
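
For readers unfamiliar with the flag the wrapper sets, here is a minimal sketch of its effect. It is not part of Pavel Kudinov's module; the variable names and the sample value are assumptions made up for illustration. It shows that Encode::_utf8_on leaves the bytes untouched and only changes how Perl counts them, and that Encode::decode produces the same string while also validating the input.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical sample: the UTF-8 octets of a four-letter Urdu word
# (2 bytes per letter), standing in for a value returned by a database
# driver that knows nothing about UTF-8.
my $from_db = "\xD8\xA7\xD8\xB1\xD8\xAF\xD9\x88";
my $byte_length = length $from_db;       # counted in bytes

# What the wrapper does: mark the same bytes as UTF-8 without checking them.
my $flagged = $from_db;
Encode::_utf8_on($flagged);
my $flagged_length = length $flagged;    # now counted in characters

# The validating alternative: decode the octets into a character string.
my $decoded = Encode::decode('UTF-8', $from_db);

print "bytes: $byte_length, characters: $flagged_length\n";
print "flagged and decoded strings are equal\n" if $flagged eq $decoded;
```

Setting the flag directly is cheap, but it trusts the database to hold well-formed UTF-8; decoding costs a little more and rejects or replaces malformed input.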

With the UTF8DBI module in place, I needed to replace calls to the DBI module with calls to the UTF8DBI module, as shown in the patches below.

--- lib/MT/ObjectDriver/DBI.pm.orig	2006-09-06 19:27:17.000000000 -0700
+++ lib/MT/ObjectDriver/DBI.pm	2006-09-06 19:23:09.000000000 -0700
@@ -7,7 +7,7 @@
package MT::ObjectDriver::DBI;
use strict;
-use DBI;
+use UTF8DBI;
use MT::Util qw( offset_time_list );
use MT::ObjectDriver;
--- lib/MT/ObjectDriver/DBI/mysql.pm.orig	2006-09-06 19:26:55.000000000 -0700
+++ lib/MT/ObjectDriver/DBI/mysql.pm	2006-09-06 19:24:20.000000000 -0700
@@ -93,10 +93,10 @@
$dsn .= ';hostname=' . $cfg->DBHost if $cfg->DBHost;
$dsn .= ';mysql_socket=' . $cfg->DBSocket if $cfg->DBSocket;
$dsn .= ';port=' . $cfg->DBPort if $cfg->DBPort;
-    $driver->{dbh} = DBI->connect($dsn, $cfg->DBUser, $cfg->DBPassword,
+    $driver->{dbh} = UTF8DBI->connect($dsn, $cfg->DBUser, $cfg->DBPassword,
{ RaiseError => 0, PrintError => 0 })
or return $driver->error(MT->translate("Connection error: [_1]",
-             $DBI::errstr));
+             $UTF8DBI::errstr));
$driver;
}

However, that didn’t fix all the problems. The Perl CGI module was still working in Latin-1 mode. I could have wrapped it in a UTF8CGI module, but newer versions of the CGI module support Unicode, so I just upgraded the version of CGI bundled with Movable Type. I still needed to tell the CGI module that the character set in use was UTF-8. I could either do that every time the CGI module was called or change the default character set to UTF-8. Since this copy of the CGI module lives in the Movable Type extlib folder, I decided to modify its default character set.

--- extlib/CGI.pm.orig	2006-09-15 10:39:30.000000000 -0700
+++ extlib/CGI.pm	2006-09-15 10:39:59.000000000 -0700
@@ -517,8 +517,8 @@
$fh = to_filehandle($initializer) if $initializer;
-    # set charset to the safe ISO-8859-1
-    $self->charset('ISO-8859-1');
+    # set charset to utf-8
+    $self->charset('utf-8');
METHOD: {

I also set the :utf8 layer on the filehandle used for writing files to disk when they are not written in binary mode.

--- lib/MT/FileMgr/Local.pm.orig	2006-09-27 06:56:39.000000000 -0700
+++ lib/MT/FileMgr/Local.pm	2006-09-27 06:57:36.000000000 -0700
@@ -75,6 +75,9 @@
binmode(FH);
binmode($from) if $fmgr->is_handle($from);
}
+    else {
+        binmode(FH, ":utf8");
+    }
## Lock file unless NoLocking specified.
flock FH, LOCK_EX unless $fmgr->{cfg}->NoLocking;
seek FH, 0, 0;
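
As a rough illustration of what the :utf8 layer in that patch changes, the following sketch writes a character string to a temporary file and compares the character count with the file size. It is independent of Movable Type; the temporary file and the sample text are assumptions for the example only.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample text: four Urdu letters as a Perl character string.
my $text = "\x{0627}\x{0631}\x{062F}\x{0648}";

# Open a scratch file and ask Perl to encode characters as UTF-8 on output,
# the same binmode call the patch adds for non-binary writes.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode($fh, ':utf8');
print {$fh} $text;
close $fh or die "close failed: $!";

my $char_count = length $text;   # characters in memory
my $file_size  = -s $path;       # bytes on disk after encoding

print "characters: $char_count, bytes on disk: $file_size\n";
```

Without the layer, printing the same string would trigger a "Wide character in print" warning. The stricter ':encoding(UTF-8)' layer is an alternative when input validation matters.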

These changes caused problems with file uploads through the Movable Type interface. I expected this since I have run into this problem with PHP and mbstring as well. The following patch fixed this issue.

--- lib/MT/App/CMS.pm.orig	2006-10-08 21:17:11.000000000 -0700
+++ lib/MT/App/CMS.pm	2006-10-08 21:17:37.000000000 -0700
@@ -8334,6 +8334,7 @@
$app->validate_magic() or return;
my $q = $app->param;
+    $q->charset('iso-8859-1');
my($fh, $no_upload);
if ($ENV{MOD_PERL}) {
my $up = $q->upload('file');

Then it was time to comment out the liberally sprinkled code to switch off the utf8 flag in Movable Type.

--- lib/MT/I18N/default.pm.orig	2006-09-16 20:22:22.000000000 -0700
+++ lib/MT/I18N/default.pm	2006-09-16 20:23:26.000000000 -0700
@@ -292,7 +292,7 @@
$text = $class->_conv_to_utf8($text, $enc) if $enc ne 'utf-8';
Encode::_utf8_on($text);
$text = substr($text, $startpos, $length);
-    Encode::_utf8_off($text);
+#    Encode::_utf8_off($text);
$text = $class->_conv_from_utf8($text, $enc) if $enc ne 'utf-8';
$text;
}
@@ -322,7 +322,7 @@
}
}
-    Encode::_utf8_off($text) if $to eq 'utf-8';
+#    Encode::_utf8_off($text) if $to eq 'utf-8';
$text;
}
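
To see why the flag matters around a substr call like the one in the first hunk, here is a small sketch under assumed sample data (nothing in it comes from Movable Type itself). The same offset and length cut a byte string in the middle of a character, but cut a character string cleanly.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical sample: four Urdu letters, first as UTF-8 octets and then
# as a decoded character string.
my $octets = "\xD8\xA7\xD8\xB1\xD8\xAF\xD9\x88";
my $chars  = Encode::decode('UTF-8', $octets);

# On bytes, "three" means one and a half letters: the slice ends mid-character.
my $byte_slice = substr($octets, 0, 3);

# On characters, "three" means the first three letters.
my $char_slice = substr($chars, 0, 3);
my $char_slice_octets = Encode::encode_utf8($char_slice);

printf "byte slice: %d bytes, character slice: %d bytes when re-encoded\n",
    length($byte_slice), length($char_slice_octets);
```

Once every string in the application carries the flag consistently, switching it off again after such operations only reintroduces the byte-level view.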

Finally, I had to make changes to the MTHash plugin that I use to force comment previews. The Digest::SHA1 module only accepts bytes, so character strings had to be encoded as UTF-8 bytes before being passed to any function in the module. Here is my patch:

--- lib/MT/App/Comments.pm.orig	2006-09-16 21:01:21.000000000 -0700
+++ lib/MT/App/Comments.pm	2006-09-16 21:03:08.000000000 -0700
@@ -266,9 +266,10 @@
require Digest::SHA1;
my $sha1 = Digest::SHA1->new;
-     $sha1->add($q->param('text') . $q->param('entry_id') . $app->remote_ip
-                . $q->param('author') . $q->param('email') . $q->param('url')
-                . $q->param('convert_breaks'));
+     my $octets = Encode::encode_utf8($q->param('text') . $q->param('entry_id') . $app->remote_ip
+                                      . $q->param('author') . $q->param('email') . $q->param('url')
+                                      . $q->param('convert_breaks'));
+     $sha1->add($octets);
my $salt_file = MT::ConfigMgr->instance->PluginPath .'/salt.txt';
my $FH;
open($FH, $salt_file) or die "cannot open file <$salt_file> ($!)";
--- plugins/MTHash.pl.orig	2006-09-16 20:29:22.000000000 -0700
+++ plugins/MTHash.pl	2006-09-16 20:57:22.000000000 -0700
@@ -32,7 +32,8 @@
or return $ctx->error($ctx->errstr);
my $sha1 = Digest::SHA1->new;
-  $sha1->add($content);
+  my $octets = Encode::encode_utf8($content);
+  $sha1->add($octets);
my $salt_file = MT::ConfigMgr->instance->PluginPath .'/salt.txt';
open(FH, $salt_file) or die "cannot open file <$salt_file> ($!)";
$sha1->addfile(FH);
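
The encode-before-hashing step can be tried in isolation with the sketch below. As an assumption for portability, it swaps in the core Digest::SHA module rather than Digest::SHA1, which the patches use; Digest::SHA likewise refuses wide characters. The sample comment text is made up.

```perl
use strict;
use warnings;
use Encode ();
use Digest::SHA ();

# Hypothetical comment text containing characters above code point 255.
my $comment = "\x{0627}\x{0631}\x{062F}\x{0648}";

# Hashing the character string directly fails, because a wide character
# does not fit into a byte.
my $raw_ok = eval { Digest::SHA::sha1_hex($comment); 1 } ? 1 : 0;

# Encoding to UTF-8 octets first gives the digest a plain byte sequence.
my $octets = Encode::encode_utf8($comment);
my $digest = Digest::SHA::sha1_hex($octets);

print "direct hashing ", ($raw_ok ? "succeeded" : "failed"), "\n";
print "digest of encoded octets: $digest\n";
```

Whichever side computes the hash, the same encoding has to be applied before comparing digests; otherwise the preview check would never match.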

One thing that I still need to do is to fix the Serializer and Un-serializer used by Movable Type plugins.