ا آ ب پ ت ٹ ث ج چ ح خ د ڈ ذ ر ڑ ز ژ س ش ص ض ط ظ ع غ ف ق ک گ ل م ن ں و ہ ھ ء ی ے
Problems Viewing Urdu Text Above
I am using the Tahoma font for writing Urdu. Unicode is the standard character set for such things nowadays and the Arabic script part of it contains characters for Arabic, Persian, Urdu, etc. Some other fonts provide the Arabic characters which are a subset of Urdu ones, but not all Urdu characters. Tahoma has support for Urdu characters.
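To make the point about Urdu needing more than the basic Arabic letters concrete, here is a minimal Python sketch. The selection of five letters (ٹ ڈ ڑ ں ے) is my own illustrative pick and not a complete list, and the helper name `in_arabic_block` is hypothetical. It only shows that these letters live in the same Unicode Arabic block (U+0600 to U+06FF), so a font has to cover them explicitly.

```python
import unicodedata

# Code points of a few Urdu letters that standard Arabic does not use.
# This is an illustrative selection, not a complete list.
URDU_SPECIFIC = [0x0679, 0x0688, 0x0691, 0x06BA, 0x06D2]

def in_arabic_block(ch):
    """Return True if ch lies in the main Unicode Arabic block."""
    return 0x0600 <= ord(ch) <= 0x06FF

for cp in URDU_SPECIFIC:
    ch = chr(cp)
    # Print the code point, its official Unicode name, and block membership.
    print(f"U+{cp:04X}", unicodedata.name(ch), in_arabic_block(ch))
```

A font that only ships glyphs for the letters Arabic itself uses will show squares for characters like these.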
I have checked both Windows 2000 and XP, and the Tahoma versions (2.80 and 3.00 respectively) installed with these two operating systems support Urdu properly. Windows 98 by default does not. However, installing a newer version of the font might help you view Urdu pages.
Older web browsers won’t display Urdu correctly. You should use a recent browser, like Internet Explorer 5.5 or later, Mozilla 1.5 or later, Netscape Navigator 6.0 or later, Opera 6 or later. Alan Wood has a detailed list and description of different browsers’ support for Unicode. He also has specific information about Arabic support.
Mac OS 9.2 and Mac OS X also support the Arabic script. I am not sure about Unix/Linux.
This website has a list of operating systems, browsers and fonts that support the Arabic script.
If you have more information about viewing or creating Urdu web pages on Macintosh or Unix, please let me know. Also, let me know if you would like me to include other Mac- or Unix-specific fonts in my style class for Urdu, and tell me how my weblog’s Urdu posts look in other browsers and operating systems.
I have set up my weblog so that if your computer does not have a Tahoma font, then it is provided from my website automatically. This should work on both Windows machines and Macs. I am not sure about Unix. However, if you have a really old version of Tahoma which does not contain all Urdu characters, then my newer version of the font is unfortunately not downloaded (I am actually not sure about this.) I am using Microsoft WEFT 3 to embed the fonts.
If the font is downloaded, it is installed only temporarily to view my weblog. This is due to the licensing provided with the Tahoma font. I can only do “editable embedding”, which means temporary installation. “Installable embedding”, which would allow a font to be permanently installed, is not allowed.
Umair has a font for download as well as instructions on how to install it at the bottom of his Urdu weblog. That should help all Windows users.
UPDATE: Asif has an installable package of Urdu fonts for Windows.
How to Type in Urdu
I’ll only describe the Windows options here since they are the only ones I am familiar with. For other systems, please take a look at the links at the end of the post.
You have two options: You can either install full Urdu/Arabic support or use an Urdu Unicode editor.
Urdu support based on Unicode is available only in Windows 2000 and XP. The Microsoft website has the instructions on how to install Urdu language support and keyboard. Once you have installed it, you can simply switch language to Urdu from the taskbar and start typing in Urdu.
If you are like me, you might not like the Urdu keyboard layout. Shehzad has designed a phonetic Urdu keyboard. This maps Urdu characters on your keyboard such that phonetically similar Urdu and English characters are mapped to the same key. This is good for those of us who are used to typing in English. I recommend that you install that as well. The installation instructions are on his page.
UPDATE: I like this phonetic Urdu keyboard layout better.
Another thing that might help you get used to an Urdu keyboard is the on-screen keyboard. This is available from the accessibility options of Windows. You can either click on the keys of the on-screen keyboard or just use it as a guide while you type.
If you are not using Windows 2000 or XP, or you don’t want to install the whole Arabic/Urdu support stuff, you can download a Unicode editor. A good one is Unipad. The free version allows you to type up to 1000 characters. For longer text, you have to buy it. It comes with a built-in Urdu keyboard and font. You can display the Unipad keyboard on-screen as well. Unipad does not require that you have the Urdu/Arabic language support installed on your machine. Once you have typed your text in Urdu, you can copy and paste it into other applications. If you are planning to put the text online and are worried that Urdu support might not be set up properly, you can select the Urdu text in Unipad and convert it to XML/HTML entities. All the Urdu characters will change to &#1651; or some other number. This is sometimes useful, though the entities are hard to edit. Unipad comes through, however, since you can convert the entities back to the characters as well.
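If you don’t have Unipad handy, the same round trip can be sketched in a few lines of Python. This is not how Unipad does it; it is just a minimal illustration using the standard library. The function names `to_entities` and `from_entities` are hypothetical, and the sample word is written with escapes so the code itself stays plain ASCII.

```python
import html

def to_entities(text):
    """Replace every non-ASCII character with a decimal numeric entity."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

def from_entities(text):
    """Turn numeric (and named) entities back into characters."""
    return html.unescape(text)

# The word "Urdu" in Urdu script: alef, reh, dal, waw.
urdu = "\u0627\u0631\u062f\u0648"

encoded = to_entities(urdu)
print(encoded)                          # &#1575;&#1585;&#1583;&#1608;
print(from_entities(encoded) == urdu)   # True
```

Note that `html.unescape` also decodes named entities such as `&amp;`, so run it only on text where that is what you want.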
How to Type Urdu Comments
Now, let’s talk about how you can type Urdu comments on this weblog. You can follow one of the two methods from the last section.
There is, however, the little matter of correct alignment of the text since Urdu is written from right to left while English is written left to right. To get that correct, you should do the following:
- At the start of every Urdu paragraph, type in English: p[ur](urdu). followed by a space.
- If you have an English word within the Urdu paragraph, surround it with some code like this:
- If you have an Urdu word in an English paragraph, it needs something similar:
Let’s now talk about what I had to do with my blogging software to get Urdu blogging going properly. This is obviously Movable Type specific, but the general principles apply to other blogging tools as well. I might later add stuff about Blogger. If you are blogging in Urdu, please write up something about how to set up your blogging tool for that and let me know.
First of all, you need to make a few changes in mt.cfg. Find the line about PublishCharset. Uncomment it (by removing the # at the start of the line) and change it to PublishCharset utf-8. This will change the character encoding of your weblog from ISO-8859-1 (Latin 1) to the Unicode one. Also, uncomment the NoHTMLEntities line and set it to NoHTMLEntities 1. This will leave your Urdu Unicode characters as they are instead of changing them over to HTML entities.
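To see why the charset has to change, here is a small Python sketch. It has nothing to do with Movable Type internals; it only checks whether a piece of text can be represented in a given encoding. The helper name `encodable` is hypothetical.

```python
def encodable(text, charset):
    """Return True if text can be encoded in charset without errors."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

# The word "Urdu" in Urdu script, written with escapes.
urdu = "\u0627\u0631\u062f\u0648"

print(encodable(urdu, "iso-8859-1"))  # False: Latin 1 has no Arabic-script letters
print(encodable(urdu, "utf-8"))       # True
print(len(urdu.encode("utf-8")))      # 8, two bytes per letter here
```

With a Latin 1 weblog, the only way to carry Urdu text is as numeric entities; with utf-8 the characters can be stored directly.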
There is one more change that I made in mt.cfg. This one was for comments. Since I am allowing comments in Urdu, I have to allow a few more HTML tags than the default ones to take care of the alignment. I’ll explain the tags later, but here’s my sanitize line in mt.cfg:
GlobalSanitizeSpec a href,b,br/,p,strong,em,ul,li,blockquote,p class lang,span class lang,i
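If the format of that line is not obvious, here is a rough Python sketch of how I read it: entries are separated by commas, each entry starts with a tag name, and any words after the tag name are attributes allowed on that tag. This is only my reading of the line, not Movable Type’s own parser, and the function name `parse_sanitize_spec` is hypothetical. I am also assuming the trailing slash in br/ just marks a tag with no closing tag, so the sketch strips it.

```python
def parse_sanitize_spec(spec):
    """Map each allowed tag to the set of attributes allowed on it."""
    allowed = {}
    for entry in spec.split(","):
        parts = entry.split()
        if not parts:
            continue
        # Assumption: a trailing "/" marks a tag without a closing tag.
        tag = parts[0].rstrip("/")
        # A tag may be listed twice (p is), so merge the attribute sets.
        allowed.setdefault(tag, set()).update(parts[1:])
    return allowed

spec = "a href,b,br/,p,strong,em,ul,li,blockquote,p class lang,span class lang,i"
allowed = parse_sanitize_spec(spec)

print(sorted(allowed["p"]))     # ['class', 'lang']
print(sorted(allowed["span"]))  # ['class', 'lang']
print(sorted(allowed["a"]))     # ['href']
```

The additions over a plain tag list are the class and lang attributes on p and span, which is what the Urdu alignment codes need.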
If your web server is running Apache, you should also add the following to the .htaccess file in the top weblog directory (if the file doesn’t exist, create a new one):
AddType 'text/html; charset=UTF-8' html
This tells the server that all your files with the .html extension should be served as type text/html with the UTF-8 character set.
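What the browser actually receives is a Content-Type header carrying that value. As a minimal sketch, here is how you could pull the charset out of such a header value in Python using the standard library’s email package. The helper name `charset_of` is hypothetical, and the header strings are typed in by hand rather than fetched from a real server.

```python
from email.message import Message

def charset_of(content_type):
    """Return the lowercased charset from a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

print(charset_of("text/html; charset=UTF-8"))  # utf-8
print(charset_of("text/html"))                 # None
```

If the charset is missing from the header, the browser has to guess the encoding, and Urdu pages may come out garbled.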
The next problem was that the Movable Type interface was using fonts that did not display all Urdu characters properly. So I was seeing a lot of squares while typing my Urdu entries. To fix that, we need to make changes to the Movable Type interface style file. This is styles.css in your Movable Type directory. I am providing my styles.css file with the necessary changes.
The changes basically are to add “tahoma” as the first font in “font-family” for the following classes/styles:
The last thing that needed to be done was to define the alignment for Urdu text in a style class for the weblog. Here is my CSS file. I am using the following two classes:
font-family: tahoma, "Arial Unicode MS", arial, georgia, verdana, sans-serif;
font-family:georgia, verdana, arial, sans-serif;
Whenever I have an Urdu paragraph, I use <p class="urdu" lang="ur">...</p> around it. I don’t use anything around the English paragraphs.
When I have a few words of Urdu in an English paragraph, I use <span class="urdu" lang="ur">...</span> around the Urdu words. Similarly, when I have some English words in an Urdu paragraph, I put <span class="en" lang="en-US">...</span> around them.
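If you generate pages with a script instead of typing the tags by hand, the wrapping can be sketched like this in Python. The function names `urdu_paragraph` and `urdu_span` are hypothetical; the class and lang values are the ones I use above. The text is escaped first so that characters like < and & in it cannot break the markup.

```python
from html import escape

def urdu_paragraph(text):
    """Wrap a whole Urdu paragraph in a p tag with the urdu class."""
    return f'<p class="urdu" lang="ur">{escape(text)}</p>'

def urdu_span(text):
    """Wrap a few Urdu words inside an English paragraph."""
    return f'<span class="urdu" lang="ur">{escape(text)}</span>'

# The word "Urdu" in Urdu script, written with escapes.
word = "\u0627\u0631\u062f\u0648"

print(urdu_paragraph(word))
print("<p>The word " + urdu_span(word) + " is written right to left.</p>")
```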
Actually, since I use the MT-Textile plugin, I use the simpler Textile codes as I showed in the section about commenting.
- Alan Wood’s Unicode Resources
- Browsers and Fonts that work for Arabic
- Shehzad’s website about Urdu websites creation
- Mac OS 9 Language Pack Installation
- Alan Wood’s List of Arabic Unicode Fonts
- Urdu Support for Windows
- Justifying Arabic Text
- HTML lang and dir attributes
- A better way for language-based styles
- Authoring HTML for Middle Eastern Content
- U-Trans: A program for converting ArabTex transliteration code into Unicode plus an Urdu Nastalique font
- Unipad: A Unicode text editor
- Urdu Phonetic Keyboard Layout for Unipad
- Syed Rizwan Rizvi’s Urdu-related page
- Hugo’s Urdu Page
- Urdu Alphabet
- Yahoo! Group for Urdu Computing
UPDATE: More here and here.