Joomla supports the UTF-8 character encoding out of the box, so anyone building a multilingual website should have no problems using UTF-8 on their site. Right?
Yes and no. The Joomla core is problem-free as I write this, but some non-core add-ons and templates (and yes, your own home-cooked code) can produce garbled output. Let's see why, and how we can fix it!
There are two major culprits here. The first is the version of MySQL in use. Even though they are officially unsupported, a lot of servers still run earlier (4.x) versions of the database engine powering Joomla. Upgrade it, and make sure you have chosen the proper encoding (UTF-8) for your database tables. There is not much else to say there.
The second culprit is the smart little monkey that works the magic of building your site from database records and files stored on the web server: the PHP engine.
Unfortunately, PHP's built-in string functions assume that every character is stored in a single byte. And here is the catch: UTF-8 is a multi-byte character encoding that lets us store Unicode text in a relatively small amount of space. Being a multi-byte encoding, it stores each character using a variable number of bytes (one to four).
A good example of where this causes trouble is counting the characters in a string with PHP's strlen() function. If the string contains UTF-8 data and one or more of its characters take up more than one byte, the returned value will be larger than expected, because strlen() counts bytes, not characters.
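To see the difference for yourself, here is a minimal sketch in plain PHP (no Joomla needed; PHP 7 or later is assumed for the \u{...} string escape). It counts characters with a PCRE pattern using the /u (UTF-8) modifier. This is just one way to do it, not necessarily how JString works internally.

```php
<?php
// "é" (U+00E9) is stored in UTF-8 as two bytes: 0xC3 0xA9.
$word = "caf\u{00E9}"; // "café", written with a PHP 7+ Unicode escape

// strlen() counts bytes, so the two-byte "é" is counted twice.
$bytes = strlen($word);

// With the /u modifier, "." matches one whole UTF-8 character (the s flag
// lets it match newlines too); preg_match_all() returns the number of matches.
$chars = preg_match_all('/./su', $word);

echo "bytes: $bytes, characters: $chars\n"; // prints "bytes: 5, characters: 4"
```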
Fortunately, you don't need to write lengthy code to handle the situations that often arise when you work with multi-byte strings in Joomla: since Joomla 1.5 we have had an entire class of functions that treat these strings correctly, the JString class.
The Joomla! JString class contains a bunch of static methods that are UTF-8 aware. For most of the PHP string functions that do not behave as expected with UTF-8 strings, there is an equivalent JString method.
Here are some of them (you can easily find the full list, with examples, at http://docs.joomla.org/API15:JString):
PHP Function | JString Method | Return Type | Parameters | Description |
---|---|---|---|---|
strlen | JString::strlen | int | string $str | Returns the length of $str in characters (not bytes). |
trim | JString::trim | string | string $str, [string $charlist] | Removes leading and trailing whitespace or characters defined in $charlist. |
ltrim | JString::ltrim | string | string $str, [string $charlist] | Removes leading whitespace or characters defined in $charlist. |
rtrim | JString::rtrim | string | string $str, [string $charlist] | Removes trailing whitespace or characters defined in $charlist. |
strpos | JString::strpos | int or false | string $haystack, string $needle, [int $offset = 0] | Finds the character position of the first occurrence of $needle in $haystack. |
strrpos | JString::strrpos | int or false | string $haystack, string $needle, [int $offset = 0] | Finds the character position of the last occurrence of $needle in $haystack (PHP 4 behaves slightly differently). WARNING: JString does not support $offset. |
substr | JString::substr | string | string $string, int $start, [int $length] | Gets the portion of $string that starts at character position $start, with a maximum length of $length characters. |
iconv | JString::transcode | string | string $source, string $from_encoding, string $to_encoding | Converts $source from one character encoding to another. Depending on the encodings, this can result in data loss. |
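To illustrate why the table matters, here is a small sketch of what goes wrong with the native substr() and strpos() on UTF-8 text. It also includes two UTF-8-aware stand-ins written in core PHP with PCRE's /u modifier. The names utf8_substr_sketch() and utf8_strpos_sketch() are hypothetical, made up for this example; they are not Joomla functions. In a real Joomla extension you would call JString::substr() and JString::strpos() instead. PHP 7.1 or later is assumed for the type declarations.

```php
<?php
// Hypothetical helpers for illustration (NOT Joomla API): UTF-8-aware
// substr() and strpos() built on core PHP. PCRE's /u modifier works on
// characters (code points) instead of bytes.

function utf8_substr_sketch(string $str, int $start, ?int $length = null): string
{
    // Split into an array of whole characters, slice it, and glue it back.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    return implode('', array_slice($chars, $start, $length));
}

function utf8_strpos_sketch(string $haystack, string $needle)
{
    // A byte-level search is safe when both strings are valid UTF-8:
    // a match can never start in the middle of a character.
    $bytePos = strpos($haystack, $needle);
    if ($bytePos === false) {
        return false;
    }
    // Convert the byte offset to a character offset by counting the
    // characters that come before the match.
    return preg_match_all('/./su', substr($haystack, 0, $bytePos));
}

$word = "\u{00C9}t\u{00E9}"; // "Été": 3 characters, 5 bytes

$broken = substr($word, 0, 1);              // only the byte 0xC3: half of "É"
$first  = utf8_substr_sketch($word, 0, 1);  // "É"

var_dump(preg_match('//u', $broken));       // bool(false): not valid UTF-8
var_dump(strpos($word, 't'));               // int(2): byte offset
var_dump(utf8_strpos_sketch($word, 't'));   // int(1): character offset
```

The half-character left behind by substr() is exactly the kind of byte sequence that shows up as garbled output in the browser.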
As you can see, the calls closely mirror the native PHP functions: add the "JString::" prefix and you generally keep the same parameters and return values (the strrpos $offset warning in the table being a notable exception). So, if you want to build a truly multilingual component, module, plugin or template for Joomla, use these JString equivalents instead of PHP's native string functions wherever your code handles UTF-8 text, and you will have solved the PHP side of the problem!
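Here is how that might look in your own extension code. This is a minimal sketch of a hypothetical shorten_title() helper (not part of Joomla) that truncates a title to a given number of characters without splitting a multi-byte character. It assumes the Joomla 1.5-era JString::strlen() and JString::substr() calls when the JString class is loaded, and otherwise falls back to a core-PHP PCRE version so you can try it from the command line. PHP 7 or later is assumed for the type declarations.

```php
<?php
// Hypothetical helper (not Joomla API): shorten a title to at most $max
// characters, appending $ellipsis only when something was cut off.
function shorten_title(string $title, int $max, string $ellipsis = '...'): string
{
    if (class_exists('JString')) {
        // Inside Joomla: use the UTF-8-aware JString methods.
        $len = JString::strlen($title);
        $cut = JString::substr($title, 0, $max);
    } else {
        // Outside Joomla (e.g. a quick CLI test): split into whole
        // characters with PCRE's /u modifier and work on that array.
        $chars = preg_split('//u', $title, -1, PREG_SPLIT_NO_EMPTY);
        $len = count($chars);
        $cut = implode('', array_slice($chars, 0, $max));
    }
    return $len > $max ? $cut . $ellipsis : $title;
}

echo shorten_title("\u{00C9}t\u{00E9} \u{00E0} Paris", 3), "\n";
```

Checking class_exists() keeps the helper usable both inside and outside Joomla; in a Joomla-only extension you could simply call the JString methods directly.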