Webmastersite.net

Charset Issues
How to use UTF-8 as default

Comments on Charset Issues

EU-Translations


Usergroup: Customer
Joined: Sep 08, 2005
Location: Germany

Total Topics: 2
Total Comments: 10
Posted Sep 11, 2005 - 8:18 AM:

Hi all, I am just starting to use the software, with the multilingual install to get English and German versions. First, it might be that I haven't found the correct Deutsch.lng, but mine is only partly in German and still largely in English: tons of translation work.

But now to my question: as a rule, I want to use UTF-8 encoding for all languages. Where do I set the variable LANG_CHARSET for each language?
For now I have replaced that variable in wrapper.tpl with charset=utf-8, but that doesn't work. The generated pages still don't show German umlauts correctly, for example. I suppose that's because the system does not store information in UTF-8?

Any ideas appreciated!
Paul
developer

Usergroup: Administrator
Joined: Dec 20, 2001
Location: Diamond Springs, California

Total Topics: 61
Total Comments: 7868
Posted Sep 11, 2005 - 9:18 AM:

The character set is specified by the charset language item, which simply inserts itself in the template where you've now hard-coded the value. Google says the correct tag for utf-8 is <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

Since you hard-coded the change instead of using the language variable, your admin wrapper wasn't altered. You'll need the same meta tag in your admin panel wrapper so that the language items you type are interpreted correctly.
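
To make the mechanism Paul describes concrete, here is a minimal sketch of placeholder substitution. It assumes that each {LANG_NAME} token in a wrapper template is replaced by the language item with that exact name; the function `render` and the dictionaries are hypothetical stand-ins, not the script's actual code. Under that assumption it also shows one plausible way a literal {LANG_CHARSET} can reach the browser: when a language has no such item, the token is left untouched.

```python
import re

# Hypothetical model of the wrapper substitution: {LANG_NAME} tokens are
# replaced by the language item with that exact name (an assumption).
PLACEHOLDER = re.compile(r"\{(LANG_[A-Z0-9_]+)\}")


def render(template: str, items: dict[str, str]) -> str:
    """Fill {LANG_NAME} tokens from `items`; unknown tokens are left as-is."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # A missing item falls back to the original token text, so the raw
        # "{LANG_CHARSET}" string ends up in the generated HTML.
        return items.get(name, match.group(0))

    return PLACEHOLDER.sub(substitute, template)


WRAPPER_HEAD = '<meta http-equiv="Content-Type" content="text/html; charset={LANG_CHARSET}">'

english = {"LANG_CHARSET": "utf-8"}  # item present, as in Fullenglish.lng
deutsch: dict[str, str] = {}         # hypothetical language file lacking the item

print(render(WRAPPER_HEAD, english))
print(render(WRAPPER_HEAD, deutsch))
```

Leaving unknown tokens in place (rather than substituting an empty string) makes a missing item visible in the page source, which is how the problem can be spotted at all.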
EU-Translations


Posted Sep 11, 2005 - 9:49 AM:

Thanks for the fast reply, Paul.
So I have hard-coded the utf-8 declaration into the admin wrapper too, but that doesn't do the trick.
I suppose the problem is on another level. A charset declaration in a web page tells the browser which charset the page uses. If, however, you tell the browser to use utf-8 while the page simply isn't written or stored in UTF-8, the text is displayed incorrectly.
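
The mismatch described here is easy to reproduce. The sketch below encodes a sample German word (an arbitrary choice) one way and decodes it the other way, which is effectively what a browser does when the declared charset and the stored bytes disagree.

```python
# Sample word chosen only for illustration.
text = "Mädchen"

# Page stored as ISO-8859-1 (Latin-1) but declared as utf-8:
latin1_bytes = text.encode("latin-1")  # b'M\xe4dchen'
shown_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")
# The lone 0xE4 byte is not valid UTF-8, so it becomes the replacement
# character U+FFFD (usually rendered as a question mark in a diamond).

# Page stored as UTF-8 but declared (or defaulted) as ISO-8859-1:
utf8_bytes = text.encode("utf-8")  # b'M\xc3\xa4dchen'
shown_as_latin1 = utf8_bytes.decode("latin-1")
# Each of the two UTF-8 bytes for "ä" is shown as its own character.

# Only when storage and declaration match does the word survive intact:
shown_correctly = utf8_bytes.decode("utf-8")

print(repr(shown_as_utf8))
print(shown_as_latin1)
print(shown_correctly)
```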

So the question is probably: in what format does the software store text input?

Second thing: I have now switched back to the original wrapper templates. When I view the source code of generated pages (with German as the user language), I get this:
<meta http-equiv="Content-Type" content="text/html; charset={LANG_CHARSET}">
So that's why I asked in my first post: Where do I set the value of {LANG_CHARSET}?

Thanks!

Paul
developer

Posted Sep 11, 2005 - 10:34 AM:

It stores text in whatever format you write it in. People have done Chinese translations, so given that Chinese character sets work, I don't see anything different about how German would be stored.

It's possible that changes in MySQL 4.1 or so mean you have to specify the character set on the MySQL field too, but I haven't seen any confirmation of that, and it would only apply to submitted items, not templates.

>Where do I set the value of {LANG_CHARSET}?

Click 'language' in your admin panel. Search for the language item named charset. But since you've erased it from your templates it will no longer do anything.
EU-Translations


Posted Sep 11, 2005 - 12:30 PM:

>But since you've erased it from your templates it will no longer do anything.

I have switched back to using the original wrapper tpls.

>>Where do I set the value of {LANG_CHARSET}?

>Click 'language' in your admin panel. Search for the language item named charset

I've tried that. For English I find the item all right; the value is utf-8, preset, and I have not changed it.
For Deutsch no item is found.
What I have done now is to create a new language item for Deutsch named LANG_CHARSET and set the value to utf-8. That solves the problem.

For French, for example, the same problem seems to apply: no charset is defined by default. In pages generated after switching to French, the header shows:
<meta http-equiv="Content-Type" content="text/html; charset={LANG_CHARSET}">
and a search for the language item in French doesn't yield any results.

Please note that this problem only occurs with submitted items, and only when submitting items in French or German and viewing the pages with another language set. It does not affect the items stored in the .lng files.

The utf-8 declaration is important for me because I will have submitters with both English and German set as language preference.
Paul
developer

Posted Sep 11, 2005 - 10:27 PM:

>Please note that this problem only occurs with submitted items

So not when submitting language variable values, only when submitting links (categories, etc.)? Then what's your MySQL version? They introduced character set declarations for text fields fairly recently, and that may have reduced their flexibility so that, as I said in my previous post, you could have to define the character set for each field.

>For Deutsch no item is found.

It may be an outdated language file, in which case you need to run upgrade.php to get the new items from fullenglish added to it.
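
As a rough illustration of what getting new items from fullenglish added to a language file means, the sketch below copies every item that exists in the reference language but not in the outdated one, without touching existing translations. It is a hypothetical model using plain dictionaries; it is not how upgrade.php is actually implemented, and the item name LANG_SUBMIT is invented for the example.

```python
def add_missing_items(outdated: dict[str, str], reference: dict[str, str]) -> list[str]:
    """Copy items that exist in `reference` but not in `outdated`.

    Existing translations are never overwritten; new items simply carry the
    reference (English) text until someone translates them.
    Returns the names of the items that were added.
    """
    added = []
    for name, value in reference.items():
        if name not in outdated:
            outdated[name] = value
            added.append(name)
    return added


# Hypothetical contents; LANG_SUBMIT is an invented item name.
fullenglish = {"LANG_CHARSET": "utf-8", "LANG_SUBMIT": "Submit"}
deutsch = {"LANG_SUBMIT": "Absenden"}

added = add_missing_items(deutsch, fullenglish)
print(added)  # ['LANG_CHARSET']
print(deutsch["LANG_CHARSET"], deutsch["LANG_SUBMIT"])  # utf-8 Absenden
```

The important design choice is never overwriting existing keys: an upgrade should not clobber a translation someone has already done.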
EU-Translations


Posted Sep 12, 2005 - 2:41 AM:

>So not when submitting language variable values, only submitting links (categories, etc.)?

Ok, correction: it also happened with language variable values which I had updated through the admin language panel. Those were items from Deutsch, and I updated them while working with Deutsch set as my user language.

> what's your MySQL version?

MySQL Version: 3.23.57-log
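
For reference, MySQL 4.1 is the release that introduced character sets and collations at the server, database, table, and column level; in 3.23 the character set is a single server-wide setting, so the per-field declarations Paul mentions do not exist on 3.23.57. The helpers below are a hypothetical sketch that parses a version string like the one above and makes that check.

```python
import re


def mysql_version(version_string: str) -> tuple[int, int, int]:
    """Parse strings such as '3.23.57-log' into (3, 23, 57)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognised MySQL version: {version_string!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def has_column_charsets(version_string: str) -> bool:
    """True for 4.1 or later, the first release with per-column character sets."""
    return mysql_version(version_string)[:2] >= (4, 1)


print(has_column_charsets("3.23.57-log"))  # False
```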

> It may be an outdated language file such that you need to run upgrade.php to get new items from fullenglish added to it.
I don't really understand this. I downloaded all the files from here just two days ago. Neither Deutsch nor French has this item; I haven't checked the other languages.

Perhaps I made a mistake while installing the multilingual version?
To test this I have just done another install, also multilingual, choosing English and Deutsch as the languages to be installed. But as with the first install, only English was installed. I have to upload deutsch.lng manually to /languages/deutsch.lng. Importing does not work: I get a message saying I have to chmod 777 first, which I have done; I also tried 755 as per the manual.
So is there anything wrong in this process?

Olney
Member

Usergroup: Customer
Joined: Oct 30, 2004

Total Topics: 18
Total Comments: 47
Posted Sep 12, 2005 - 4:53 PM:

I'll help out with this one.
All of our sites use UTF-8.

You have to change it in the admin wrapper
and in the general wrapper.

We just hand-coded it each time in the two wrapper files.
We use the script in Japanese and English.
Depending on the language you use, only outgoing emails will be a problem.

Unicode is not a common email format.

If you entered any text from the admin panel beforehand, it won't be converted automatically.

Hope this helps...
EU-Translations


Posted Sep 12, 2005 - 5:11 PM:

Thanks, Olney!
So to sum this up: setting the charset for languages other than English can be done in two ways:

- hardcode the desired charset declaration in the admin wrapper.tpl and the general wrapper.tpl by replacing '{LANG_CHARSET}' for each language

- create a new language variable named LANG_CHARSET for each language and assign the desired charset declaration as value
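
The first approach can also be scripted. This sketch rewrites the charset value in a Content-Type meta tag, whether it is still the {LANG_CHARSET} placeholder or an earlier hard-coded value, and would be run on the text of both the admin wrapper and the general wrapper. The function name and the exact regular expression are assumptions; reading and writing the actual wrapper.tpl files is left out because their paths depend on the install.

```python
import re

# Matches the charset value in a Content-Type meta tag, whether it is the
# {LANG_CHARSET} placeholder or an already hard-coded value.
META_CHARSET = re.compile(
    r'(<meta\s+http-equiv="Content-Type"\s+content="text/html;\s*charset=)([^"]*)(")',
    re.IGNORECASE,
)


def hardcode_charset(template_text: str, charset: str = "utf-8") -> str:
    """Return the template text with its meta charset forced to `charset`."""
    # A function replacement avoids any backslash/group escaping surprises.
    return META_CHARSET.sub(lambda m: m.group(1) + charset + m.group(3), template_text)


original = '<meta http-equiv="Content-Type" content="text/html; charset={LANG_CHARSET}">'
print(hardcode_charset(original))
```

Because the pattern also matches values that were already hard-coded, running it twice is harmless.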

Perhaps a dedicated field in the admin language management panel would be good?


Olney, how do you work around the problem with outgoing mails?
Olney
Member

Posted Sep 13, 2005 - 5:18 AM:

Unfortunately we can't send confirmations in Japanese from the system.

We would have to confirm them from our email client.

Japanese email uses a 7-bit encoding (ISO-2022-JP), which is not a standard web format; it's only an email format.

There's a working converter using PHP's mbstring extension, but none of our sites are big enough for us to hire someone to implement it.

So for multibyte languages like Chinese, Japanese, Korean, Arabic, and Russian, the script will send email in Unicode.
This works in some email clients, such as Outlook, but not in web-based email services like Hotmail or Yahoo.
Overlooking this is not the fault of any English-speaking programmer or company; it isn't really documented in English.
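
The 7-bit Japanese mail encoding is ISO-2022-JP, which represents Japanese text using only bytes below 0x80 plus escape sequences. PHP's mbstring extension is one way to produce it; as a sketch of the same idea with Python's standard email package, the message below is encoded as ISO-2022-JP and compared with a UTF-8 message. The subject and body strings are arbitrary samples, and actually sending the mail (SMTP server, addresses) is deliberately left out.

```python
from email.header import Header
from email.mime.text import MIMEText


def build_japanese_mail(subject: str, body: str) -> MIMEText:
    """Build a plain-text message encoded as ISO-2022-JP (7-bit safe)."""
    # With the iso-2022-jp charset, the email package converts the text to
    # JIS escape sequences and marks the body 7bit (no base64 needed).
    msg = MIMEText(body, "plain", "iso-2022-jp")
    # Non-ASCII headers still need RFC 2047 encoded-words.
    msg["Subject"] = Header(subject, "iso-2022-jp")
    return msg


jp = build_japanese_mail("お知らせ", "ありがとうございます。")
# The same body as UTF-8, for comparison; the email package base64-encodes
# UTF-8 bodies by default.
utf8 = MIMEText("ありがとうございます。", "plain", "utf-8")

print(jp["Content-Transfer-Encoding"])    # 7bit
print(utf8["Content-Transfer-Encoding"])  # base64
print(jp.as_string())
```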

Also, if you use the charset setting, make sure you think about input from the admin panel. I would really suggest just using Unicode. I didn't like using Unicode at first and just wanted to stick with Shift_JIS for Japanese, but if you use UTF-8 your site appears more in search engine results.
EU-Translations


Posted Sep 13, 2005 - 10:31 AM:

>I didn't like using Unicode at first & just wanted to stick with shift-jis in Japanese but if you use utf-8 your site appears more in search engine results

I love UTF-8. Whenever possible I do everything in UTF-8; it's as close as you can get to truly multilingual web programming. It makes working with several languages so much easier, from Cyrillic alphabets to Chinese and Japanese.
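
A quick way to see why a single UTF-8 setting covers all of these scripts: the sketch encodes sample words (arbitrary choices) in German, Russian, Japanese, and Chinese, checks that each round-trips through UTF-8, and checks whether it would fit in ISO-8859-1 at all. It also shows that non-ASCII characters take two or three bytes each in UTF-8.

```python
# Sample words chosen only for illustration.
samples = {
    "German": "Grüße",
    "Russian": "Привет",
    "Japanese": "こんにちは",
    "Chinese": "你好",
}


def fits(text: str, encoding: str) -> bool:
    """Return True if `text` can be represented in `encoding` at all."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


for language, text in samples.items():
    encoded = text.encode("utf-8")
    assert encoded.decode("utf-8") == text  # UTF-8 round-trips every script
    print(f"{language:<9} {len(text)} chars -> {len(encoded):>2} UTF-8 bytes, "
          f"fits ISO-8859-1: {fits(text, 'iso-8859-1')}")
```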



Paul
developer

Posted Sep 15, 2005 - 7:16 PM:

Would there be some sort of drawback for English sites if it were to default to utf-8 encoding? (If not, it seems it would save me a few support requests.)
EU-Translations


Posted Sep 16, 2005 - 11:56 AM:

I don't see any drawbacks. If you do a multilingual install and work with English, you use Fullenglish.lng. That .lng has utf-8 preset, and it works flawlessly.

You might keep in mind, though, what Olney said about mailing issues.
I have just tested sending mail from the admin area with utf-8 set. Viewing the mail at a Hotmail-like free webmail service worked fine.
Olney
Member

Posted Sep 16, 2005 - 6:47 PM:

Actually, ISO (English) encoding converts to UTF-8 flawlessly.

Even the mail.

Mail is only an issue in multibyte languages:
Japanese, Korean, Russian, Chinese, etc.

This is one of those kinks that make some scripts not completely Japanese-compatible, but there are workarounds if the site is small.

I'd recommend using UTF-8 in the future.
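
The sketch below illustrates both sides of Paul's question for an English site, assuming "ISO" here means ISO-8859-1. Pure ASCII text is byte-for-byte identical in ASCII, ISO-8859-1, and UTF-8, so declaring utf-8 costs nothing; stored text that already contains Latin-1 accented characters has to be re-encoded once, which matches the earlier warning that text entered beforehand does not convert automatically. The sample strings are arbitrary.

```python
# An English sample with only ASCII characters (illustrative text).
english = "Welcome to our link directory!"

# ASCII bytes are the same in ASCII, ISO-8859-1 and UTF-8, so declaring
# utf-8 on a purely English page changes nothing about its bytes.
assert english.encode("ascii") == english.encode("iso-8859-1") == english.encode("utf-8")


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8.

    Every byte value 0x00-0xFF is a valid Latin-1 character, so the decode
    step cannot fail and no information is lost.
    """
    return data.decode("iso-8859-1").encode("utf-8")


legacy = "café".encode("iso-8859-1")  # stored before the switch: b'caf\xe9'
converted = latin1_to_utf8(legacy)    # b'caf\xc3\xa9'
print(converted.decode("utf-8"))      # café
```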