Windows-1251 or UTF-8 for Cyrillic?

Hi. Which encoding should I use for the Cyrillic (Bulgarian) language?

Thanks.
For the Russian language: windows-1251.
Thanks :)
In principle you can use UTF-8.
I know. I'm using UTF-8 now, but the English locale, for example, is in ANSI encoding, and when I turn on the English locale, all my menus and text that are in Bulgarian (UTF-8 encoding) become a rhombus with a ? inside. I also have some other problems with UTF-8 that I don't have with windows-1251 encoding.

I will convert my locale files back to windows-1251 encoding. I wanted to be sure; that's why I'm asking.

Thanks again.
Man! This is intentional!

The Bulgarian/Cyrillic characters are part of the UTF-8 set. UTF-8 is close to a universal encoding: it covers almost all languages.
But the ANSI set (here, a Western single-byte character set) contains nothing beyond English (Latin) letters and some special characters.

ANSI Codes Chart | UTF-8 Code Chart

For example (to understand the situation):

Case 1: Your English locale has ANSI as its character set.

ANSI only contains English letters and some special characters, so it will display English words properly. Please refer to the chart links. It will not display characters that are not in the set; those are shown as broken (a rhombus with a ?).

Case 2: Your Cyrillic locale has UTF-8 as its character set.

Now you will be able to see Cyrillic characters as well as English characters, because all of these characters are part of UTF-8.

So, in order to see the Cyrillic characters (which you posted manually on your website) under the English locale, you have to set the character set of the English locale to UTF-8.

It's like this:

English Characters in ANSI - Yes
English Characters in UTF-8 - Yes
Cyrillic Characters in UTF-8 - Yes
Cyrillic Characters in ANSI - No
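
To make the list above concrete, here is a small sketch of how the same letter is stored in each encoding. It assumes the Cyrillic single-byte encoding in question is windows-1251, and it uses the PCRE `u` modifier only as a convenient UTF-8 validity check; the helper name `is_valid_utf8` is made up for this example.

```php
<?php
// Hypothetical helper: true if $bytes is a well-formed UTF-8 string.
// preg_match() with the "u" modifier returns false on malformed UTF-8.
function is_valid_utf8(string $bytes): bool {
    return preg_match('//u', $bytes) === 1;
}

$o_utf8   = "\xD0\xBE"; // Cyrillic "о" (U+043E) encoded as UTF-8: two bytes
$o_cp1251 = "\xEE";     // the same letter in windows-1251: one byte

echo strlen($o_utf8), "\n";   // 2
echo strlen($o_cp1251), "\n"; // 1

// Plain English (ASCII) text is byte-identical in both encodings.
var_dump(is_valid_utf8("Forum"));   // bool(true)
var_dump(is_valid_utf8($o_utf8));   // bool(true)

// A lone windows-1251 byte is not well-formed UTF-8. On a page declared
// as UTF-8, browsers show such bytes as a rhombus with a ?.
var_dump(is_valid_utf8($o_cp1251)); // bool(false)
```

The point of the sketch is only that the bytes in the files and the character set announced by the page have to agree; it does not say which of the two was mismatched on this particular site.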
Okay, but when my locales were originally in ANSI encoding, I had no problems with them. Now that they are in UTF-8, I get a rhombus with a ? in some places. See the index page: http://www.phpfusion-bg.com .
Man! The problem is not with the locales at all!

I think you should have given the website link earlier, so the solution could be found sooner.

The problem is with the character replacement done by the trimlink function. The character that is shown as a rhombus with a ? is a Cyrillic character.

See, I am posting that character here, and this is what it becomes: о

The problem is that the trimlink function replaces some special characters in order to avoid HTML validation errors.
See:
[syntaxhighlighter brush=php,first-line=1,highlight=0,collapse=false,html-script=false]function trimlink($text, $length) {
    $dec = array("&", "\"", "'", "\\", '\"', "\'", "<", ">");
    $enc = array("&amp;", "&quot;", "&#39;", "&#92;", "&quot;", "&#39;", "&lt;", "&gt;");
    $text = str_replace($enc, $dec, $text);
    // ... (trimming and re-encoding follow)
}[/syntaxhighlighter]

I think that while trimming, PHP uses a default character set that doesn't have Cyrillic characters (I'm not certain, because I don't have the technical knowledge, but I am 80% sure). So it trims the text and shows the non-English character as broken.

That is also why the character is not broken when you click the link and see the thread title.

This can be resolved by editing the trimlink function.

What I'm worried about, though, is that you may have to change the encoding of the file maincore.php to UTF-8, which I haven't tested.

So the locales are not the problem here.
I've changed the encoding of the maincore.php file to UTF-8, but nothing happened. The only solution I've found for this problem is to increase the number of characters allowed for the thread subject in forum_threads_list_panel.php:

".trimlink($data['thread_subject'], 30)."



In my case I set it to 40, so the full thread subject fits.
That's what I was saying: I hadn't tested changing the encoding of maincore.php!

So, now that you have done that, you can try the following if you don't want to increase the number of characters.

[syntaxhighlighter brush=php,first-line=1,highlight=0,collapse=false,html-script=false]function trimlink($text, $length) {
    $dec = array("&", "\"", "'", "\\", '\"', "\'", "<", ">");
    $enc = array("&amp;", "&quot;", "&#39;", "&#92;", "&quot;", "&#39;", "&lt;", "&gt;");
    $text = str_replace($enc, $dec, $text);
    // ... (trimming and re-encoding follow)
}[/syntaxhighlighter]

Add that character to the $dec array.
Add the numeric character reference for that character to the $enc array, i.e. &#1086; (1086 is the decimal code of the Cyrillic letter о, U+043E).

Done !
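
As a rough sketch of what adding such a pair does: the two arrays are parallel, so entry i of `$dec` is swapped with entry i of `$enc`. The arrays below are shortened to a few pairs for illustration, and the sample word is made up; this only shows the replacement step, not the trimming.

```php
<?php
// Parallel arrays: $dec[i] is the raw character, $enc[i] its HTML form.
// Shortened for the example; "о" / "&#1086;" is the newly added pair.
$dec = array("&", "<", ">", "о");
$enc = array("&amp;", "&lt;", "&gt;", "&#1086;");

$subject = "форум"; // sample text containing the Cyrillic letter "о"

// Encoding: every raw character is replaced by its counterpart in $enc.
$encoded = str_replace($dec, $enc, $subject);
echo $encoded, "\n"; // ф&#1086;рум

// Decoding: swapping the arrays restores the original text.
$decoded = str_replace($enc, $dec, $encoded);
echo $decoded, "\n"; // форум
```

Both arrays need the same number of entries. If the search array is longer than the replacement array, str_replace() uses an empty string for the missing replacements, so an entry added to only one array silently deletes text instead of converting it.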
This is how it looks in my maincore.php

[syntaxhighlighter brush=php,first-line=1,highlight=0,collapse=false,html-script=false]
// Trim a line of text to a preferred length
function trimlink($text, $length) {
    $dec = array("&", "\"", "'", "\\", '\"', "\'", "<", ">");
    // "&#1086;" was added here; $dec has no matching entry yet
    $enc = array("&amp;", "&quot;", "&#39;", "&#92;", "&quot;", "&#39;", "&lt;", "&gt;", "&#1086;");
    $text = str_replace($enc, $dec, $text);
    if (strlen($text) > $length) $text = substr($text, 0, ($length-3))."...";
    $text = str_replace($dec, $enc, $text);
    return $text;
}[/syntaxhighlighter]

I added &#1086; to $enc, but I don't understand exactly which character to add to $dec.
The character that is being converted into a rhombus with a ? on your site!

Just copy and paste that character!
I tried it, but nothing happened. I think you don't understand me. The character that is being converted into a rhombus with a ? is not the problem. The same character is used in the thread above, and there it displays normally. I think the problem occurs because the thread subject is set to be too short. I think the problem is somewhere in $length, because when I set the thread subject to be longer, the problem disappears.

Btw, in ANSI this problem is gone.
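
One possible explanation for the $length observation, offered as a sketch and not as a confirmed diagnosis of this site: PHP's strlen() and substr() count bytes, and in UTF-8 each Cyrillic letter takes two bytes, so cutting at ($length-3) bytes can split a letter in half. In a single-byte encoding such as windows-1251 that cannot happen. The function names `trim_bytes` and `trim_chars` and the sample subject are made up for this example, and the entity decoding and encoding steps of trimlink are left out.

```php
<?php
// Byte-based trimming, as in trimlink(): strlen()/substr() count bytes.
function trim_bytes(string $text, int $length): string {
    if (strlen($text) > $length) {
        $text = substr($text, 0, $length - 3) . "...";
    }
    return $text;
}

// Hypothetical character-based variant: split into UTF-8 characters first.
function trim_chars(string $text, int $length): string {
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return $text; // not valid UTF-8: leave the text untouched
    }
    if (count($chars) > $length) {
        $text = implode('', array_slice($chars, 0, $length - 3)) . "...";
    }
    return $text;
}

$subject = "Добре дошли във форума"; // 22 characters, 41 bytes in UTF-8

$cut_bytes = trim_bytes($subject, 10); // keeps 7 bytes: "Доб" plus half of "р"
$cut_chars = trim_chars($subject, 10); // keeps 7 whole characters

// preg_match() with the "u" modifier returns false on malformed UTF-8.
var_dump(preg_match('//u', $cut_bytes)); // bool(false)
echo $cut_chars, "\n";                   // Добре д...
```

Whether a byte cut lands inside a letter depends on the text and the length, which would fit the same character looking fine in one thread title and broken in another. If the mbstring extension is available, mb_strlen() and mb_substr() with 'UTF-8' as the encoding argument do the same job more directly.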
I've tested the same thing for my locale (Hindi), which is in UTF-8.

The situation was the same, and the character was the minus sign (-). It was converting the minus sign (-) into a rhombus with a ?.

So I did the same thing, and it's working for me.

Could you check by entering a minus sign in a thread title with the UTF-8 Cyrillic locale?
I've now tested your suggestion, and there are no problems with the minus (-) sign in UTF-8 locales.


Btw, which collation should I use? By default it is set to latin1_swedish_ci.
Thread Information
Author
Posted In
Replies
15 posts
Views
6,526 times
Last Post
Last updated on 9 years ago
Users who participated in discussion: rUMENcho, jikaka, Ankur