Problems with mime encoding of Japanese Characters in Subject and 'From:' etc. fields.


by diresu

Hi,

I am trying to send messages written in Japanese (kana/kanji) with PHP.

Everything works fine, except that when the subject (or the name of
the sender) gets longer, something seems to go wrong with the
encoding: neither my mail reader nor those of my Japanese friends can
decode the MIME string. The raw MIME string is displayed in place of
the Japanese characters.

As this doesn't happen with other Japanese emails, even ones with long
subjects, I suppose I did something wrong...

When I use the corresponding PHP mb_* functions to decode the string
back, I sometimes get the correct original string and sometimes
meaningless characters.

Here is how I convert the subject (the name is converted the same way,
and the source files are saved in UTF-8 using Emacs):

  $subjectJIS  = mb_convert_encoding($subject, "ISO-2022-JP", "AUTO");
  $subjectMIME = mb_encode_mimeheader($subjectJIS, "ISO-2022-JP", "B");
  ...snip...
  mail($to, $subjectMIME, $bodyJIS, $headers);

Here is part of the message as displayed by my mail program:

  From:
=?ISO-2022-JP?B?GyRCJCskSjRBO3okKyRKNEE7eiQrJEo0QTt6JCskSjRBO3okKyRKNEE7?==?ISO-2022-JP?B?eiQrJEo0QTt6JCskSjRBO3okKyRKNEE7eiQrJEo0QTt6JCskSjRBO3ob?=(B <d.bollmann@...>
  ...snip...
  Subject:
=?ISO-2022-JP?B?GyRCJCskSjRBO3okKyRKNEE7eiQrJEo0QTt6JCskSjRBO3okKyRKNEE7?= =?ISO-2022-JP?B?eiQrJEo0QTt6JCskSjRBO3okKyRKNEE7eiQrJEo0QTt6JCskSjRBO3ob?= (B
  ...snip...
 
  かな漢字かな漢字かな漢字かな漢字かな漢字かな漢字かな漢字かな漢字かな漢字かな漢字

And here is part of the raw message source:

  ...snip...
  Subject:

=?ISO-2022-JP?B?GyRCJCskSjRBO3okKyRKNEE7eiQrJEo0QTt6JCskSjRBO3okKyRKNEE7?=

=?ISO-2022-JP?B?eiQrJEo0QTt6JCskSjRBO3okKyRKNEE7eiQrJEo0QTt6JCskSjRBO3ob?=
  =?ISO-2022-JP?B?KEI=?=
  MIME-Version: 1.0
  From:
=?ISO-2022-JP?B?GyRCJCskSjRBO3okKyRKNEE7eiQrJEo0QTt6JCskSjRBO3okKyRKNEE7?= =?ISO-2022-JP?B?eiQrJEo0QTt6JCskSjRBO3okKyRKNEE7eiQrJEo0QTt6JCskSjRBO3ob?= =?ISO-2022-JP?B?KEI=?= <d.bollmann@...>
  ...snip...
  Content-Type: text/plain; charset=ISO-2022-JP
  ...snip...
 
  かな漢字かな漢字かな漢字かな漢字かな漢字かな漢字かな漢字かな漢字かな漢字かな漢字

Here is part of another (spam) mail that my mail program displays
correctly:

  MIME-Version: 1.0
  Subject:
=?ISO-2022-JP?B?GyRCIXolXSUkJXMlSBsoQjEwGyRCR1whdUF3TkEbKEI=?=
  =?ISO-2022-JP?B?GyRCTDVOQSF6GyhCMSwwMDAbJEIxXyU4JWUbKEI=?=
  =?ISO-2022-JP?B?GyRCJSglaiE8Qmc9ODlnISohWjNaRTchWxsoQg==?=
  =?ISO-2022-JP?B?GyRCIUobKEIyMDA4LzAzLzE5?= =?ISO-2022-JP?B?KQ==?=
  From: =?ISO-2022-JP?B?GyRCM1pFNztUPmwlOCVlJSglaiE8ISYlIhsoQg==?=
=?ISO-2022-JP?B?GyRCJS8lOyU1JWohPCVLJWUhPCU5GyhC?=
<jewelry@...>
 
Displayed as:

  From: 楽天市場ジュエリー・アクセサリーニュース
<jewelry@...>
  ...snip...
  Subject: ★ポイント10倍&送料無料★1,000円ジュエリー大集合!【楽天】
(2008/03/19)
 
If anybody can explain the problem to me, I would be most grateful :)

Thanks, Dietrich




--
PHP Unicode & I18N Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php


Re: Problems with mime encoding of Japanese Characters in Subject and 'From:' etc. fields.

by diresu

A reply to myself:

Here is a small example program that reproduces the problem described
above, using the same code I use...

Thanks, Dietrich


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
     <head>
     <meta http-equiv="content-type" content="text/html; charset=UTF-8" />
     <title>Contact Me</title>
     </head>
     <body>

     <h1>かな漢字</h1>

     <?php # Script 10.1 - email.php

function sendEmail($recipientEmailAddress, $subject, $body, $senderName, $senderEmailAddress) {

    // set current language to Japanese
    mb_language("ja");

    // encode subject
    // - first convert it to JIS (ISO-2022-JP)
    // - then encode the resulting JIS string with the MIME header encoding scheme
    $subjectJIS  = mb_convert_encoding($subject, "ISO-2022-JP", "AUTO");
    $subjectMIME = mb_encode_mimeheader($subjectJIS, "ISO-2022-JP", "B");

    // encode the name of the sender
    // - first convert it to JIS (ISO-2022-JP)
    // - then encode the resulting JIS string with the MIME header encoding scheme
    $senderNameJIS  = mb_convert_encoding($senderName, "ISO-2022-JP", "AUTO");
    $senderNameMIME = mb_encode_mimeheader($senderNameJIS, "ISO-2022-JP", "B");

    // encode body
    // - convert it to JIS (ISO-2022-JP)
    // - the encoding used has to be specified in the Content-Type/charset header:
    //   Content-Type: text/plain; charset=ISO-2022-JP
    $bodyJIS = mb_convert_encoding($body, "ISO-2022-JP", "AUTO");

    // format the sender string
    $senderMIME = sprintf("%s <%s>", $senderNameMIME, $senderEmailAddress);

    // build the MIME headers
    $headers  = "MIME-Version: 1.0\n" ;
    $headers .= sprintf("From: %s\n",     $senderMIME);
    $headers .= sprintf("Reply-To: %s\n", $senderMIME);
    $headers .= "Content-Type: text/plain; charset=ISO-2022-JP\n";

    // send encoded mail
    $result = mail($recipientEmailAddress, $subjectMIME, $bodyJIS, $headers);

    // return result
    return $result;
 }

$to                 = "snip@...";
$subject            = "かな漢字かな漢字かな漢字かな漢字かな漢字かな漢字かな漢字かな漢字かな漢字かな漢字";
$body               = "かな漢字かな漢字かな漢字かな漢字かな漢字かな漢字かな漢字かな漢字かな漢字かな漢字";
$senderName         = "かな漢字かな漢字かな漢字かな漢字かな漢字かな漢字かな漢字かな漢字かな漢字かな漢字";
$senderEmailAddress = "snip@...";
     
// send the email
sendEmail($to, $subject, $body, $senderName, $senderEmailAddress);
           
?>

</body>
</html>




Re: Problems with mime encoding of Japanese Characters in Subject and 'From:' etc. fields.

by Tomas Kuliavas


> If anybody can explain me the problem I would be most gratefull :)

You forgot to mention your PHP version, the configure options related
to mbstring, and your PHP mbstring configuration.

Could you explain why Japanese are so obsessed with ISO-2022-JP? Why
can't you just send it in Base64-encoded UTF-8?

--
Tomas



Re: Problems with mime encoding of Japanese Characters in Subject and 'From:' etc. fields.

by Tomas Kuliavas

> If anybody can explain me the problem I would be most gratefull :)

This is a bug in mb_encode_mimeheader. The function does not follow
RFC 2047, section 3, second paragraph. I suspect that the function
Base64-encodes the string first and then splits it according to the
length argument, or that it fails to add escapes when text in ISO-2022
charsets is folded. Either way, it breaks the ISO-2022 escape sequences.
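
The broken escapes can be checked directly from the headers quoted in the original post. The following is only a rough sketch: `endsInAsciiMode` is a hypothetical helper written for this illustration, not an mbstring function. It Base64-decodes the payload of one encoded-word and looks at the last escape sequence, on the assumption that only `ESC ( B` counts as a return to ASCII mode.

```php
<?php
// Hypothetical helper (not part of PHP): returns true if the decoded payload
// of one ISO-2022-JP encoded-word leaves the decoder in ASCII mode at its end.
function endsInAsciiMode(string $base64Payload): bool
{
    $bytes = base64_decode($base64Payload, true);
    if ($bytes === false) {
        throw new InvalidArgumentException('Payload is not valid Base64');
    }

    $lastEscape = strrpos($bytes, "\x1B");
    if ($lastEscape === false) {
        return true; // no escape sequence at all, so the text never left ASCII
    }

    // ESC ( B switches back to ASCII; any other or truncated sequence does not.
    return substr($bytes, $lastEscape, 3) === "\x1B(B";
}

// Payloads copied from the headers in the original post.
$payloads = [
    'broken Subject, word 1' => 'GyRCJCskSjRBO3okKyRKNEE7eiQrJEo0QTt6JCskSjRBO3okKyRKNEE7',
    'broken Subject, word 2' => 'eiQrJEo0QTt6JCskSjRBO3okKyRKNEE7eiQrJEo0QTt6JCskSjRBO3ob',
    'spam Subject, word 1'   => 'GyRCIXolXSUkJXMlSBsoQjEwGyRCR1whdUF3TkEbKEI=',
];

foreach ($payloads as $label => $payload) {
    printf("%s: %s\n", $label, endsInAsciiMode($payload) ? 'ends in ASCII mode' : 'does NOT end in ASCII mode');
}
```

Both words of the broken subject fail the check. The first never switches back to ASCII, and the second ends with a bare ESC byte whose `(B` has spilled into the third encoded-word (`KEI=` decodes to `(B`). That would explain the stray `(B` in the displayed headers. The spam header's first word ends with a complete `ESC ( B`.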

In http://bugs.php.net/bug.php?id=23192, moriyoshi@... wrote that
issues should be reported on php-i18n first. The header posted in that
bug report shows the same issue with broken ISO-2022 escapes, but
Moriyoshi wrote that it is encoded correctly.

Have things been unchanged since 2003-04? Should I report the bug here
or on bugs.php.net?

--
Tomas



Re: Re: Problems with mime encoding of Japanese Characters in Subject and 'From:' etc. fields.

by david.blomberg

Tomas Kuliavas wrote:

>> If anybody can explain me the problem I would be most gratefull :)
I have seen this problem in a few mail clients. My solution in the past
has been to merge the two encoded strings into a single encoded string,
so that the client does not get confused when it sees the second
"=?ISO-2022-JP" in the header line. (This is a really big problem for
Apple iMail; I have seen it regardless of the programming language used.)
>
> You forgot to mention your PHP version, configure options related to
> mbstring and php mbstring configuration.
>
> Could you explain why Japanese are so obsessed with ISO-2022-JP? Why
> can't you just send it in Base64 encoded UTF-8?
>
Some brain dead ISPs/Mobile services here in Japan only support
ISO-2022-JP.

David Blomberg




Re: Re: Problems with mime encoding of Japanese Characters in Subject and 'From:' etc. fields.

by Tomas Kuliavas

>>> If anybody can explain me the problem I would be most gratefull :)
> I have seen this problem in a few mail clients My solution in the past
> has been to merge the 2 encoding strings into a single encoding string
> to avoid the client getting messed when it sees the second
> "=?ISO-2022-JP" in the Header line. (this is really a big problem for
> Apple iMail-I have seen it regardless of the programming language used)

Again, RFC 2047:
---
An 'encoded-word' may not be more than 75 characters long, including
'charset', 'encoding', 'encoded-text', and delimiters.
---
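
To put numbers on the rule quoted above, here is a small sketch that measures the two encoded-words from the `From:` header in the original post, and then the single word that merging them would produce. `encodedWordLengths` is a hypothetical helper for this illustration only, and its regular expression is a simplification of the RFC 2047 grammar.

```php
<?php
// Hypothetical helper: returns the length of every RFC 2047 encoded-word
// found in a header value (simplified pattern, B and Q encodings only).
function encodedWordLengths(string $headerValue): array
{
    preg_match_all('/=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=/', $headerValue, $matches);
    return array_map('strlen', $matches[0]);
}

// The first two encoded-words of the From: header from the original post.
$first  = '=?ISO-2022-JP?B?GyRCJCskSjRBO3okKyRKNEE7eiQrJEo0QTt6JCskSjRBO3okKyRKNEE7?=';
$second = '=?ISO-2022-JP?B?eiQrJEo0QTt6JCskSjRBO3okKyRKNEE7eiQrJEo0QTt6JCskSjRBO3ob?=';

// As sent: two separate words.
$asSent = encodedWordLengths($first . ' ' . $second);

// Merged: one word carrying both Base64 payloads. The prefix
// "=?ISO-2022-JP?B?" is 16 characters long and the suffix "?=" is 2.
$merged = '=?ISO-2022-JP?B?' . substr($first, 16, -2) . substr($second, 16, -2) . '?=';
$asMerged = encodedWordLengths($merged);

printf("as sent: %s\n", implode(', ', $asSent));   // as sent: 74, 74
printf("merged:  %s\n", implode(', ', $asMerged)); // merged:  130
```

Each word as sent is within the 75-character limit, so length is not what is wrong with them. The merged word is 130 characters long, well over the limit, which is why merging only helps with clients that are lenient about it.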

>>
>> You forgot to mention your PHP version, configure options related to
>> mbstring and php mbstring configuration.
>>
>> Could you explain why Japanese are so obsessed with ISO-2022-JP? Why
>> can't you just send it in Base64 encoded UTF-8?
>>
> Some brain dead ISPs/Mobile services here in Japan only support
> ISO-2022-JP.

Do they need another four black ships in order to change things?

ISO-2022 text can be encoded correctly, but that is harder to implement
than MIME encoding for ISO-8859 or UTF-8/UTF-16. I suggest sending the
text in UTF-8 and asking the brain-dead ISPs to fix their software, even
if it is DoCoMo. Also, if Dietrich uses the script behind some HTML
form, he does not know whether the text submitted through that form is
Japanese.

Instead of
----
$subjectJIS  = mb_convert_encoding($subject, "ISO-2022-JP", "AUTO");
$subjectMIME = mb_encode_mimeheader($subjectJIS, "ISO-2022-JP", "B");
----
do
----
mb_internal_encoding('utf-8');
$subjectMIME = mb_encode_mimeheader($subject, "utf-8", "B");
----
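
For comparison, here is a rough sketch of what a correct encoder has to do in the UTF-8 case: split only between characters, and keep every encoded-word within 75 characters. `encodeUtf8Header` is a hypothetical, hand-rolled helper that uses only core PHP functions. It is not how mbstring implements `mb_encode_mimeheader`, and it does not account for the length of the header name on the first line.

```php
<?php
// Hypothetical helper (not part of PHP): encodes a UTF-8 string as a series
// of RFC 2047 "B" encoded-words, never splitting inside a character.
function encodeUtf8Header(string $text, int $maxWordLength = 75): string
{
    $prefix = '=?UTF-8?B?';
    $suffix = '?=';

    // Base64 turns every 3 raw bytes into 4 characters, so work out how
    // many raw bytes fit into one encoded-word.
    $maxBase64 = $maxWordLength - strlen($prefix) - strlen($suffix);
    $maxBytes  = intdiv($maxBase64, 4) * 3;

    // Split the text into single UTF-8 characters.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    $words = [];
    $chunk = '';
    foreach ($chars as $char) {
        // Start a new encoded-word before the current one would overflow.
        if ($chunk !== '' && strlen($chunk) + strlen($char) > $maxBytes) {
            $words[] = $prefix . base64_encode($chunk) . $suffix;
            $chunk = '';
        }
        $chunk .= $char;
    }
    if ($chunk !== '') {
        $words[] = $prefix . base64_encode($chunk) . $suffix;
    }

    // Fold the header: each continuation line starts with a space.
    return implode("\r\n ", $words);
}

$subject = str_repeat('かな漢字', 10);
echo encodeUtf8Header($subject), "\n";
```

Because UTF-8 carries no shift state, each encoded-word can be decoded on its own. An ISO-2022-JP encoder would additionally have to close every word with `ESC ( B` and reopen the next one with `ESC $ B`, which is the step that appears to be missing in the headers above.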

--
Tomas



RE: Re: Problems with mime encoding of Japanese Characters in Subject and 'From:' etc. fields.

by Andi Gutmans

Unrelated: we are looking for people who will contribute unit tests to PHP 5.3 for ext/mbstring, especially input encoding conversion (Shift-JIS, etc.). Any volunteers, please contact internals@

Andi
