[bug #20252] [gnu.org #336933] RFC2047 header encoding bug

URL: <http://savannah.nongnu.org/bugs/?20252>

Summary: [gnu.org #336933] RFC2047 header encoding bug
Project: MHonArc
Submitted by: jaginsberg
Submitted on: Monday 06/25/2007 at 15:13
Category: Character Sets
Severity: 3 - Normal
Item Group: Incorrect Behavior
Status: None
Privacy: Public
Assigned to: None
Open/Closed: Open
Discussion Lock: Any
Operating System: All
Perl Version: 5.8.3-19.5
Component Version: 0.7.3
Fixed Release:

_______________________________________________________

Details:

From a user report on lists.gnu.org:

"""
To provide a bit more info, I looked at the headers in my mailbox, and matched them with HTMLised messages from lists.gnu.org. It seems that only some forms of encoding are affected. I don't know which encoding the first example is using, but it displays fine in my mail client (mutt):

This (http://lists.gnu.org/archive/html/grub-devel/2007-06/msg00004.html):
From: =?UTF-8?B?VmVzYSBKw6TDpHNrZWzDpGluZW4=?= <chaac@...>
displays as:
From: Vesa JÃÃskelÃinen

this (http://lists.gnu.org/archive/html/grub-devel/2007-05/msg00155.html):
From: =?ISO-8859-1?Q?Vesa_J=E4=E4skel=E4inen?= <chaac@...>
displays as:
From: Vesa Jääskeläinen
"""

Thanks!
-jag

_______________________________________________________

Reply to this item at: <http://savannah.nongnu.org/bugs/?20252>

_______________________________________________
Message sent via/by Savannah http://savannah.nongnu.org/

---------------------------------------------------------------------
To sign-off this list, send email to majordomo@... with the message text UNSUBSCRIBE MHONARC-DEV
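For reference, the two From headers in the report should decode to the same name. The following is a minimal sketch using Python's standard `email.header` module, not MHonArc's own decoder; the helper name `decode_rfc2047` is made up for this illustration.

```python
from email.header import decode_header


def decode_rfc2047(value: str) -> str:
    """Decode RFC 2047 encoded-words in a header value to a Unicode string.

    Hypothetical helper for illustration; MHonArc itself is written in Perl
    and does not use this code.
    """
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Encoded-words come back as raw bytes plus the declared charset.
            parts.append(chunk.decode(charset or "ascii"))
        else:
            parts.append(chunk)
    return "".join(parts)


# The two header forms quoted in the bug report (address part omitted).
base64_form = "=?UTF-8?B?VmVzYSBKw6TDpHNrZWzDpGluZW4=?="
qp_form = "=?ISO-8859-1?Q?Vesa_J=E4=E4skel=E4inen?="

print(decode_rfc2047(base64_form))  # Vesa Jääskeläinen
print(decode_rfc2047(qp_form))      # Vesa Jääskeläinen
```

Both forms are valid RFC 2047 and carry the same text, so any difference in the archived HTML comes from later processing rather than from the headers themselves.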
|
|
[bug #20252] [gnu.org #336933] RFC2047 header encoding bug

Follow-up Comment #1, bug #20252 (project mhonarc):

Not a mhonarc bug. Almost certainly, mhonarc is converting the name to UTF-8, but Apache is sending the web page out with an ISO-8859-1 header. Here's an example of mhonarc doing just fine with a message from the same person.

http://www.mail-archive.com/grub-devel@.../msg00411.htm

Solution (going forward) is to have mhonarc produce UTF-8 for everything, and for the webserver to label it as such.
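To make the hypothesis in this comment concrete, here is a small Python sketch of what a page would look like if correct UTF-8 bytes were served with an ISO-8859-1 label. It only models the mislabelling scenario; it says nothing about what lists.gnu.org actually did.

```python
name = "Vesa Jääskeläinen"

# Bytes a correctly working archiver would write for a UTF-8 page.
utf8_bytes = name.encode("utf-8")

# What a client shows if it is told those bytes are ISO-8859-1:
# each two-byte UTF-8 sequence turns into two separate characters.
mislabelled = utf8_bytes.decode("iso-8859-1")

print(mislabelled)  # Vesa JÃ¤Ã¤skelÃ¤inen
```

Under this scenario each "ä" becomes the pair "Ã¤", which can be compared against the text shown in the original report.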
|
|
[bug #20252] [gnu.org #336933] RFC2047 header encoding bug

Follow-up Comment #2, bug #20252 (project mhonarc):

And here's the exact message. Note the combination of Chinese, English, and umlauts; Unicode is the only answer.

http://www.mail-archive.com/grub-devel@.../msg02923.html
|
|
[bug #20252] [gnu.org #336933] RFC2047 header encoding bug

Follow-up Comment #3, bug #20252 (project mhonarc):

I understand what you're trying to say, but I'm not sure you're correct.

First, Apache is returning the pages UTF-8 encoded:

    HEAD /archive/html/grub-devel/2007-06/msg00004.html HTTP/1.0
    Host: lists.gnu.org

    HTTP/1.1 200 OK
    Date: Tue, 26 Jun 2007 14:49:29 GMT
    Server: Apache/2.0.51 (Fedora)
    Last-Modified: Fri, 01 Jun 2007 21:19:30 GMT
    ETag: "e3105a-11a8-c5549c80"
    Accept-Ranges: bytes
    Content-Length: 4520
    Connection: close
    Content-Type: text/html; charset=UTF-8

Second, the encodings presented as entities between the two pages are different. In the first URL, msg00004.html, the special characters are written as Ã whereas in the second URL, they are written as ä. The correct character has a Unicode code point of U+00E4, an ISO-8859-1 encoding of 0xE4, and a UTF-8 encoding of 0xC3 0xA4.

Given that, what I'm imagining has happened is that in the first case, the UTF-8 characters are assumed to be iso-8859-1, an 8 bit character encoding, and are written as the first byte of the UTF-8 encoding; however in the second case, I'm supposing that it is properly transcoding from utf-8 to latin1.

But I'm not very fluent in the internals of MHonArc. Thoughts?

-jag
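The byte values and the "first byte only" guess in this comment can be checked with a short Python sketch. The function `first_byte_only` is a hypothetical model of the suspected truncation, written for this illustration; it is not taken from MHonArc.

```python
ch = "ä"

print(hex(ord(ch)))                  # 0xe4  (Unicode code point U+00E4)
print(ch.encode("iso-8859-1").hex()) # e4    (single byte in ISO-8859-1)
print(ch.encode("utf-8").hex())      # c3a4  (two bytes in UTF-8)


def first_byte_only(text: str) -> str:
    """Model of the suspected fault: keep only the first UTF-8 byte of each
    character and interpret that byte as ISO-8859-1.

    ASCII characters are one byte long in UTF-8, so they pass through
    unchanged; only multi-byte characters are damaged.
    """
    return "".join(
        bytes([c.encode("utf-8")[0]]).decode("iso-8859-1") for c in text
    )


print(first_byte_only("Vesa Jääskeläinen"))  # Vesa JÃÃskelÃinen
```

The result of this model is the same string the report shows for msg00004.html, which is consistent with the truncation guess, though it does not by itself identify where in the toolchain the bytes were dropped.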
|
|
[bug #20252] [gnu.org #336933] RFC2047 header encoding bug

Follow-up Comment #4, bug #20252 (project mhonarc):

Hi, I'm the person who reported this problem to gnu.org sysadmins.

> Given that, what I'm imagining has happened is that in the first
> case, the UTF-8 characters are assumed to be iso-8859-1, an 8 bit
> character encoding, and are written as the first byte of the
> UTF-8 encoding; however in the second case, I'm supposing that it
> is properly transcoding from utf-8 to latin1.

Sounds plausible. I suspect what makes it behave differently is that in the first case, subject is base64-encoded (in the second it isn't).

I'm afraid I can't really help. I have zero knowledge about MHonArc internals (and am not very fluent in Perl either). Sorry.
|
|
[bug #20252] [gnu.org #336933] RFC2047 header encoding bug

Follow-up Comment #5, bug #20252 (project mhonarc):

Erm, I meant to say From of course ;-)
|
|
Re: [bug #20252] [gnu.org #336933] RFC2047 header encoding bug

[ -savannah because I am lazy ]

Ok, well we do have proof that mhonarc is capable of doing the right thing on the exact same message. I use the TEXTENCODE resource to send everything to UTF-8, which is probably the recommended mhonarc way of doing things these days anyway.

http://www.mhonarc.org/MHonArc/doc/resources/textencode.html
http://www.mhonarc.org/MHonArc/doc/rcfileexs/utf-8-encode.mrc.html

So while one could dive in deep and try to figure out what is going on, another choice is to just try TEXTENCODE and see if it all magically works. And if so, tell the bug tracking system. If that doesn't do the trick, I don't know what to say other than "works for me".
|
|
[bug #20252] [gnu.org #336933] RFC2047 header encoding bug

Follow-up Comment #6, bug #20252 (project mhonarc):

For me to do an accurate analysis, I first need access to the original raw mail message. Then, it will help to know what resource settings are being used for the archive in question, since some resources affect how mhonarc processes character sets.

A quick and dirty test is to run mhonarc (with default settings) on just the message in question to see what happens. If the HTML created looks proper, then the problem is due to some resource setting. An example may be if resource settings assume a single charset for all messages.

If the HTML looks bad, then one possibility is how the email message is encoded. That is, if the message does not conform to email standards, things may not turn out right. Since character set processing may leverage different Perl modules depending on what the given Perl installation provides, it is possible some module may be introducing errors.

Of course, there may be a bug in MHonArc, but I cannot tell without testing. Since Jeff states that the message can be rendered properly, we at least know something does work properly :)

Note, even though Jeff states that using TEXTENCODE to convert everything to UTF-8 can be done, and is generally a good idea, it is dependent on the search engine that is being used for the archives. The gnu.org archives use Namazu, so UTF-8 encoding is not an option in this case.
|
|
[bug #20252] [gnu.org #336933] RFC2047 header encoding bug

Follow-up Comment #7, bug #20252 (project mhonarc):

I'm attaching a raw copy of the message that generated the bad HTML.

I don't have a mhonarc install to test it. Is it possible to install and process a single message right away without setting up MTA integration, etc?

(file #13182)

_______________________________________________________

Additional Item Attachment:

File name: bad_message
Size: 3 KB
|
|
Re: [bug #20252] [gnu.org #336933] RFC2047 header encoding bug

> I don't have a mhonarc install to test it. Is it possible to install and
> process a single message right-away without setting up MTA integration, etc?

Yes.

As a side note #1, I have the names of 564 gnu.org and nongnu.org mailing lists that have been hand checked and determined to be completely overrun by spam. Is there anyone at the FSF I should give these to?

Side note #2 is that mail-archive.com has kept secondary archives for FSF lists with permission for some time now. If it was helpful, we'd be happy to add FSF branding to those archives and swap primary/secondary roles with the FSF maintained archives.

Cheers,
Jeff
|
|
[bug #20252] [gnu.org #336933] RFC2047 header encoding bug

Follow-up Comment #8, bug #20252 (project mhonarc):

Just download mhonarc from www.mhonarc.org, install, and run. Mhonarc is independent of any MTA.

To convert a single message, you can do:

    mhonarc -single message.822 > message.html

I ran the above on the sample attached message, and the output looks correct. The name in question got translated to:

    Vesa Jääskeläinen

I'd need to see the resource settings used by the gnu.org archives to see what may be wrong in the configuration. My guess is they could be "flattening out" character handling for lists, maybe for performance and/or security reasons.
|
|
[bug #20252] [gnu.org #336933] RFC2047 header encoding bug

Follow-up Comment #9, bug #20252 (project mhonarc):

Tried that, and got the same result (with 0xe4). However, this is latin1, and can't possibly work. As Jeff said: "Note the combination of Chinese, English, and umlauts; unicode is the only answer."

The HTML generated at lists.gnu.org seems to be UTF-8 but truncated at one byte. Is it possible that mhonarc handled this conversion, or do we have another conversion tool in place here?
|
|
[bug #20252] [gnu.org #336933] RFC2047 header encoding bug

Update of bug #20252 (project mhonarc):

Status: None => Works For Me

_______________________________________________________

Follow-up Comment #10:

> The HTML generated at lists.gnu.org seems to be utf-8 but
> truncated at one byte. Is it possible that mhonarc handled
> conversion, or do we have another conversion tool in place
> here?

Only the gnu.org folks can answer that. At this time, I cannot confirm that this problem is due to mhonarc. You may want to submit a bug to the gnu folks directly on this.

I cannot provide much more help without knowing what resource configuration they are using and whether there is custom processing that may introduce problems.
|
|
[bug #20252] [gnu.org #336933] RFC2047 header encoding bug

Follow-up Comment #11, bug #20252 (project mhonarc):

Isn't it the same as bug #11187? Sounds similar at least. (That one was fixed in mhonarc 2.6.11 while gnu.org uses 2.6.10.)
|
|
[bug #20252] [gnu.org #336933] RFC2047 header encoding bug

Follow-up Comment #12, bug #20252 (project mhonarc):

We are still running MHonArc 2.6.10 on the GNU lists server. I've just tested the conversion of the message with

    ./mhonarc -single input > output.html

which generates the faulty encoding with 2.6.10, but does the right thing with 2.6.16.

Our lists server is due for a major software upgrade sometime this fall, at which point we will regenerate the HTML archives, which will resolve this problem.

Thanks,
Ward.
|
|
[bug #20252] [gnu.org #336933] RFC2047 header encoding bug

Follow-up Comment #13, bug #20252 (project mhonarc):

Cool, nice to see this will be fixed.

Best regards
|
|
[bug #20252] [gnu.org #336933] RFC2047 header encoding bug

Follow-up Comment #14, bug #20252 (project mhonarc):

My guess is that the following bug (fixed in v2.6.11) is the source of the problem:

https://savannah.nongnu.org/bugs/?11187

Since it appears the problem is fixed in a later version of mhonarc, I'm going to close this item. If the problem persists after GNU's software update, we can reopen the issue.
|
|
[bug #20252] [gnu.org #336933] RFC2047 header encoding bug

Update of bug #20252 (project mhonarc):

Open/Closed: Open => Closed
Fixed Release: => N/A