[issue3574] compile() cannot decode Latin-1 source encodings
New submission from Brett Cannon <brett@...>:

The following leads to a SyntaxError in 3.0:

    compile(b'# coding: latin-1\nu = "\xC7"\n', '<dummy>', 'exec')

That is not the case in Python 2.6.

----------
messages: 71251
nosy: brett.cannon
severity: normal
status: open
title: compile() cannot decode Latin-1 source encodings

_______________________________________
Python tracker <report@...>
<http://bugs.python.org/issue3574>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/lists%40nabble.com
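For reference, the reported call can be wrapped in a small script that also checks the decoded result. This is only a sketch: the `namespace` dict and the `exec()` step are additions for illustration and are not part of the report. On an interpreter affected by the bug, the `compile()` call is where the SyntaxError would be raised; on an interpreter with the fix, the literal should come back as the single character U+00C7.

```python
# The byte string from the report: a Latin-1 coding declaration followed by
# a string literal that contains the single byte 0xC7.
source = b'# coding: latin-1\nu = "\xC7"\n'

# On an affected interpreter this call raises SyntaxError.
code = compile(source, '<dummy>', 'exec')

# Illustration only (not in the report): run the code object and look at
# what the string literal was decoded to.
namespace = {}
exec(code, namespace)
print(ascii(namespace['u']))
```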
|
|
[issue3574] compile() cannot decode Latin-1 source encodings
Brett Cannon <brett@...> added the comment:

Looks like Parser/tokenizer.c:check_coding_spec() considered Latin-1 a raw encoding just like UTF-8. Patch is in the works.
|
|
[issue3574] compile() cannot decode Latin-1 source encodings
Brett Cannon <brett@...> added the comment:

Here is a potential fix. It broke test_imp because the test assumed that Latin-1 source files would be encoded as Latin-1 instead of UTF-8 when returned by imp.new_module(). Doesn't seem like a critical change as the file is still properly decoded.

----------
keywords: +patch
Added file: http://bugs.python.org/file11130/fix_latin.diff
|
|
[issue3574] compile() cannot decode Latin-1 source encodings
Brett Cannon <brett@...> added the comment:

Attached is a test for test_pep3120 (since that is what most likely introduced the breakage). It's a separate patch since the source file is marked as binary and thus can't be diffed by ``svn diff``.

----------
components: +Interpreter Core
priority: -> critical
versions: +Python 3.0
Added file: http://bugs.python.org/file11131/pep3120_test.diff
|
|
[issue3574] compile() cannot decode Latin-1 source encodings
Changes by Brett Cannon <brett@...>:

----------
type: -> behavior
|
|
[issue3574] compile() cannot decode Latin-1 source encodings
Brett Cannon <brett@...> added the comment:

Can someone double-check this patch for me? I don't have much experience with the parser so I want to make sure I am not doing anything wrong.
|
|
[issue3574] compile() cannot decode Latin-1 source encodings
Brett Cannon <brett@...> added the comment:

There is a potential dependency on issue3594 as it would change how imp.find_module() acts and thus make test_imp no longer fail in the way it has.

----------
dependencies: +PyTokenizer_FindEncoding() never succeeds
|
|
[issue3574] compile() cannot decode Latin-1 source encodings
Benjamin Peterson <musiccomposition@...> added the comment:

That line dates back to the PEP 263 implementation. Martin?

----------
nosy: +benjamin.peterson, loewis
|
|
[issue3574] compile() cannot decode Latin-1 source encodings
Changes by Brett Cannon <brett@...>:

----------
priority: critical -> release blocker
|
|
[issue3574] compile() cannot decode Latin-1 source encodings
Changes by Brett Cannon <brett@...>:

----------
keywords: +needs review
|
|
[issue3574] compile() cannot decode Latin-1 source encodings
Martin v. Löwis <martin@...> added the comment:

Since this is marked "release blocker", I'll provide a shallow comment:

I don't think it should be a release blocker. It's a bug in the compile function, and there are various work-arounds (such as saving the bytes to a temporary file and executing that one, or decoding the byte string to a Unicode string, and then compiling the Unicode string). It is sufficient to fix it in 3.0.1.

I don't think the patch is right: as the test had to be changed, it means that somewhere, the detection of the encoding declaration now fails. This is clearly a new bug, but I don't have the time to analyse the cause further.

In principle, there is nothing wrong with the tokenizer treating latin-1 as "raw" - that only means we don't go through a codec.
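The two work-arounds mentioned above could look roughly like the following. This is a sketch under assumptions: the comment does not say how the temporary file should be executed, so `runpy.run_path()` is used here as one possible way, and the file name `dummy.py` and the variable names are made up for illustration.

```python
import os
import runpy
import tempfile

source = b'# coding: latin-1\nu = "\xC7"\n'

# Work-around 1: decode the byte string yourself, then compile the
# resulting Unicode string, so compile() never has to decode anything.
text = source.decode('latin-1')
namespace = {}
exec(compile(text, '<dummy>', 'exec'), namespace)

# Work-around 2: save the bytes to a temporary file and execute that file.
# runpy.run_path() is an assumption; any way of running the file would do.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'dummy.py')  # hypothetical file name
    with open(path, 'wb') as f:
        f.write(source)
    file_globals = runpy.run_path(path)

print(ascii(namespace['u']), ascii(file_globals['u']))
```

The first variant avoids the byte-level tokenizer path entirely, which is why it sidesteps the bug; the second relies on the file-based path decoding the declared encoding correctly.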
|
|
[issue3574] compile() cannot decode Latin-1 source encodings
Brett Cannon <brett@...> added the comment:

Actually, the tests don't have to change; if issue 3594 gets applied then that change cascades into this issue and negates the need to change the tests themselves.

As for treating Latin-1 as a raw encoding, how can that be theoretically okay if the parser assumes UTF-8 and UTF-8 is not a superset of Latin-1?
|
|
[issue3574] compile() cannot decode Latin-1 source encodings
Martin v. Löwis <martin@...> added the comment:

> As for treating Latin-1 as a raw encoding, how can that be theoretically
> okay if the parser assumes UTF-8 and UTF-8 is not a superset of Latin-1?

The parser doesn't assume UTF-8, but "ascii+", i.e. it passes all non-ASCII bytes on to the AST, which then needs to deal with them; it then could (but apparently doesn't) take into account whether the internal representation was UTF-8 or Latin-1: see ast.c:decode_unicode for some remains of that.

The other case (besides string literals) where bytes > 127 matter is tokenizer.c:verify_identifier; this indeed assumes UTF-8 only (but could be easily extended to support Latin-1 as well).

The third case where non-ASCII bytes are allowed is comments; there they are entirely ignored (i.e. it is not even verified that the comment is well-formed UTF-8).

Removal of the special case should simplify the code; I would agree that any speedup gained by not going through a codec is irrelevant. I'm still puzzled why test_imp fails if the special case is removed.
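To make the byte-level point concrete, the short sketch below compares how the character from the original report is represented in the two encodings. It only uses the standard codecs and says nothing about the tokenizer internals; the variable names are illustrative.

```python
# The character from the report, U+00C7 (LATIN CAPITAL LETTER C WITH CEDILLA).
ch = '\u00c7'

latin1_bytes = ch.encode('latin-1')  # one byte above 127
utf8_bytes = ch.encode('utf-8')      # two bytes, both above 127
print(latin1_bytes, utf8_bytes)

# The second source line of the report, as Latin-1 bytes. If those bytes are
# passed along "raw" and later interpreted as UTF-8, decoding fails.
line = b'u = "\xC7"\n'
try:
    line.decode('utf-8')
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

print(decodes_as_utf8, ascii(line.decode('latin-1')))
```

Both encodings agree on bytes below 128, which is why the "ascii+" approach works for the ASCII part of a source file and only the bytes above 127 need special care.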
|
|
[issue3574] compile() cannot decode Latin-1 source encodings
Brett Cannon <brett@...> added the comment:

The test_imp stuff has to do with PyTokenizer_FindEncoding(). imp.find_module() only opens the file, passes the file descriptor to PyTokenizer_FindEncoding() and then returns a file object with the found encoding.

Problem is that (as issue 3594 points out), PyTokenizer_FindEncoding() always fails. That means it assumes only the raw encodings are okay. With Latin-1 being one of them, it returns the file opened as Latin-1, which is correct. Removing that case here means PyTokenizer_FindEncoding() fails, and thus assumes only UTF-8 as a legitimate encoding and opens the files with the UTF-8 encoding.

It took a while to find these two bugs obviously. =)
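The behaviour expected from encoding detection can be illustrated at the Python level. Note the assumption: the discussion above is about the C function PyTokenizer_FindEncoding(), while this sketch uses the standard library's `tokenize.detect_encoding()` as a stand-in; `find_encoding` is a hypothetical helper name introduced only for this example.

```python
import io
import tokenize


def find_encoding(source_bytes):
    """Return the encoding detected for a bytes source (hypothetical helper)."""
    readline = io.BytesIO(source_bytes).readline
    encoding, _lines_read = tokenize.detect_encoding(readline)
    return encoding


# A source with a Latin-1 declaration, and one with no declaration at all.
declared = find_encoding(b'# coding: latin-1\nu = "\xC7"\n')
default = find_encoding(b'u = 1\n')
print(declared, default)
```

`tokenize.detect_encoding()` reports the normalized codec name, so the declared case comes back as `iso-8859-1` rather than `latin-1`; with no declaration it falls back to `utf-8`, which matches the fallback described above.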
|
|
[issue3574] compile() cannot decode Latin-1 source encodings
Brett Cannon <brett@...> added the comment:

I have attached a new version of the patch with the changes to test_imp removed as issue 3594 fixed the need for the change. I have also directly uploaded test_pep3120.py since it is flagged as binary and thus cannot be diffed by svn.

Added file: http://bugs.python.org/file11398/fix_latin.diff
|
|
[issue3574] compile() cannot decode Latin-1 source encodings
Changes by Brett Cannon <brett@...>:

Removed file: http://bugs.python.org/file11130/fix_latin.diff
|
|
[issue3574] compile() cannot decode Latin-1 source encodings
Changes by Brett Cannon <brett@...>:

Added file: http://bugs.python.org/file11399/test_pep3120.py
|
|
[issue3574] compile() cannot decode Latin-1 source encodings
Changes by Brett Cannon <brett@...>:

Removed file: http://bugs.python.org/file11131/pep3120_test.diff
|
|
[issue3574] compile() cannot decode Latin-1 source encodings
Changes by Barry A. Warsaw <barry@...>:

----------
priority: release blocker -> deferred blocker
|
|
[issue3574] compile() cannot decode Latin-1 source encodings
Changes by Barry A. Warsaw <barry@...>:

----------
priority: deferred blocker -> release blocker