BOM inside tokens

The currently proposed rule for byte-order-mark (BOM) characters in ES4 sources is to replace them by whitespace outside of tokens. But what exactly are the tokens in a case like -<bom>-?

AFAICS it would be treated as - -, turning cases like:

-<bom>-a;

into

- -a;

versus

--a;

which is what it would be with current ES3 implementations.

Regards, Igor

_______________________________________________
Es4-discuss mailing list
Es4-discuss@...
https://mail.mozilla.org/listinfo/es4-discuss
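To make the two outcomes concrete, here is a minimal sketch of a toy scanner, written for this illustration only. The function `tokenize` and its `bomPolicy` parameter are hypothetical names, and the scanner covers just identifiers, `-`, `--` and `;`. It is not how any real ES3 or ES4 implementation is written.

```javascript
// Toy scanner for a tiny subset: identifiers plus the punctuators "--", "-" and ";".
// It only illustrates the two BOM policies; it is not a real ES3/ES4 lexer.
const BOM = "\uFEFF";

function tokenize(source, bomPolicy) {
  // "strip": delete every BOM before scanning (the ES3 behaviour described above).
  // "whitespace": keep the BOM and let it separate tokens like a space.
  const text = bomPolicy === "strip" ? source.split(BOM).join("") : source;
  const tokens = [];
  let i = 0;
  while (i < text.length) {
    const ch = text[i];
    if (ch === " " || ch === "\t" || ch === "\n" || ch === BOM) {
      i += 1; // whitespace ends the current token and is discarded
    } else if (text.startsWith("--", i)) {
      tokens.push("--"); // longest match: two adjacent minus signs form one token
      i += 2;
    } else if (ch === "-" || ch === ";") {
      tokens.push(ch);
      i += 1;
    } else if (/[A-Za-z_$]/.test(ch)) {
      let j = i + 1;
      while (j < text.length && /[A-Za-z0-9_$]/.test(text[j])) j += 1;
      tokens.push(text.slice(i, j));
      i = j;
    } else {
      throw new SyntaxError("unexpected character at offset " + i);
    }
  }
  return tokens;
}

const source = "-" + BOM + "-a;";
const stripped = tokenize(source, "strip");           // ["--", "a", ";"]
const asWhitespace = tokenize(source, "whitespace");  // ["-", "-", "a", ";"]
```

With stripping, the two minus signs fuse into a single decrement token; with the whitespace rule they stay two unary minus tokens, which is the change in meaning the message points out.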

Re: BOM inside tokens

On 15 Jul 2008, at 18:22, Igor Bukanov wrote:

> The currently proposed rule for byte-order-mark (BOM) characters in
> ES4 sources is to replace them by whitespace outside of tokens. But
> what exactly are the tokens in a case like -<bom>-?
>
> AFAICS it would be treated as - -, turning cases like:
> -<bom>-a;
> into
> - -a;
> versus
> --a;
> which is what it would be with current ES3 implementations.
>
> Regards, Igor

Hmmm. According to the UnicodeCheck app on my Mac (and thus to one version or other of the Unicode spec), a BOM (U+FEFF) is 'ZERO WIDTH NO-BREAK SPACE':

• NamesList:
  = BYTE ORDER MARK (BOM), ZWNBSP
• may be used to detect byte order by contrast with the noncharacter code point FFFE
• use as an indication of non-breaking is deprecated; see 2060 instead
  → (zero width space - 200B)
  → (word joiner - 2060)
  → (<not a character> - FFFE)
• Designated in Unicode 1.1

I'd say that a BOM should be treated just like any ordinary whitespace char - namely that it should be invalid in spaces, and beyond that why is any conversion needed, since it's a valid Unicode character...

-ash

Re: BOM inside tokens

On 15 Jul 2008, at 18:39, Ash Berlin wrote:

> On 15 Jul 2008, at 18:22, Igor Bukanov wrote:
>
>> The currently proposed rule for byte-order-mark (BOM) characters in
>> ES4 sources is to replace them by whitespace outside of tokens. But
>> what exactly are the tokens in a case like -<bom>-?
>>
>> AFAICS it would be treated as - -, turning cases like:
>> -<bom>-a;
>> into
>> - -a;
>> versus
>> --a;
>> which is what it would be with current ES3 implementations.
>>
>> Regards, Igor
>
> Hmmm. According to the UnicodeCheck app on my Mac (and thus to one version
> or other of the Unicode spec), a BOM (U+FEFF) is 'ZERO WIDTH NO-BREAK
> SPACE':
>
> • NamesList:
>   = BYTE ORDER MARK (BOM), ZWNBSP
> • may be used to detect byte order by contrast with the
>   noncharacter code point FFFE
> • use as an indication of non-breaking is deprecated; see 2060
>   instead
>   → (zero width space - 200B)
>   → (word joiner - 2060)
>   → (<not a character> - FFFE)
> • Designated in Unicode 1.1
>
> I'd say that a BOM should be treated just like any ordinary whitespace
> char - namely that it should be invalid in spaces, and beyond that why is
> any conversion needed, since it's a valid Unicode character...

Invalid in *identifiers*

Re: BOM inside tokens

2008/7/15 Ash Berlin <ash_es4@...>:

> I'd say that a BOM should be treated just like any ordinary whitespace
> char - namely that it should be invalid in spaces, and beyond that why is
> any conversion needed, since it's a valid Unicode character...

The problem comes from the current ES3 implementations, which strip BOMs from the sources, and from web pages that place BOMs in arbitrary places in JS sources. So the question is: should ES4 be at least partially compatible with the current code?

igor

Re: BOM inside tokens

On Tue, Jul 15, 2008 at 11:00 AM, Igor Bukanov <igor@...> wrote:
2008/7/15 Ash Berlin <ash_es4@...>:

As we've found with the ES3-specified stripping of Cf characters, the main effect of such transparent stripping of characters is to help attackers slip XSS attacks past defensive filters. ES3.1 agrees with ES4 that BOMs and Cfs should be treated as whitespace rather than stripped.

Text by me above is hereby placed in the public domain

Cheers,
--MarkM
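A rough sketch of the kind of filter bypass described here, under stated assumptions: `naiveFilterAllows` stands in for a hypothetical defensive filter that blocks a word, and `stripFormatControls` models an engine that deletes Cf characters before scanning. Both names, the blocked word and the short list of Cf code points are made up for the illustration.

```javascript
const LRM = "\u200E"; // LEFT-TO-RIGHT MARK, general category Cf

// Hypothetical defensive filter: rejects text that contains a blocked word.
function naiveFilterAllows(text) {
  return !text.includes("eval");
}

// Models an engine that removes Cf characters before tokenizing.
// Only a few Cf code points are listed; the real category is larger.
function stripFormatControls(text) {
  return text.replace(/[\u200C-\u200F\u2060\uFEFF]/g, "");
}

// Models the alternative rule: the same characters become a plain space.
function formatControlsToSpace(text) {
  return text.replace(/[\u200C-\u200F\u2060\uFEFF]/g, " ");
}

const payload = "ev" + LRM + "al(x)";
const passesFilter = naiveFilterAllows(payload);           // true: "eval" is not a substring
const seenAfterStripping = stripFormatControls(payload);   // "eval(x)"
const seenAsWhitespace = formatControlsToSpace(payload);   // "ev al(x)"
```

The filter and the stripping engine disagree about what the text says, which is the gap an attacker can use. Under the whitespace rule the engine sees two separate identifiers, so the blocked word is never reassembled.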

Re: BOM inside tokens

2008/7/15 Mark Miller <erights@...>:

> As we've found with the ES3-specified stripping of Cf characters, the main
> effect of such transparent stripping of characters is to help attackers slip
> XSS attacks past defensive filters. ES3.1 agrees with ES4 that BOMs and Cfs
> should be treated as whitespace rather than stripped.

But this means that it will silently change the semantics of +<bom-or-cf>+ from ++ into + +. From the security point of view it would be better to treat such cases as syntax errors. A possible rule could be to allow BOM/Cf only in string/regexp literals, or if such a character follows or precedes a non-zero-width white space character.
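The proposed rule can be sketched as a small checker. This is only one reading of the rule, with several simplifying assumptions: it recognizes single- and double-quoted string literals but not regexp literals or comments, it lists only a sample of Cf code points, and it counts line terminators as non-zero-width white space. The name `findRejectedFormatChars` is hypothetical.

```javascript
// Sample of Cf code points (ZWNJ, ZWJ, LRM, RLM, WORD JOINER, BOM); not the full category.
const FORMAT_CHARS = /[\u200C-\u200F\u2060\uFEFF]/;
// White space with non-zero width. Including line terminators here is an assumption.
const VISIBLE_SPACE = /[ \t\n\r\v\f\u00A0]/;

// Returns the offsets of BOM/Cf characters that the rule would reject,
// i.e. those outside string literals with no visible white space on either side.
function findRejectedFormatChars(source) {
  const rejected = [];
  let quote = null; // the opening quote character while inside a string literal
  for (let i = 0; i < source.length; i += 1) {
    const ch = source[i];
    if (quote !== null) {
      if (ch === "\\") i += 1;            // skip the escaped character
      else if (ch === quote) quote = null; // closing quote ends the literal
      continue;
    }
    if (ch === '"' || ch === "'") {
      quote = ch;
    } else if (FORMAT_CHARS.test(ch)) {
      const prev = i > 0 ? source[i - 1] : "";
      const next = i + 1 < source.length ? source[i + 1] : "";
      if (!VISIBLE_SPACE.test(prev) && !VISIBLE_SPACE.test(next)) rejected.push(i);
    }
  }
  return rejected;
}

const betweenOperators = findRejectedFormatChars("+\uFEFF+a");      // [1]
const nextToSpace = findRejectedFormatChars("+ \uFEFF+a");          // []
const insideString = findRejectedFormatChars('x = "a\uFEFFb";');    // []
```

A parser following this reading would raise a syntax error for any non-empty result instead of guessing between `++` and `+ +`.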

Re: BOM inside tokens

On Tue, Jul 15, 2008 at 11:27 AM, Igor Bukanov <igor@...> wrote:
2008/7/15 Mark Miller <erights@...>:

Speaking only for myself, yes, I'd be even happier with the syntax error. I have proposed such harsh treatment before but various objections were raised. In any case, again speaking only for myself, I'm happy with any solution that repairs the security holes created by stripping and avoids introducing new holes.

--
Cheers,
--MarkM

Re: BOM inside tokens

It seems the current IE7/IE8 behavior is to allow Cf only in string and regexp literals and to allow BOM only in string/regexp literals or at the beginning of the source; see https://bugzilla.mozilla.org/show_bug.cgi?id=430740#c32 . This is very reasonable.

Igor

Re: BOM inside tokens

Igor Bukanov wrote:

> It seems the current IE7/IE8 behavior is to allow Cf only in string
> and regexp literals and to allow BOM only in string/regexp literals or
> at the beginning of the source,

Precisely what does "in string and regexp literals" mean? The exact interpretation of this phrase is the core source of the aforementioned security holes. Folks have exploited putting special characters right after a backslash to break out of whitelisted literals and execute arbitrary code from JSON; a few months ago I demonstrated such an attack. Regular expressions offer even more opportunities for this kind of mischief.

Waldemar
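One way such a mismatch could arise is sketched below. It is an illustration constructed for this thread's argument, not a reconstruction of the attack mentioned above. It assumes a validator that does not strip Cf characters and an engine that does; `endOfStringLiteral` and the `attack()` call are hypothetical.

```javascript
const CF_CHAR = "\u200E"; // LEFT-TO-RIGHT MARK, a Cf character

// Returns the offset just past the closing quote of the double-quoted literal
// that starts at offset 0, treating a backslash as escaping the next character.
function endOfStringLiteral(text) {
  if (text[0] !== '"') throw new SyntaxError("expected an opening quote");
  for (let i = 1; i < text.length; i += 1) {
    if (text[i] === "\\") i += 1;              // skip whatever follows the backslash
    else if (text[i] === '"') return i + 1;    // unescaped quote closes the literal
  }
  throw new SyntaxError("unterminated string literal");
}

// Submitted text:  "a\<Cf>\" ; attack() //"
const submitted = '"a\\' + CF_CHAR + '\\" ; attack() //"';

// Validator view: the backslash escapes the Cf character, the next backslash
// escapes the quote, so the whole text is a single string literal.
const validatorEnd = endOfStringLiteral(submitted);

// Engine view after stripping Cf: the two backslashes now pair up, the quote
// closes the literal early, and the rest is scanned as code.
const strippedText = submitted.split(CF_CHAR).join("");
const engineEnd = endOfStringLiteral(strippedText);
const codeAfterLiteral = strippedText.slice(engineEnd); // ' ; attack() //"'
```

The validator accepts the text as one harmless literal, while the stripping engine ends the literal after five characters and treats the remainder as a statement followed by a comment.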

Re: BOM inside tokens

Latest news in the bug:

https://bugzilla.mozilla.org/show_bug.cgi?id=430740#c42

Igor wrote:

"So MSIE simply treats BOM as a whitespace for the purpose of parsing. Which suggests to do this in SM to fix the bug: treat BOM as one of Unicode whitespace characters in the scanner avoiding any character skipping or patching."

So no security issues with stripping. Another triumph of de-facto standard over de-jure. Pratap got this into ES3.1 drafts already.

/be
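The whitespace treatment can be probed on whatever engine runs the following sketch. It assumes an engine that follows ECMAScript 5 or later, where U+FEFF is listed among the white space characters; `negateTwice` and `decrement` are names made up for the example.

```javascript
// Function body contains a raw BOM between the two minus signs.
const negateTwice = new Function("a", "return -\uFEFF-a;");
// Same body without the BOM: a prefix decrement.
const decrement = new Function("a", "return --a;");

// Under the whitespace rule the BOM separates two unary minus tokens: -(-5) is 5.
const resultWithBom = negateTwice(5);
// Without the BOM the two minus signs form one "--" token: 5 - 1 is 4.
const resultWithoutBom = decrement(5);
// The \s class covers the white space characters, which in ES5 and later include U+FEFF.
const bomMatchesWhitespaceClass = /\s/.test("\uFEFF");
```

An engine that stripped the BOM instead would return 4 from both functions, so comparing the two results shows which rule is in effect.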