BOM inside tokens

by Igor Bukanov-2

The currently proposed rule for byte-order-mark (BOM) characters in
ES4 sources is to replace them by whitespace outside of tokens. But
what exactly are the tokens in a case like -<bom>-?

AFAICS it would be treated as - - turning cases like:
  -<bom>-a;
into
  - -a;
rather than
  --a;
as current ES3 implementations would parse it.

Regards, Igor
_______________________________________________
Es4-discuss mailing list
Es4-discuss@...
https://mail.mozilla.org/listinfo/es4-discuss
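
The two readings contrasted in the message above can be reproduced with a short script. This is an illustrative sketch, not code from the thread: the helper names `stripBom`, `bomToSpace` and `run` are made up here, and `new Function` is used only so that each variant is evaluated with a fresh `a`.

```javascript
// Illustrative sketch; helper names are hypothetical, not from the thread.
const BOM = "\uFEFF";
const source = "-" + BOM + "-a";

// ES3-style handling as described above: drop the BOM before tokenizing.
function stripBom(src) {
  return src.split(BOM).join("");
}

// Proposed ES4 handling: the BOM becomes ordinary whitespace.
function bomToSpace(src) {
  return src.split(BOM).join(" ");
}

// Evaluate an expression with a fresh binding a = 5 and report both
// the value of the expression and the final value of a.
function run(expr) {
  const body = "var value = (" + expr + "); return { value: value, a: a };";
  return new Function("a", body)(5);
}

const stripped = run(stripBom(source));  // evaluates "--a"
const spaced = run(bomToSpace(source));  // evaluates "- -a"
console.log(stripBom(source), stripped);
console.log(bomToSpace(source), spaced);
```

With stripping, the expression is a pre-decrement that changes `a`; with whitespace replacement, it is a double negation that leaves `a` untouched.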

Re: BOM inside tokens

by Ash Berlin-3


On 15 Jul 2008, at 18:22, Igor Bukanov wrote:

> The currently proposed rule for byte-order-mark (BOM) characters in
> ES4 sources is to replace them by whitespace outside of tokens. But
> what exactly are the tokens in a case like -<bom>-?
>
> AFAICS it would be treated as - - turning cases like:
>  -<bom>-a;
> into
>  - -a;
> rather than
>  --a;
> as current ES3 implementations would parse it.
>
> Regards, Igor

Hmmm. According to the UnicodeCheck app on my Mac (and thus to one version
or other of the Unicode spec), a BOM (U+FEFF) is 'ZERO WIDTH NO-BREAK
SPACE'.

• NamesList:
    = BYTE ORDER MARK (BOM), ZWNBSP
    • may be used to detect byte order by contrast with the noncharacter code point FFFE
    • use as an indication of non-breaking is deprecated; see 2060 instead
    → (zero width space - 200B)
    → (word joiner - 2060)
    → (<not a character> - FFFE)
• Designated in Unicode 1.1

I'd say that a BOM should be treated just like any ordinary whitespace
char - namely that it should be invalid in spaces - and beyond that, why is
any conversion needed, since it's a valid Unicode character...

-ash
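
For reference, the properties listed above can be checked from JavaScript itself. This is a small sketch, assuming an engine with Unicode property escapes (ES2018 or later); the `describe` helper is hypothetical and not part of the thread.

```javascript
// Sketch: inspect U+FEFF and the related code points from the NamesList entry.
// Assumes support for \p{...} property escapes (ES2018+).
function describe(ch) {
  const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
  return {
    codePoint: "U+" + hex,
    isFormatControl: /^\p{Cf}$/u.test(ch),    // Unicode general category Cf
    matchesWhitespaceClass: /^\s$/.test(ch),  // what the engine's \s accepts
  };
}

const bom = describe("\uFEFF");
const zeroWidthSpace = describe("\u200B");
const wordJoiner = describe("\u2060");
console.log(bom, zeroWidthSpace, wordJoiner);
```

U+FEFF is in general category Cf, and engines that follow ES5 or later also accept it as `\s`; U+200B and U+2060 are Cf but are not matched by `\s`.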

Re: BOM inside tokens

by Ash Berlin-3


On 15 Jul 2008, at 18:39, Ash Berlin wrote:

> I'd say that a BOM should be treated just like any ordinary whitespace
> char - namely that it should be invalid in spaces - and beyond that, why is
> any conversion needed, since it's a valid Unicode character...
>

Invalid in *identifiers*


Re: BOM inside tokens

by Igor Bukanov-2

2008/7/15 Ash Berlin <ash_es4@...>:
>
> I'd say that a BOM should be treated just like any ordinary whitespace
> char - namely that it should be invalid in spaces - and beyond that, why is
> any conversion needed, since it's a valid Unicode character...

The problem comes from current ES3 implementations that strip BOMs
from sources, and from web pages that place BOMs at arbitrary positions
in JS sources. So the question is: should ES4 be at least partially
compatible with existing code?

igor

Re: BOM inside tokens

by Mark Miller-2

On Tue, Jul 15, 2008 at 11:00 AM, Igor Bukanov <igor@...> wrote:
> 2008/7/15 Ash Berlin <ash_es4@...>:
>>
>> I'd say that a BOM should be treated just like any ordinary whitespace
>> char - namely that it should be invalid in spaces - and beyond that, why is
>> any conversion needed, since it's a valid Unicode character...
>
> The problem comes from current ES3 implementations that strip BOMs
> from sources, and from web pages that place BOMs at arbitrary positions
> in JS sources. So the question is: should ES4 be at least partially
> compatible with existing code?

As we've found with the ES3-specified stripping of Cf characters, the main effect of such transparent stripping of characters is to help attackers slip XSS attacks past defensive filters. ES3.1 agrees with ES4 that BOMs and Cfs should be treated as whitespace rather than stripped.
 
--
Text by me above is hereby placed in the public domain

   Cheers,
   --MarkM
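
To make the filter-evasion concern concrete, here is a deliberately simplified sketch. The blocklist filter and the helpers `stripCf` and `cfToSpace` are hypothetical stand-ins, not any real filter or engine; `stripCf` only mimics the ES3 rule of removing Cf characters before tokenizing, and the code assumes `\p{Cf}` support (ES2018 or later).

```javascript
const ZWJ = "\u200D"; // ZERO WIDTH JOINER, general category Cf

// Hypothetical defensive filter: rejects source text that mentions eval.
function naiveFilterAccepts(src) {
  return !src.includes("eval");
}

// Mimics ES3-style handling: Cf characters vanish before tokenization.
function stripCf(src) {
  return src.replace(/\p{Cf}/gu, "");
}

// Alternative discussed in the thread: Cf characters become whitespace.
function cfToSpace(src) {
  return src.replace(/\p{Cf}/gu, " ");
}

const payload = "ev" + ZWJ + "al(userInput)";
console.log(naiveFilterAccepts(payload)); // the filter's verdict on the raw text
console.log(stripCf(payload));            // what a stripping engine would tokenize
console.log(cfToSpace(payload));          // what a whitespace-treating engine would tokenize
```

Under stripping, text that passed the filter turns into a call to `eval`. Under whitespace treatment it turns into two adjacent identifiers, which fails to parse instead of hiding a call.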

Re: BOM inside tokens

by Igor Bukanov-2

2008/7/15 Mark Miller <erights@...>:
> As we've found with the ES3-specified stripping of Cf characters, the main
> effect of such transparent stripping of characters is to help attackers slip
> XSS attacks past defensive filters. ES3.1 agrees with ES4 that BOMs and Cfs
> should be treated as whitespace rather than stripped.

But this means that it will silently change the semantics of
+<bom-or-cf>+ from ++ into + +. From a security point of view it
would be better to treat such cases as syntax errors. A possible rule
could be to allow BOM/Cf only in string/regexp literals, or where such a
character follows or precedes a non-zero-width whitespace character.
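
The rule proposed above can be sketched as a small checker. This is only an approximation under stated assumptions: `findRejectedCf` is a hypothetical name, only quoted string literals are recognized (regexp literals are ignored), only BMP characters are considered, and `\p{Cf}` support (ES2018 or later) is assumed.

```javascript
// Sketch of the proposed rule; not a full ECMAScript scanner.
const CF = /^\p{Cf}$/u;                        // includes U+FEFF
const VISIBLE_SPACE = /^[\t\n\v\f\r \u00A0]$/; // non-zero-width whitespace

// Returns the indices of Cf characters that the rule would reject.
function findRejectedCf(src) {
  const rejected = [];
  let quote = null; // the open string delimiter, or null outside strings
  for (let i = 0; i < src.length; i++) {
    const ch = src[i];
    if (quote !== null) {
      if (ch === "\\") i++;                 // skip the escaped character
      else if (ch === quote) quote = null;  // end of the string literal
      continue;                             // anything inside a string is allowed
    }
    if (ch === '"' || ch === "'") {
      quote = ch;
    } else if (CF.test(ch)) {
      const prev = i > 0 ? src[i - 1] : "";
      const next = i + 1 < src.length ? src[i + 1] : "";
      if (!VISIBLE_SPACE.test(prev) && !VISIBLE_SPACE.test(next)) {
        rejected.push(i);
      }
    }
  }
  return rejected;
}

console.log(findRejectedCf("+\uFEFF+a"));   // BOM squeezed between two operators
console.log(findRejectedCf("+ \uFEFF+a"));  // BOM next to a real space
console.log(findRejectedCf('"x\uFEFFy"'));  // BOM inside a string literal
```

A real implementation would report a syntax error at the first rejected index instead of collecting them.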

Re: BOM inside tokens

by Mark S. Miller-2

On Tue, Jul 15, 2008 at 11:27 AM, Igor Bukanov <igor@...> wrote:
> 2008/7/15 Mark Miller <erights@...>:
>> As we've found with the ES3-specified stripping of Cf characters, the main
>> effect of such transparent stripping of characters is to help attackers slip
>> XSS attacks past defensive filters. ES3.1 agrees with ES4 that BOMs and Cfs
>> should be treated as whitespace rather than stripped.

Speaking only for myself, yes, I'd be even happier with the syntax error. I have proposed such harsh treatment before but various objections were raised. In any case, again speaking only for myself, I'm happy with any solution that repairs the security holes created by stripping and avoids introducing new holes.

--
Cheers,
--MarkM

Re: BOM inside tokens

by Igor Bukanov-2

It seems the current IE7/IE8 behavior is to allow Cf only in string
and regexp literals, and to allow BOM only in string/regexp literals or
at the beginning of the source; see
https://bugzilla.mozilla.org/show_bug.cgi?id=430740#c32 . This is very
reasonable.

Igor

Re: BOM inside tokens

by Waldemar Horwat

Igor Bukanov wrote:
> It seems the current IE7/IE8 behavior is to allow Cf only in string
> and regexp literals, and to allow BOM only in string/regexp literals or
> at the beginning of the source,

Precisely what does "in string and regexp literals" mean?  The exact interpretation of this phrase is the core source of the aforementioned security holes.

Folks have exploited putting special characters right after a backslash to break out of whitelisted literals and execute arbitrary code from JSON; a few months ago I demonstrated such an attack.  Regular expressions offer even more opportunities for this kind of mischief.

    Waldemar
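
One way the backslash problem can arise is sketched below. This is an illustration constructed for this purpose, not the attack Waldemar demonstrated: `literalEnd` is a hypothetical validator routine, and `stripCf` only mimics ES3-style stripping of Cf characters (assumes `\p{Cf}` support, ES2018 or later).

```javascript
const ZWJ = "\u200D"; // a Cf character

// Hypothetical validator step: given text that starts with a double quote,
// return the index just past the closing quote, treating a backslash plus
// the following character as one escape. Returns -1 if unterminated.
function literalEnd(src) {
  for (let i = 1; i < src.length; i++) {
    if (src[i] === "\\") i++;
    else if (src[i] === '"') return i + 1;
  }
  return -1;
}

// Mimics an engine that removes Cf characters before tokenizing.
function stripCf(src) {
  return src.replace(/\p{Cf}/gu, "");
}

// The text is: "a\<ZWJ>\"; alert(1); //"
const text = '"a\\' + ZWJ + '\\"; alert(1); //"';

const validatorView = text.slice(0, literalEnd(text));
const engineText = stripCf(text);
const engineView = engineText.slice(0, literalEnd(engineText));
const leftover = engineText.slice(literalEnd(engineText));
console.log(validatorView === text, engineView, leftover);
```

The validator pairs the first backslash with the Cf character and sees a single string literal covering the whole text. An engine that strips Cf characters first pairs the two backslashes with each other, so the literal ends early and the remainder is parsed as code.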

Re: BOM inside tokens

by Brendan Eich-2

Latest news in the bug:

https://bugzilla.mozilla.org/show_bug.cgi?id=430740#c42

Igor wrote:

"So MSIE simply treats BOM as a whitespace for the purpose of parsing. Which
suggests to do this in SM to fix the bug: treat BOM as one of Unicode
whitespace characters in the scanner avoiding any character skipping or
patching."

So no security issues with stripping. Another triumph of de-facto standard
over de-jure.

Pratap got this into ES3.1 drafts already.

/be
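
The scanner-level approach quoted above can be pictured with a toy tokenizer. This is a minimal sketch, not SpiderMonkey's scanner: `tokenize`, `WHITESPACE` and `PUNCTUATORS` are names invented here, and only ASCII identifiers and a handful of punctuators are handled.

```javascript
// Toy scanner: U+FEFF simply sits in the whitespace set, so no characters
// are skipped or patched before tokenizing.
const WHITESPACE = new Set(["\t", "\n", "\v", "\f", "\r", " ", "\u00A0", "\uFEFF"]);
const PUNCTUATORS = ["++", "--", "+", "-", ";"]; // longest match first

function tokenize(src) {
  const tokens = [];
  let i = 0;
  while (i < src.length) {
    if (WHITESPACE.has(src[i])) { // a BOM separates tokens like a space does
      i++;
      continue;
    }
    const punct = PUNCTUATORS.find((p) => src.startsWith(p, i));
    if (punct) {
      tokens.push(punct);
      i += punct.length;
      continue;
    }
    const ident = /^[A-Za-z_$][A-Za-z0-9_$]*/.exec(src.slice(i));
    if (!ident) throw new SyntaxError("Unexpected character at index " + i);
    tokens.push(ident[0]);
    i += ident[0].length;
  }
  return tokens;
}

console.log(tokenize("-\uFEFF-a;")); // two separate minus tokens
console.log(tokenize("--a;"));       // one decrement token
```

Because the BOM ends the current token, `-<bom>-a` scans as two unary minus operators and can never merge into `--`.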