Backwards compatibility and DOCTYPE

Backwards compatibility and DOCTYPE

by Bert Bos


The HTML5 WD states (section 1.1.1[1]) that the format is meant to be as
much backwards-compatible as possible. With a little change to section
8.1.1[2], HTML5 could, in fact, be fully backwards compatible.

The current version (4.01) of HTML requires[3] documents to start with
this DOCTYPE line:

     <!doctype html public "-//W3C//DTD HTML 4.01//EN"
     "http://www.w3.org/TR/html4/strict.dtd">

But that line is not allowed in the latest draft of version 5. Why not?

The corresponding DOCTYPE lines from earlier versions of HTML can also
be allowed. I think most documents that conformed to HTML when those
DOCTYPE lines were current are still valid in HTML5 (once HTML5 allows
those DOCTYPEs).
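
As a rough illustration of the point above, the following sketch checks a DOCTYPE line against the short form that the HTML5 draft allows. It assumes that form is the string "<!DOCTYPE html>" matched case-insensitively, with HTML space characters between the parts; the function name is hypothetical and the pattern is a simplification, not the draft's normative definition.

```python
import re

# Assumed short HTML5 form: "<!DOCTYPE", one or more space characters,
# "html", optional space characters, ">". Matching is case-insensitive.
HTML5_DOCTYPE = re.compile(r"<!DOCTYPE[ \t\n\f\r]+html[ \t\n\f\r]*>", re.IGNORECASE)


def is_html5_doctype(doctype: str) -> bool:
    """Return True if the string is exactly the short HTML5 DOCTYPE."""
    return HTML5_DOCTYPE.fullmatch(doctype.strip()) is not None


html401_strict = (
    '<!doctype html public "-//W3C//DTD HTML 4.01//EN"\n'
    '     "http://www.w3.org/TR/html4/strict.dtd">'
)

print(is_html5_doctype("<!DOCTYPE html>"))  # the short form is accepted
print(is_html5_doctype(html401_strict))     # the HTML 4.01 Strict line is not
```

Under that assumption, the HTML 4.01 Strict line fails the check only because of its public and system identifiers, which is the difference the question is about.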

[1] http://www.w3.org/TR/2008/WD-html5-20080122/#relationship
[2] http://www.w3.org/TR/2008/WD-html5-20080122/#the-doctype
[3] http://www.w3.org/TR/1999/REC-html401-19991224/struct/global.html#h-7.2



Bert
--
   Bert Bos                                ( W 3 C ) http://www.w3.org/
   http://www.w3.org/people/bos                               W3C/ERCIM
   bert@...                             2004 Rt des Lucioles / BP 93
   +33 (0)4 92 38 76 92            06902 Sophia Antipolis Cedex, France


Re: Backwards compatibility and DOCTYPE

by Frank Ellermann


Bert Bos wrote:
 
> The HTML5 WD states (section 1.1.1[1]) that the format is meant
> to be as much backwards-compatible as possible.

Among others <tt>, <s>, and <big> will be removed.  The statement
is marketing, e.g., for backwards-compatibility <s> is required.
Various attributes including align= and clear= are also required
for "Wilbur-browsers" and allowed in HTML 4, but not in HTML 5.

>      <!doctype html public "-//W3C//DTD HTML 4.01//EN"
>      "http://www.w3.org/TR/html4/strict.dtd">
 
> But that line is not allowed in the latest draft of version 5.
> Why not?

Because HTML 5 isn't HTML 4.  With strict you are anyway not more
compatible with legacy browsers.  What problem are you trying to
solve ?  HTML 4 strict already removed <s>, align=, clear=, etc.

Adding <tt>, <big>, etc. is no big deal if authors are determined
to create pages for "popular browsers" instead of "any browser".

HTML 5 apparently takes the approach to parse &#128; up to &#159;
(or &#x80; up to &#x9F;) as Windows-1252 0x80 up to 0x9F.  HTML 5
parsers apparently do not insist on a <title>.  HTML 5 is a kind
of "GiGo", clearly defining what its garbage output will be for
almost any garbage input found in the wild.
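
The character-reference behaviour described above can be observed with Python's standard html module, whose unescape() function follows the HTML5 rules for numeric character references. This is only a sketch: the helper name resolve() is hypothetical, and the comparison with the cp1252 codec is limited to two bytes that Windows-1252 actually defines.

```python
import html


def resolve(reference: str) -> str:
    """Resolve a character reference the way an HTML5 parser would."""
    return html.unescape(reference)


# Decimal and hexadecimal references in the 128..159 range are mapped
# to the characters Windows-1252 assigns to the same byte values.
for ref in ("&#128;", "&#x80;", "&#159;", "&#x9F;"):
    char = resolve(ref)
    print(f"{ref} -> U+{ord(char):04X}")

# The same characters, obtained by decoding the raw bytes as Windows-1252.
euro = bytes([0x80]).decode("cp1252")
y_diaeresis = bytes([0x9F]).decode("cp1252")
print(resolve("&#128;") == euro, resolve("&#159;") == y_diaeresis)
```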

8.1.1 claims that the "doctype" is useless, and parses anything as
"garbage input".  "HTML 5" is not HTML, SGML, XML, or XHTML, what
should it do with a real doctype, i.e. supposed to mean something,
above ignoring it as in 8.1.1 ?

 Frank



Re: Backwards compatibility and DOCTYPE

by Ian Hickson


On Wed, 5 Mar 2008, Bert Bos wrote:
>
> The HTML5 WD states (section 1.1.1[1]) that the format is meant to be as
> much backwards-compatible as possible. With a little change to section
> 8.1.1[2], HTML5 could, in fact, be fully backwards compatible.

What do you mean by backwards compatible in this context? HTML5 doesn't
claim that all legacy documents are conforming HTML5 documents (in fact no
legacy documents are conforming HTML5 documents); it only claims that
HTML5 user agents will process legacy documents in a manner compatible
with legacy user agents.


> The current version (4.01) of HTML requires[3] documents to start with
> this DOCTYPE line:
>
>     <!doctype html public "-//W3C//DTD HTML 4.01//EN"
>     "http://www.w3.org/TR/html4/strict.dtd">
>
> But that line is not allowed in the latest draft of version 5. Why not?

Because that line is HTML 4.01, not HTML5. If you want to write HTML 4.01,
the HTML5 spec is not relevant.

HTML5's UA requirements are compatible with that DOCTYPE, though, so a
user agent written to HTML5 will process that document in a manner
compatible with legacy user agents.

--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: Backwards compatibility and DOCTYPE

by Bert Bos


On Wednesday 05 March 2008 05:28, Ian Hickson wrote:

> On Wed, 5 Mar 2008, Bert Bos wrote:
> > The HTML5 WD states (section 1.1.1[1]) that the format is meant to
> > be as much backwards-compatible as possible. With a little change
> > to section 8.1.1[2], HTML5 could, in fact, be fully backwards
> > compatible.
>
> What do you mean by backwards compatible in this context? HTML5
> doesn't claim that all legacy documents are conforming HTML5
> documents (in fact no legacy documents are conforming HTML5
> documents);

Yes, and I wonder why. HTML5 can easily say that (most? all?) valid
HTML4 documents are also valid HTML5 documents.

> it only claims that HTML5 user agents will process legacy
> documents in a manner compatible with legacy user agents.

That's not clear. The HTML5 draft says an incorrect DOCTYPE is an error
and while some UAs may silently ignore the error, others may (or even
must) report it.

>
> > The current version (4.01) of HTML requires[3] documents to start
> > with this DOCTYPE line:
> >
> >     <!doctype html public "-//W3C//DTD HTML 4.01//EN"
> >     "http://www.w3.org/TR/html4/strict.dtd">
> >
> > But that line is not allowed in the latest draft of version 5. Why
> > not?
>
> Because that line is HTML 4.01, not HTML5. If you want to write HTML
> 4.01, the HTML5 spec is not relevant.

Yes, it is. Once HTML5 is a REC, *it* defines HTML (see, e.g., sections
1.3 and 1.4.1) and HTML 4.01 is no longer relevant. It would be a pity
if old documents suddenly stopped being HTML, when they only differ in
a line that is "mostly useless" (as the draft says).

It's nice that HTML5 takes forward compatibility into account (by not
including a version number in document instances), but I don't see why
it has to break with the past. I know previous versions of HTML had the
same problem, but that is not a reason to repeat the mistake.

>
> HTML5's UA requirements are compatible with that DOCTYPE, though, so
> a user agent written to HTML5 will process that document in a manner
> compatible with legacy user agents.



Bert


Re: Backwards compatibility and DOCTYPE

by Henri Sivonen


Disclaimer: Not a WG response.

On Mar 5, 2008, at 16:33, Bert Bos wrote:

> On Wednesday 05 March 2008 05:28, Ian Hickson wrote:
>> On Wed, 5 Mar 2008, Bert Bos wrote:
>>> The HTML5 WD states (section 1.1.1[1]) that the format is meant to
>>> be as much backwards-compatible as possible. With a little change
>>> to section 8.1.1[2], HTML5 could, in fact, be fully backwards
>>> compatible.
>>
>> What do you mean by backwards compatible in this context? HTML5
>> doesn't claim that all legacy documents are conforming HTML5
>> documents (in fact no legacy documents are conforming HTML5
>> documents);
>
> Yes, and I wonder why. HTML5 can easily say that (most? all?) valid
> HTML4 documents are also valid HTML5 documents.
[...]
> Yes, it is. Once HTML5 is a REC, *it* defines HTML (see, e.g., sections
> 1.3 and 1.4.1) and HTML 4.01 is no longer relevant. It would be a pity
> if old documents suddenly stopped being HTML, when they only differ in
> a line that is "mostly useless" (as the draft says).

First, I agree that there's no *technical* reason why HTML 4.01 could  
not be rescinded and *some* HTML 4.01-looking doctypes be made  
conforming as HTML5.

> It's nice that HTML5 takes forward compatibility into account (by not
> including a version number in document instances), but I don't see why
> it has to break with the past. I know previous versions of HTML had the
> same problem, but that is not a reason to repeat the mistake.

I've argued previously that in order for the current HTML5 to HTML6  
forward-compatibility story to be believable, the same story should  
apply to the HTML4 to HTML5 transition:
http://lists.whatwg.org/pipermail/help-whatwg.org/2008-February/000126.html

Here are some complicating points in no particular order:

  * Making doctypes that say "HTML 4.01" conforming as HTML5 will  
confuse some people who have not yet internalized the notion that Web  
formats shouldn't have versions with which the language grows and  
shrinks but only levels that grow the language monotonically.

  * We shouldn't make quirks mode doctypes conforming, which means
that some doctypes that were conforming under HTML 4.01 have to be
non-conforming.

  * We probably shouldn't make the almost standards mode doctypes  
conforming, either, as long as the distinction between almost  
standards and standards modes is maintained in Gecko/Opera/WebKit.  
However, the almost standards mode is the common case among pragmatic  
standards-aware developers today. Therefore, it seems to me that in  
order to reap a notable practical benefit from making some legacy  
doctypes conforming, the CSS WG would need to fix the CSS defaults to  
match the almost standards mode so that the need for distinguishing  
the almost standards mode and the standards mode would go away.
http://lists.w3.org/Archives/Public/www-style/2008Jan/0598.html

  * Valid HTML 4.01 formally includes various weird SGML stuff  
(minimizations in particular) that have never been supported by  
mainstream browsers. We shouldn't make that stuff conforming. Hence,  
we can't categorically make valid HTML 4.01 documents valid HTML5  
documents.

  * HTML 4.01 has some above-SGML-layer bits that are so bad and  
rarely used that we should take the opportunity to ban them: The way  
isindex is implemented in browsers is fundamentally incompatible with  
the generally understood relationship of markup and the DOM. basefont  
is obsolete in practice. This is another reason why we shouldn't  
categorically make valid HTML 4.01 documents valid HTML5 documents.

  * On a number of other points mainly pertaining to presentational or  
redundant attributes (in particular border=0, target=_blank,  
language=JavaScript and presentational attributes on tables that were  
allowed in 4.01 Strict) I think we indeed should make more HTML 4.01  
usage patterns conforming even if the patterns are no-ops or would  
have more elegant CSS alternatives.
http://lists.w3.org/Archives/Public/public-html/2008Jan/0305.html
http://hsivonen.iki.fi/test/moz/happycog-portfolio-results.txt
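
To make the quirks / almost standards / standards distinction in the list above concrete, here is a simplified sketch of doctype sniffing. It is an illustration only: the function name is hypothetical, the regular expression handles just the PUBLIC form of the doctype, and only a handful of public identifiers are covered, whereas the lists browsers really use are much longer.

```python
import re

# Matches "<!doctype html>" with an optional PUBLIC identifier and an
# optional system identifier. Other forms (e.g. SYSTEM-only) are not parsed.
DOCTYPE_RE = re.compile(
    r'<!doctype\s+html'
    r'(?:\s+public\s+"([^"]*)"(?:\s+"([^"]*)")?)?'
    r'\s*>',
    re.IGNORECASE,
)

TRANSITIONAL_HTML401 = (
    "-//w3c//dtd html 4.01 transitional//",
    "-//w3c//dtd html 4.01 frameset//",
)
TRANSITIONAL_XHTML10 = (
    "-//w3c//dtd xhtml 1.0 transitional//",
    "-//w3c//dtd xhtml 1.0 frameset//",
)


def rendering_mode(document: str) -> str:
    """Guess the rendering mode from the start of a document (simplified)."""
    match = DOCTYPE_RE.match(document.lstrip())
    if match is None:
        # No doctype, or one this sketch cannot parse: treat as quirks.
        return "quirks"
    public_id = (match.group(1) or "").lower()
    has_system_id = match.group(2) is not None
    if public_id.startswith(TRANSITIONAL_HTML401):
        # HTML 4.01 Transitional/Frameset depends on the system identifier.
        return "almost standards" if has_system_id else "quirks"
    if public_id.startswith(TRANSITIONAL_XHTML10):
        return "almost standards"
    if public_id.startswith("-//w3c//dtd html 3.2"):
        return "quirks"
    return "standards"


strict = ('<!doctype html public "-//W3C//DTD HTML 4.01//EN" '
          '"http://www.w3.org/TR/html4/strict.dtd">')
transitional = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
                '"http://www.w3.org/TR/html4/loose.dtd">')
print(rendering_mode(strict))
print(rendering_mode(transitional))
```

In this sketch the HTML 4.01 Strict doctype from the start of the thread lands in standards mode, while the Transitional doctype with a system identifier lands in almost standards mode, which is the case the third bullet is concerned with.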

--
Henri Sivonen
hsivonen@...
http://hsivonen.iki.fi/




Re: Backwards compatibility and DOCTYPE

by Ian Hickson

On Wed, 5 Mar 2008, Bert Bos wrote:

> On Wednesday 05 March 2008 05:28, Ian Hickson wrote:
> > On Wed, 5 Mar 2008, Bert Bos wrote:
> > > The HTML5 WD states (section 1.1.1[1]) that the format is meant to
> > > be as much backwards-compatible as possible. With a little change
> > > to section 8.1.1[2], HTML5 could, in fact, be fully backwards
> > > compatible.
> >
> > What do you mean by backwards compatible in this context? HTML5
> > doesn't claim that all legacy documents are conforming HTML5 documents
> > (in fact no legacy documents are conforming HTML5 documents);
>
> Yes, and I wonder why. HTML5 can easily say that (most? all?) valid
> HTML4 documents are also valid HTML5 documents.

The main reason is that we didn't want to confuse the issue of versioning
by claiming that documents labeled "HTML4" were in fact HTML5.

Going forward we've resolved this -- since HTML5 has no version number in
the source, HTML5 and HTML6 can't be distinguished, and HTML6 can safely
take over HTML5's documents and wipe HTML5 off the face of the earth. :-)


> > it only claims that HTML5 user agents will process legacy documents in
> > a manner compatible with legacy user agents.
>
> That's not clear. The HTML5 draft says an incorrect DOCTYPE is an error
> and while some UAs may silently ignore the error, others may (or even
> must) report it.

Yeah, a better way of phrasing it is that the HTML5 draft says how HTML5
user agents can process legacy documents in a manner compatible with legacy
user agents. (Error recovery is not required in all cases, e.g. when
parsing, because some use cases -- such as preprocessors intended for
developers who have control over the markup -- don't need to handle broken
code, since the developer is right there to fix it.)


> > > The current version (4.01) of HTML requires[3] documents to start
> > > with this DOCTYPE line:
> > >
> > >     <!doctype html public "-//W3C//DTD HTML 4.01//EN"
> > >     "http://www.w3.org/TR/html4/strict.dtd">
> > >
> > > But that line is not allowed in the latest draft of version 5. Why
> > > not?
> >
> > Because that line is HTML 4.01, not HTML5. If you want to write HTML
> > 4.01, the HTML5 spec is not relevant.
>
> Yes, it is. Once HTML5 is a REC, *it* defines HTML (see, e.g., sections
> 1.3 and 1.4.1) and HTML 4.01 is no longer relevant. It would be a pity
> if old documents suddenly stopped being HTML, when they only differ in a
> line that is "mostly useless" (as the draft says).

They don't stop being HTML, they just aren't conforming HTML5 docs. They
are still conforming HTML4 docs.


> It's nice that HTML5 takes forward compatibility into account (by not
> including a version number in document instances), but I don't see why
> it has to break with the past. I know previous versions of HTML had the
> same problem, but that is not a reason to repeat the mistake.

I think that HTML5 claiming that documents that say "HTML4" are actually
HTML5 is something that would be much harder to sell.

In practice, it doesn't really matter much.
