MathML entities don't degrade gracefully


MathML entities don't degrade gracefully

by Henri Sivonen


I think the inclusion of the MathML entities in HTML5 regardless of a  
MathML context violates the Degrade Gracefully design principle of the  
HTML WG. The entities don't add anything to the expressiveness of the  
language: anything that you can express with the entities you can also  
express with numeric character references or by using UTF-8 directly.  
However, when an author uses entities that have not been
traditionally supported by HTML, the rendering of the document in  
legacy user agents will be worse than in the situation where numeric  
character references or direct UTF-8 is used.

Could we get away with not supporting the MathML entity set in
text/html, considering that MathML subtrees are expected to be
generated by converter software anyway?

As for application/xhtml+xml, the situation is even worse. DTDs don't  
work on the Web[1] and are mostly useless legacy. So far, HTML 5 has  
encouraged DTDlessness for XHTML5--and rightly so. Using the MathML  
entities in XML requires a doctype, because otherwise the document  
would be ill-formed. Browsers won't fetch a DTD based on the doctype,  
so we need to consider existing magic public IDs and potential future  
public IDs. Either way, the situation will be bad from the point of  
view of the Degrade Gracefully design principle:

When an old magic public ID is used, Firefox renders the right  
character, Safari shows an XML parse error and Opera renders a  
placeholder that looks like an entity reference.[2]

When a future public ID is used, Firefox shows an XML parse error,  
Safari shows an XML parse error and Opera renders a placeholder that  
looks like an entity reference.[3]

The result in Opera is bad in application/xhtml+xml although no worse  
than in text/html. In Safari, MathML entities in application/xhtml+xml  
are dramatically user experience-breaking in both public ID cases. In  
Firefox, using an old magic public ID would work, but trying to  
introduce *any* new public ID *ever* would lead to a dramatically bad  
experience in old versions.

Wouldn't it be better to just say "No" to the MathML entities on the  
Web and ask MathML generators to produce Unicode directly? (The few  
people who write MathML by hand are probably proficient enough to  
parse with DTD and re-serialize without DTD at their end before  
sending the re-serialized document over the public network.)
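
A minimal sketch of what such a re-serialization step could look like, assuming Python's standard `html.entities.html5` table as the source of entity definitions (the function name `to_numeric_refs` is made up for illustration and is not from any MathML tool). It rewrites named references to numeric character references and leaves the five XML predefined entities and any unknown names alone:

```python
import re
from html.entities import html5  # maps names such as "phi;" to their characters

# Matches references of the form &name; (hypothetical helper pattern).
_NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

# These five are predefined in XML itself and need no DTD.
_XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def to_numeric_refs(markup: str) -> str:
    """Replace known named entity references with hex numeric references."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in _XML_PREDEFINED:
            return match.group(0)
        chars = html5.get(name + ";")
        if chars is None:
            # Unknown name: leave it untouched rather than guess.
            return match.group(0)
        # Some entities expand to more than one code point.
        return "".join(f"&#x{ord(c):X};" for c in chars)

    return _NAMED_REF.sub(replace, markup)


print(to_numeric_refs("<mi>&phi;</mi><mo>&InvisibleTimes;</mo>"))
```

The output needs no doctype to be well-formed, so it should behave the same in legacy user agents as any other numeric character reference.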

[1] http://hsivonen.iki.fi/no-dtd/
[2] http://hsivonen.iki.fi/test/moz/math-entity-known-dtd.xhtml
[3] http://hsivonen.iki.fi/test/moz/math-entity-unknown-dtd.xhtml
--
Henri Sivonen
hsivonen@...
http://hsivonen.iki.fi/




Re: MathML entities don't degrade gracefully

by David Carlisle




Henri,

> As for application/xhtml+xml, the situation is even worse.

The fact that using an entity that's not defined is a well-formedness
error that probably causes the entire document to be rejected is, hmm,
problematic, and the main reason why we try to keep the set of MathML
entity names unchanged, even if we occasionally change the definitions
to take account of additions to Unicode. The XML spec does leave an
escape clause that if the document references a DTD and the application
does not fetch the DTD then the error need not be fatal (thus allowing
Opera's current behaviour), although most XML parsers (and certainly
anything using XSLT/XPath/XQuery) have to reject the document, as the
XPath data model doesn't support undefined entities.
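
The rejection is easy to reproduce with an off-the-shelf parser. A small sketch using Python's standard `xml.etree.ElementTree` (expat underneath); the helper name `is_well_formed` and the two sample documents are illustrative only:

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc: str) -> bool:
    """Return True if the parser accepts the document, False on a parse error."""
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return False
    return True


# No doctype and no declaration for "phi": the parser rejects the whole document.
named = "<html><p>&phi;</p></html>"

# The same character as a numeric character reference needs no declaration.
numeric = "<html><p>&#x3C6;</p></html>"

print(is_well_formed(named))
print(is_well_formed(numeric))
```

Only the numeric form is accepted, which is the behaviour a typical XML toolchain will show for a DTD-less document.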

Going forward it has often been suggested that a possible way to
alleviate this problem is just for everyone to use the same set of
entities always, and putting them all in html5 would be a move in that
direction, although behaviour on existing systems is as you describe.

So it's really a matter of future benefits against bad fallback
behaviour on existing systems.

As I said to Ian earlier I think the most important thing is that the
definitions agree where they use the same name (and I think html5 and
mathml3 drafts do now agree). Whether html5 should include all the names
is less clear. It has some advantages and I would not argue against it,
but it also has some disadvantages and I wouldn't argue too strongly for
them to be kept either.

The MathML3 draft has modified all the example fragments of mathml code
never to use the entity form and always to use numeric character
references (together with a comment with the unicode name) to try to
wean people off entities.


David
(Personal response)


Re: MathML entities don't degrade gracefully

by David Carlisle


 
Henri,

> Using the MathML entities in XML requires a doctype, because otherwise
> the document would be ill-formed.

Yes and no. The HTML5 spec could state that, when processing
application/xhtml+xml documents, the application should
(effectively) use a catalog that supplies DTD entity definitions for
the HTML5 entities (it may make sense to do this regardless of whether
the "html5 entity set" ends up being the html4 names or html4+mathml
names).

<!DOCTYPE html>
<html>
<p>&phi;</p>
</html>

or even just

<html>
<p>&phi;</p>
</html>

is well formed (but not valid) if the parser is using a catalog that says
(for example) that any document with document element "html" should use
a dtd that (just) defines some set of html5 entities.
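
As a rough illustration of "well formed but not valid": a catalog would supply the entity declarations from outside the document, but the same effect on well-formedness can be seen today by putting a declaration in the internal subset. This is a stand-in for the catalog idea, not an implementation of it, and the single `phi` declaration is just an example (sketched with Python's standard `xml.etree.ElementTree`):

```python
import xml.etree.ElementTree as ET

# The internal subset declares "phi", so the reference below is defined.
# No element declarations are present, so the document is not valid,
# but a non-validating parser accepts it as well-formed.
doc = (
    "<!DOCTYPE html [\n"
    '  <!ENTITY phi "&#x3C6;">\n'
    "]>\n"
    "<html><p>&phi;</p></html>"
)

root = ET.fromstring(doc)
text = root.find("p").text

# The parser has already expanded the entity to the character U+03C6.
print(f"U+{ord(text):04X}")
```

With a catalog, the declarations would live in a DTD chosen by the parser rather than in the document itself; how the parser picks that DTD is the part under discussion here.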

David


Re: MathML entities don't degrade gracefully

by Henri Sivonen


On Apr 25, 2008, at 13:59 , David Carlisle wrote:

>> Using the MathML entities in XML requires a doctype, because
>> otherwise the document would be ill-formed.
>
> Yes and no. The HTML5 spec could state that, when processing
> application/xhtml+xml documents, the application should
> (effectively) use a catalog that supplies DTD entity definitions for
> the HTML5 entities (it may make sense to do this regardless of whether
> the "html5 entity set" ends up being the html4 names or html4+mathml
> names).

The HTML 5 spec could indeed specify precisely what kind of entity
resolver needs to be supplied to a vanilla XML 1.0 parser when parsing  
application/xhtml+xml without having to fork XML. If we do that, I  
suggest standardizing Gecko's catalog of two special DTDs and the  
particular public IDs that map to these.

> <!DOCTYPE html>
> <html>
> <p>&phi;</p>
> </html>
>
> or even just
>
> <html>
> <p>&phi;</p>
> </html>
>
> is well formed (but not valid) if the parser is using a catalog that
> says (for example) that any document with document element "html"
> should use a dtd that (just) defines some set of html5 entities.

This, on the other hand, would mean forking XML and creating something  
that's almost XML but not quite--thereby making it incompatible with  
deployed browsers and the existing XML toolchain. If we went that  
route, I think we should do it the right way the first time and have  
only one major discontinuity point. In that case, instead of fixing  
one XML design flaw at a time, we should go all the way to "XML5" on  
the first try specifying non-Draconian streamable error handling,  
adding MathML entities as built-in, removing *all* restrictions on  
what characters can appear in a Name and removing DTDs all in the same  
go.

--
Henri Sivonen
hsivonen@...
http://hsivonen.iki.fi/




Re: MathML entities don't degrade gracefully

by David Carlisle




> This, on the other hand, would mean forking XML and creating something  
> that's almost XML but not quite-

No, as I said I think, given that the mechanism by which an XML parser
finds (or does not find) an external DTD is more or less unspecified,
this does not require forking XML, certainly any XML parser with catalog
support already could be made to accept those two examples.

> thereby making it incompatible with deployed browsers

Yes, as you showed at the start of the thread. If you pass an undefined
entity to (most) browsers in XML mode you get a very aggressive
rejection of the whole document. If you put keeping existing behaviour
as top priority then there is nothing you can do to change that, you are
saying you want to keep that error behaviour. If you do something
(anything) to make the entity have a default definition or in some other
way prevent the rejection of the entire document then it will be
incompatible with deployed browsers.

> and the existing XML toolchain.

I think this can work with existing XML toolchain. It stretches things a
bit and isn't without problems, but no solution here is without
problems, it's just a judgement call on which is the least horrible
solution.

> we should go all the way to "XML5" on the first try specifying
> non-Draconian streamable error handling, adding MathML entities as
> built-in, removing *all* restrictions on what characters can appear in
> a Name and removing DTDs all in the same go.

Yes the idea of building in all the entities has come up several times
in "XML 2" (aka XML 5) discussions on xml-dev and elsewhere.

David


Re: MathML entities don't degrade gracefully

by Ian Hickson


On Thu, 24 Apr 2008, Henri Sivonen wrote:

>
> I think the inclusion of the MathML entities in HTML5 regardless of a
> MathML context violates the Degrade Gracefully design principle of the
> HTML WG. The entities don't add anything to the expressiveness of the
> language: anything that you can express with the entities you can also
> express with numeric character references or by using UTF-8 directly.
> However, when an author uses entities that have not been traditionally
> supported by HTML, the rendering of the document in legacy user agents
> will be worse than in the situation where numeric character references
> or direct UTF-8 is used.

It will be worse, but it won't be dramatically worse.

In the transition period, people can avoid using the new entities.
However, I don't see a good reason to prevent their use in the future. If
we ever want to use new entities, we have to add them.

We've added entities before (e.g. &euro;) without major problems.


> As for application/xhtml+xml, the situation is even worse. DTDs don't
> work on the Web[1] and are mostly useless legacy. So far, HTML 5 has
> encouraged DTDlessness for XHTML5--and rightly so. Using the MathML
> entities in XML requires a doctype, because otherwise the document would
> be ill-formed. Browsers won't fetch a DTD based on the doctype, so we
> need to consider existing magic public IDs and potential future public
> IDs. Either way, the situation will be bad from the point of view of the
> Degrade Gracefully design principle [...]

I don't think it's a critical problem if the XML authoring experience is
worse than the text/html one. After all, it's already worse for many other
reasons. What's special about this one?

The entities in HTML5 don't apply to XHTML5. The spec says as much.

--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
