Re: Notes on validome test suite / validators comparison

Validome-Staff wrote:

> Validome advices the user to use our XML-Validator, as a HTML-Validator
> is not the appropriated tool to check XML...;-)

When I try to validate <http://idn.icann.org/IDNwiki> at your site I get the same incorrect "valid" result as with the W3C validator.

For the W3C validator I know that it can't (yet) check URI syntax, but it's disappointing that your validator also fails. Is that an issue in the "XHTML 1.0 transitional" schema or in your code?

Frank
|
|
|
|
|
Re: Notes on validome test suite / validators comparison

Validome-Staff wrote:

> http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#anyURI
> "such rules and restrictions are not part of type validity
> and are not checked by ·minimally conforming· processors.
> Thus in practice the above definition imposes only very
> modest obligations on ·minimally conforming· processors."

The 2nd edition 2004 still has the same text talking about RFC 2396 as amended by 2732 instead of RFC 3986 (STD 66) - okay, just checked it, STD 66 was published in January 2005.

> As you know, there is no so simple as you claim to provide
> a reliable URI check.

The regexp in STD 66 is a one-liner, and determining the set of visible ASCII characters allowed in a URI is "possible". (Actually it's trivial, but it took me almost a year to figure it out with the help of Roy and others on the W3C URI list. ;-)

> as we know - there is no validator at the moment, which
> handles it much better.

Indeed, I just asked WDG and schneegans.de what they think, they also said "valid".

Frank
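For readers following along, here is a minimal sketch of the two points in this message, written in Python; it is not from the thread. The pattern is the one printed in RFC 3986 (STD 66), Appendix B, and the character sets follow the RFC's unreserved, gen-delims and sub-delims productions plus "%" for percent-escapes. The names split_uri and DISALLOWED_VISIBLE_ASCII are made up for illustration.

```python
import re
import string

# Regular expression from RFC 3986, Appendix B. It only splits a string
# into components; it does not reject invalid characters by itself.
URI_SPLIT = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri(uri):
    """Split a URI reference into its five generic components."""
    m = URI_SPLIT.match(uri)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


# Character classes from RFC 3986, section 2.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")
ALLOWED = UNRESERVED | GEN_DELIMS | SUB_DELIMS | {"%"}

# Visible ASCII is U+0021 through U+007E; whatever is left over
# after removing ALLOWED can never appear literally in a URI.
VISIBLE_ASCII = {chr(c) for c in range(0x21, 0x7F)}
DISALLOWED_VISIBLE_ASCII = "".join(sorted(VISIBLE_ASCII - ALLOWED))
```

Under these definitions nine visible ASCII characters are left over: the double quote, the angle brackets, backslash, caret, backtick, the curly braces and the vertical bar. Note that membership in ALLOWED is necessary but not sufficient; square brackets, for example, are only permitted around an IP literal in the authority.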
|
|
|
|
|
Critical bug 4916 (was: Notes on validome test suite / validators comparison)

Alex wrote:

>> The regexp in STD 66 is a one-liner, and determining the set
>> of visible ASCII characters allowed in an URI is "possible".

> My post was not about the ASCII character issue only...There
> are some "URI" problems more, a schema validator doesn't catch
> at the moment.

Well, it's about time to fix this. After the installation of a "popular browser" on a "popular OS" virtually all applications allowing to click on URIs could indirectly start malware. It's hard to decide whose fault that is, but saying that it's only the fault of the user is no option.

All, please "vote" for bug 4916 and support its reclassification as "critical" with "priority 1" for an immediate fix. We all had almost three years to think about RFC 3986 and 3987. It's a good thing that the IDN test finally forces some action.

> A "RFC Conformity Checker" for URIs is much more than this single
> ASCII issue.

The generic RFC 3986 syntax is no rocket science, just ignore all idiosyncrasies of legacy definitions as in RFC 2368, admittedly mailto: is a hard case. The syntax in the expired mailto-bis draft is better. For a validator you're not forced to guess what invalid syntax is supposed to mean, simply flag it as invalid and be done with it.

> NONtrivial...;-)

Maybe we can agree on an "interesting clerical task". The xmpp folks (i.e. Peter) had to fix their syntax for 3986-compatibility, they (i.e. he) managed.

Frank
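The "flag it as invalid and be done with it" approach could look roughly like the following Python sketch. It is an illustration only, not the code of any validator discussed here: uri_problems is a hypothetical helper, the example.org addresses are placeholders, and the checks are deliberately character-level (allowed characters and well-formed percent-escapes), not a full implementation of the RFC 3986 grammar.

```python
import re

# One character that RFC 3986 permits somewhere in a URI reference:
# unreserved, gen-delims, sub-delims, or "%" introducing an escape.
ALLOWED_CHAR = re.compile(r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]")

# A "%" that is not followed by exactly two hexadecimal digits.
BAD_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")


def uri_problems(ref):
    """Return a list of problems found; an empty list means none found.

    No attempt is made to guess what an invalid reference was
    supposed to mean: anything outside the allowed set is reported.
    """
    problems = []
    bad = sorted({c for c in ref if not ALLOWED_CHAR.fullmatch(c)})
    if bad:
        problems.append(
            "characters not allowed in a URI: "
            + " ".join(repr(c) for c in bad)
        )
    if BAD_ESCAPE.search(ref):
        problems.append("'%' not followed by two hexadecimal digits")
    return problems
```

A call such as uri_problems("http://example.org/a b") reports the space, while a correctly escaped "http://example.org/a%20b" passes these checks. Scheme-specific rules, such as the mailto: case mentioned above, would need separate handling.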
|
|
|
|
|
Re: Notes on validome test suite / validators comparison

Hi Alex,

Thanks a lot for going through the list, and giving more references. This is very useful.

On Oct 20, 2007, at 00:05 , Validome-Staff wrote:

> Here Validome advices the user to use our XML-Validator, as a
> HTML-Validator is not the appropriated tool to check XML...;-)

Understood, but as I wrote, I think it's not very good usability to call this a fatal error, when you could transparently redirect to your XML checker.

> Here we corrected our claims, sorry for not keeping the comparison
> up to date.

Appreciated.

> > * http://www.validome.org/out/ena4011
> > HTML 4.01 document with no system Id.
> > Validome sends a warning... Not necessary per the spec.
> > W3C Markup validator passes validation.
> > Why is W3C validator marked as faulty here? References please?
>
> http://www.w3.org/TR/1999/REC-html401-19991224/struct/global.html#h-7.2
> Other way: Where is specified, that System-Id can be missed?

SGML, which HTML 4.01 is an application of. Only in XML is the system identifier required, per:
http://www.w3.org/TR/xml/#NT-ExternalID

> > * http://www.validome.org/out/ena4023
> > Validome says valid. OpenSP and W3C Markup validator says not valid.
> > I'd tend to trust opensp here. The comparison page's claim that
> > validome is the only validator doing the right thing is very dubious.
> >
> > * http://www.validome.org/out/ena4024
> > Ditto above. The comparison page's claim that validome is the only
> > validator doing the right thing is very dubious.
>
> What is here dubious? It's about SGML (not HTML) documents.

And?

> The "old" W3C-Validator made a fallback to US-ASCII, the "new" to
> UTF-8. Can you explain this, please?
> We asked many times W3C-Germany and Bjoern Hoehrmann in regard to
> the *correct* behaviour of an validator in the case of a fallback,
> but we didn't get any *exact* answer. In this case, the specs are
> very unexact and ambiguous. Please give us a *mandatory* answer -
> with a link reference to appropriate specifications - upon this
> case. The only clear case till now is XHTML, there validators
> should make a fallback to UTF-8 (depending on MIME-Type), HTML is
> still ambiguous...

There is no authoritative answer as far as I can tell, which supports my question: why do you consider your sending a fatal error the right thing to do, and other validators trying a fallback wrong? If there is no rule, you are not supposed to make arbitrary ones and claim you are the only ones to respect them.

> > * http://www.validome.org/out/ena7003
> > I'd like to see a reference for this.
>
> http://www.w3.org/TR/html401/struct/links.html#h-12.2.3
> "...The id attribute, on the other hand, may not contain character
> references."

Interesting discrepancy between prose and DTD here, thanks for the pointer.

> > * http://www.validome.org/out/ena7005 (and 7006)
> > This has nothing to do with validation. If validome emulates some of
> > the features of a link checker, compare it to link checkers, not
> > validator. This test is moot.
>
> http://www.w3.org/TR/html401/struct/links.html#h-12.2.4
> "A reference to an unavailable or unidentifiable resource is an error"
> ...
> "If a user agent cannot locate a linked resource, it should alert
> the user"
>
> Where is here the "moot"? The W3C-Specification is very clear in
> this case...

This is the usual confusion between user agent conformance (which the sections you quote are about) and document conformance (which validome and the markup validator are checking).

> > * http://www.validome.org/out/ena3002
> > This test is bogus. Sorry. An XML declaration also happens to be a
> > proper SGML PI. Giving a warning asking the HTML4 author "are you
> > sure you want this here" may be a good idea. Making this a fatal
> > error is wrong, wrong, wrong.
>
> If a XML-declaration is allowed in SGML, I'd like to see a
> reference for this.

What I have is the SGML spec, chapter 8. Processing instructions.

[on shorttags]

> Oh, here we have hundred opinions of the case. Could you please show
> us a *exact* reference?

The best I have is the informative:
http://www.w3.org/TR/html401/appendix/notes.html#h-B.3.7
and the normative DTD, which allows the shorttags. As such, the spec clearly allows the construct, while informatively warning against it.

I'll reply to your notes on distributing the markup validator in a separate mail.

Thank you,
--
olivier
|
|
Re: validator catalogs

Hi Alex,

On Oct 20, 2007, at 00:05 , Validome-Staff wrote:

> 1. The W3C-SGML-Parser uses two catalog files: xml.soc and
> sgml.soc. Within xml.soc there are 21 points missing, all regarding
> SVG 1.1 Tiny and SVG 1.1 Basic.

The issues with SVG 1.1 Tiny and Basic are actually a bit more complicated. See this mail:
http://lists.w3.org/Archives/Public/www-svg/2007Oct/0005.html

I think the workaround we found last month is better, see:
http://lists.w3.org/Archives/Public/www-validator-cvs/2007Oct/0018.html

I note that you also added a number of modules and files for XHTML print and basic, good idea.

> 2. We missed 6 DTDs, necessary to get the download package running.

Added, thanks.

> 3. Your LibXML-Implementation was not correct - you just use the
> catalog files of your SGML-Parser instead of taking care of the
> "official" catalog specification
> (http://www.xmlsoft.org/catalog.html#Simple).
> Because of this, LibXML tries to get the external DTDs instead of
> the local ones.

Indeed, it was incorrect, but in the end we decided to not fix it, because loading of the catalogue is only supported after a certain version of XML::LibXML - hence we just didn't load anything and muted the entities errors. It may still be a good idea to fix it, although I'm not sure what version of XML::LibXML is supported by most systems.

Thank you,
--
olivier Thereaux - W3C - http://www.w3.org/People/olivier/
W3C Open Source Software: http://www.w3.org/Status
|
|
Re: Notes on validome test suite / validators comparison

Frank,

On Oct 20, 2007, at 23:22 , Frank Ellermann wrote:

> When I try to validate <http://idn.icann.org/IDNwiki> at your site
> I get the same incorrect "valid" result as with the W3C validator.
>
> For the W3C validator I know that it can't (yet) check URI syntax,
> but it's disappointing that your validator also fails. Is that an
> issue in the "XHTML 1.0 transitional" schema or in your code ?

I'm curious as to why you so adamantly want to ban non-ASCII IRIs from HTML?

More on this later, but from what I am gathering from the experts, given the spirit of the specs (written before IDNs and IRIs) and the level of support for IDNs, barking at IRIs in href and src would be counterproductive for the internationalization of the web.

--
olivier
|
|
Re: validator catalogs

Alex, all

On Oct 24, 2007, at 15:04 , olivier Thereaux wrote:

> I note that you also added a number of modules and files for XHTML
> print and basic, good idea.

FWIW, the file included in the RAR had a number of typos that would break any validator using them. I committed to CVS a proper version, I suggest you use this if you are to package the w3c markup validator.

Thanks again.
--
olivier
|
|
Re: Notes on validome test suite / validators comparison

olivier Thereaux wrote:

> I'm curious as to why you so adamantly want to ban non-ascii IRIs
> from HTML?

Please tell me that you're joking. Native IRIs are nice where they are permitted. But on ICANN's Wiki using XHTML 1.0 they will cause havoc:

Sooner or later mediawiki will be fixed to generate valid XHTML 1.0, translating native IRIs to equivalent URIs on the fly. After all that's REQUIRED for backwards compatibility in the numerous Wikis based on mediawiki. Users want that something happens when they click on a link, without upgrading their browser. And native IRIs are designed to have an equivalent URI-form.

Sooner or later validators will be fixed to validate URIs, what with all those "URI exploits" we've seen in the last weeks for XP after the installation of IE7. And when validators do their job all users who naively followed ICANN and W3C into the realms of "who cares about validity if it works" will be seriously annoyed.

I can still tell you the day when the W3C validator started to flag as invalid on a windows-1252 page. I was working on this page, it was stunning.

> from what I am gathering from the experts, given the spirit of the
> specs (written before IDNs and IRIs) and the level of support for
> IDNs, barking at IRIs in href and src would be counterproductive
> for the internationalization of the web.

I'm curious which expert propagates to violate specifications. Want to know how long it took me to create an XHTML ersatz-DTD permitting IRIs everywhere ? 30 minutes. Check out
http://hmdmhdfmhdjmzdtjmzdtzktdkztdjz.googlepages.com/IDN-IRI-test.html

Frank
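The "equivalent URI-form" mentioned in this message can be illustrated with a short Python sketch; it is not what mediawiki or any validator does. It loosely follows the mapping in RFC 3987, section 3.1: the host is converted with the standard library's "idna" codec (which implements the older IDNA 2003 rules), and every other non-ASCII character is percent-encoded as UTF-8. The function name iri_to_uri and the example host are hypothetical, and IPv6 literals in the authority are assumed not to occur.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched: RFC 3986 sub-delims plus ":", "@", "/"
# and "%" ("%" is kept so existing percent-escapes are not re-encoded).
PATH_SAFE = "/:@!$&'()*+,;=%"
QUERY_SAFE = PATH_SAFE + "?"


def iri_to_uri(iri):
    """Map an IRI to a URI (sketch; assumes no IPv6 literal host)."""
    parts = urlsplit(iri)

    # Host: convert each label to its ASCII ("xn--") form.
    host = (parts.hostname or "").encode("idna").decode("ascii")

    # Rebuild the authority, keeping any userinfo and port.
    userinfo, at, _ = parts.netloc.rpartition("@")
    netloc = quote(userinfo, safe=PATH_SAFE) + at + host
    if parts.port is not None:
        netloc += ":" + str(parts.port)

    # Everything else: percent-encode the UTF-8 bytes of non-ASCII
    # characters (quote() uses UTF-8 by default).
    return urlunsplit((
        parts.scheme,
        netloc,
        quote(parts.path, safe=PATH_SAFE),
        quote(parts.query, safe=QUERY_SAFE),
        quote(parts.fragment, safe=QUERY_SAFE),
    ))
```

With this sketch, "http://bücher.example/straße" maps to "http://xn--bcher-kva.example/stra%C3%9Fe", and a reference that is already pure ASCII comes back unchanged.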
|
|
|
|
|
Re: IRIs in href (Was: Notes on validome test suite / validators comparison)

On Oct 25, 2007, at 03:39 , Frank Ellermann wrote:

> Users want that something happens when they
> click on a link, without upgrading their browser. And native IRIs
> are designed to have an equivalent URI-form.

Lack of support for IRIs in legacy user agents is an issue, understood. Now, if today the HTML 4.01 and XHTML 1.0 specs and above were updated to say "IRIs" instead of "URIs", what would you do?

As I wrote before, these specs were written before IRIs were a reality. The HTML4 spec contains advice on how to treat "URIs containing non-ASCII characters". See
http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.1
Although it clearly calls these illegal, it prepares the ground for IRIs (for which we didn't yet have that name at that time).

Saying that IRIs should not be used because they break in legacy software is an argument I have sympathy for, but have trouble accepting. This reminds me of the situation whereby, in Japan, one still can't safely use unicode in mails, because so many MUAs or webmails just don't support it.

> Sooner or later validators will be fixed to validate URIs, what with
> all those "URI exploits" we've seen in the last weeks for XP after
> the installation of IE7.

This is irrelevant to the discussion about IRIs. Please don't use internationalization as a scapegoat for bad coding.

> I can still tell you the day when the W3C validator started to flag
> as invalid on a windows-1252 page. I was working on this
> page, it was stunning.

There once was a bug, and IIRC it was fixed in a few hours. Now, how is that relevant to the discussion at hand?

> I'm curious which expert propagates to violate specifications. Want
> to know how long it took me to create an XHTML ersatz-DTD permitting
> IRIs everywhere ? 30 minutes.

Here you must be joking, bluffing, or mistaken, Frank. The current XHTML DTD says that URIs are CDATA, and thus any SGML or XML validator has to accept all the characters allowed in the document, which includes all those usable in IRIs.

--
olivier
|
|
Re: IRIs in href (Was: Notes on validome test suite / validators comparison)

olivier Thereaux wrote:
>
> On Oct 25, 2007, at 03:39 , Frank Ellermann wrote:
>> Users want that something happens when they
>> click on a link, without upgrading their browser. And native IRIs
>> are designed to have an equivalent URI-form.
>
> Lack of support for IRIs in legacy user agents is an issue, understood.
> Now, if today the HTML 4.01 and XHTML 1.0 specs and above were updated
> to say "IRIs" instead of "URIs", what would you do?
>
> As I wrote before, these specs were written before IRIs were a reality.
> The HTML4 spec contains advice on how to treat "URIs containing
> non-ASCII characters".
> See http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.1
> Although it clearly calls these illegal, it prepares the ground for IRIs
> (for which we didn't yet have that name at that time).
>
> Saying that IRIs should not be used because they break in legacy
> software, is an argument I have sympathy for, but have trouble
> accepting. This reminds me of the situation whereby, in Japan, one still
> can't safely use unicode in mails, because so many MUAs or webmails just
> don't support it.

What about saying that IRIs should not be used because RFC 3987 section 1.2 item (a) says that this standard is not intended to apply to any protocol or format element unless those formats or protocols explicitly say that IRIs are supported?

- Sam Ruby
|
|
Re: IRIs in href

olivier Thereaux wrote:

> Now, if today the HTML 4.01 and XHTML 1.0 specs and above were
> updated to say "IRIs" instead of "URIs", what would you do?

Maybe ditch the W3C and post the reasons in an Internet Draft. I'd certainly consider it as unethical. RFC 3987 does not "update" 3986.

The specs should be updated with s/2396/3986/g, s/3066/4646/g, and similar clerical tasks, e.g. explaining why xml:lang is forced to be still an NMTOKEN wrt these document types.

But for incompatible modifications we need new document types. Not worldwide "upgrade your browser" campaigns, some users can't, and besides it's completely unnecessary, all IRIs by definition have an equivalent URI working with "any browser".

> Saying that IRIs should not be used because they break in
> legacy software, is an argument I have sympathy for, but
> have trouble accepting.