Suggested patches for resolver: Windows driver-letter paths and resolveSystem() and <uri>


by Earl Hood

While working with xml-commons-resolver, I discovered that the code
does not handle pathnames that use Windows drive letters.  The code
appears to lose the "absoluteness" of the path, so resolution of
other entities/files that use it as a base fails.
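
A minimal sketch of the drive-letter pitfall, using only java.net.URI. The thread does not show the patch contents, so this is not the patch itself: the class and method names are hypothetical, and whether this is the exact failure inside the resolver is an assumption. It shows that a bare drive-letter path parses as a URI whose scheme is the drive letter, and that turning the path into a file URL first keeps it absolute when used as a base.

```java
import java.net.URI;

// Hypothetical helper, not part of xml-commons-resolver.
final class DriveLetterPathSketch {

    /** Converts a native path to a file URL string, keeping drive-letter paths absolute. */
    static String toFileUrl(String nativePath) {
        String path = nativePath.replace('\\', '/');
        if (path.matches("[A-Za-z]:/.*")) {
            // Drive-letter path: it is absolute, so put a slash in front of the drive.
            return "file:///" + path;
        }
        if (path.startsWith("/")) {
            // Unix-style absolute path.
            return "file://" + path;
        }
        // Relative path: leave it for the caller to resolve against a base.
        return path;
    }

    /** Returns the scheme java.net.URI sees in the given string (may be null). */
    static String schemeOf(String s) {
        return URI.create(s).getScheme();
    }

    /** Resolves a relative reference against a base URL and returns the resulting path. */
    static String resolvedPath(String baseUrl, String relative) {
        return URI.create(baseUrl).resolve(relative).getPath();
    }
}
```

Here schemeOf("C:/schemas/doc.dtd") yields "C", which is how the drive can be mistaken for a scheme, while resolvedPath(toFileUrl("C:\\schemas\\doc.dtd"), "ent/chars.ent") keeps the drive in the path.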

Also, and probably a little more controversial, is the resolution of system IDs.
I noticed that resolveSystem() does not do a resolveURI() if no system
mapping exists.  This is a problem when using the <schemavalidate>
task in Ant.  I'm working with catalogs that contain numerous <uri> entries
to remap http URLs to local file URLs.

Unfortunately, Ant/Xerces fails to resolve to the local URLs because
resolveSystem() is used (because the URLs appear in SYSTEM identifiers
in the documents).

The XML resolution spec is not clear if the resolver should also check <uri>
entries or if the XML parser should do a URI lookup if a SYSTEM lookup
fails (it appears Saxon may actually do this since it does not have the problem
that Ant/Xerces does).

To address the immediate problem, I changed resolveSystem() to call
resolveURI() if it fails to find anything. The change is in the attached
Catalog.java.patch (which also includes the Windows pathname fix).
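
The idea behind the change can be sketched with a toy catalog. This is an illustration only: ToyCatalog is a hypothetical stand-in backed by two maps, not the real Catalog class, and the patch itself is not reproduced in this thread. The sketch assumes a lookup returns null when nothing matches.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for a catalog; NOT org.apache.xml.resolver.Catalog.
final class ToyCatalog {
    private final Map<String, String> systemEntries = new HashMap<>();
    private final Map<String, String> uriEntries = new HashMap<>();

    /** Registers the equivalent of a <system> entry. */
    void addSystem(String systemId, String target) {
        systemEntries.put(systemId, target);
    }

    /** Registers the equivalent of a <uri> entry. */
    void addUri(String name, String target) {
        uriEntries.put(name, target);
    }

    /** Looks only at system entries; returns null when there is no match. */
    String resolveSystem(String systemId) {
        return systemEntries.get(systemId);
    }

    /** Looks only at uri entries; returns null when there is no match. */
    String resolveURI(String uri) {
        return uriEntries.get(uri);
    }

    /** The proposed behaviour: try system entries first, then fall back to uri entries. */
    String resolveSystemWithUriFallback(String systemId) {
        String hit = resolveSystem(systemId);
        return hit != null ? hit : resolveURI(systemId);
    }
}
```

With only a uri entry registered for an http URL, resolveSystem() on that URL finds nothing, while resolveSystemWithUriFallback() returns the local file URL.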

Are any of these changes worth including in the resolver code base?

--ewh



Catalog.java.patch (3K)
FileURL.java.patch (952 bytes)

Re: Suggested patches for resolver: Windows driver-letter paths and resolveSystem() and <uri>

by Michael Glavassevich-3

Hi Earl,

The change you're suggesting to resolveSystem() would break compatibility.
It also doesn't fit with the semantics of the method though I'm sure you
knew that before you suggested it. Any user of the resolver already has the
power to call resolveURI() after resolveSystem() if they choose to or make
any other sequence of calls they want on the Catalog. Have you considered
asking the Ant developers to modify the behaviour of the <schemavalidate>
task or provide some way to tune it?

As for your Windows drive letter patch you should attach that to a Bugzilla
issue [1]. A warning though ... No one has been maintaining the codebase
these days so can't say when that would get reviewed or committed. Probably
going to take a developer with an itch to scratch to get things moving
again.

Thanks.

[1] https://issues.apache.org/bugzilla/

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@...

earlhood@... wrote on 04/14/2008 07:00:01 PM:

> While working with xml-commons-resolver, I discovered that the code
> does not handle pathnames that use Windows drive letters.  The code
> appears to lose the "absoluteness" of the path, so resolution of
> other entities/files that use it as a base fails.
>
> Also, and probably a little more controversial, is the resolution of
> system IDs.
> I noticed that resolveSystem() does not do a resolveURI() if no system
> mapping exists.  This is a problem when using the <schemavalidate>
> task in Ant.  I'm working with catalogs that contain numerous <uri> entries
> to remap http URLs to local file URLs.
>
> Unfortunately, Ant/Xerces fails to resolve to the local URLs because
> resolveSystem() is used (because the URLs appear in SYSTEM identifiers
> in the documents).
>
> The XML resolution spec is not clear if the resolver should also check <uri>
> entries or if the XML parser should do a URI lookup if a SYSTEM lookup
> fails (it appears Saxon may actually do this since it does not have
> the problem that Ant/Xerces does).
>
> To address the immediate problem, I changed resolveSystem() to call
> resolveURI() if it fails to find anything. The change is in the attached
> Catalog.java.patch (which also includes the Windows pathname fix).
>
> Are any of these changes worth including in the resolver code base?
>
> --ewh


Re: Suggested patches for resolver: Windows driver-letter paths and resolveSystem() and <uri>

by Earl Hood

On April 17, 2008 at 00:49, Michael Glavassevich wrote:

> The change you're suggesting to resolveSystem() would break compatibility.
> It also doesn't fit with the semantics of the method though I'm sure you
> knew that before you suggested it. Any user of the resolver already has the
> power to call resolveURI() after resolveSystem() if they choose to or make
> any other sequence of calls they want on the Catalog. Have you considered
> asking the Ant developers to modify the behaviour of the <schemavalidate>
> task or provide some way to tune it?

I think it may be Xerces since that is what Ant uses by default.
When I get time, I can examine the Xerces code to see if I can
provide a patch for it.

A concern I have about the resolving algorithm, as it is described
in the W3C doc, is that it does not appear to say how <uri> entries
are to be handled.  It seems that, to properly resolve something, a
<uri> entry check should always be done, probably at the end of
resolving a public ID, system ID, or entity.

If the resolver code does not do this, at least the entity manager
(which in essence is a "resolver") should.  If it is something
that all entity managers should do, why not encapsulate it in the
resolver?

> As for your Windows drive letter patch you should attach that to a Bugzilla
> issue [1]. A warning though ... No one has been maintaining the codebase
> these days so can't say when that would get reviewed or committed. Probably
> going to take a developer with an itch to scratch to get things moving
> again.
...
> [1] https://issues.apache.org/bugzilla/

Thanks for the pointer.

--ewh
--
Earl Hood, <earl@...>
Web: <http://www.earlhood.com/>
PGP Public Key: <http://www.earlhood.com/gpgpubkey.txt>

Re: Suggested patches for resolver: Windows driver-letter paths and resolveSystem() and <uri>

by Michael Glavassevich-3

Earl Hood <earl@...> wrote on 04/17/2008 10:32:43 AM:

> On April 17, 2008 at 00:49, Michael Glavassevich wrote:
>
> > The change you're suggesting to resolveSystem() would break compatibility.
> > It also doesn't fit with the semantics of the method though I'm sure you
> > knew that before you suggested it. Any user of the resolver already has the
> > power to call resolveURI() after resolveSystem() if they choose to or make
> > any other sequence of calls they want on the Catalog. Have you considered
> > asking the Ant developers to modify the behaviour of the <schemavalidate>
> > task or provide some way to tune it?
>
> I think it may be Xerces since that is what Ant uses by default.
> When I get time, I can examine the Xerces code to see if I can
> provide a patch for it.

It is Ant, or whatever application uses the parser, that is in control
of resource resolution. EntityResolver, LSResourceResolver and their friends
are just interfaces. Xerces calls whatever implementation has been
registered with it. It's the application's responsibility to choose or
write an implementation which does what it needs.
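
The registration step can be illustrated with the JDK's own JAXP API. A minimal sketch, assuming a made-up system ID (http://example.org/doc.dtd) and an in-memory DTD; the class and method names are hypothetical, and this is not Ant's or Xerces's actual code. The application supplies the EntityResolver, and the parser simply calls it when it meets the SYSTEM identifier.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical demo class; the system ID and DTD content are made up.
final class ResolverRegistrationSketch {
    /** System IDs the parser asked the application's resolver about. */
    static final List<String> seenSystemIds = new ArrayList<>();

    static String parseWithLocalDtd() {
        // The application decides what a system ID maps to.
        EntityResolver resolver = (publicId, systemId) -> {
            seenSystemIds.add(systemId);
            if ("http://example.org/doc.dtd".equals(systemId)) {
                InputSource local =
                        new InputSource(new StringReader("<!ENTITY greeting \"hello\">"));
                local.setSystemId(systemId);
                return local;
            }
            return null; // null tells the parser to use its default behaviour
        };
        String xml = "<!DOCTYPE doc SYSTEM \"http://example.org/doc.dtd\">"
                + "<doc>&greeting;</doc>";
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver(resolver); // register with the parser
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

No network access is needed here because the resolver answers for the only external identifier in the document.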

> A concern I have about the resolving algorithm, as it is described
> in the W3C doc, is that it does not appear to say how <uri> entries
> are to be handled.  It seems that, to properly resolve something, a
> <uri> entry check should always be done, probably at the end of
> resolving a public ID, system ID, or entity.
>
> If the resolver code does not do this, at least the entity manager
> (which in essence is a "resolver") should.  If it is something
> that all entity managers should do, why not encapsulate it in the
> resolver?
>
> > As for your Windows drive letter patch you should attach that to a Bugzilla
> > issue [1]. A warning though ... No one has been maintaining the codebase
> > these days so can't say when that would get reviewed or committed. Probably
> > going to take a developer with an itch to scratch to get things moving
> > again.
> ...
> > [1] https://issues.apache.org/bugzilla/
>
> Thanks for the pointer.
>
> --ewh
> --
> Earl Hood, <earl@...>
> Web: <http://www.earlhood.com/>
> PGP Public Key: <http://www.earlhood.com/gpgpubkey.txt>

Thanks.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@...


Re: Suggested patches for resolver: Windows driver-letter paths and resolveSystem() and <uri>

by David Crossley

Earl Hood wrote:
> While working with xml-commons-resolver, I discovered that the code
> does not handle pathnames that utilize window's driver letters.  The code
> appears to lose the "absoluteness" of the path, causing resolution of
> other entities/files to fail that have it for a base.

Earl, i don't know if this is related, but at Apache Forrest we
have troubles in a certain situation. The newest Resolver works
fine on Windows when we use it with Xerces via Apache Cocoon,
but fails when we use it via Apache Ant.

Last time i looked, none of our Windows developers have yet
found time to take it up with the Ant project.

Here is my reply to Norman Walsh on this commons-dev list:
http://markmail.org/message/4vozuvf2gwwuk33k
Re: File URLs will be the death of me
Date: 03 Jul 2006
which also links to an issue report.

When i investigated that issue i found that Ant had some
strange code for Windows path handling in the "xmlcatalog" task.
It seemed to be a workaround for problems with Resolver.
After Norm fixed it here at Apache XML Commons, perhaps the
workaround in Ant now fails.

Dunno.

-David

Re: Suggested patches for resolver: Windows driver-letter paths and resolveSystem() and <uri>

by Earl Hood

On April 17, 2008 at 00:49, Michael Glavassevich wrote:

> The change you're suggesting to resolveSystem() would break compatibility.
> It also doesn't fit with the semantics of the method though I'm sure you
> knew that before you suggested it. Any user of the resolver already has the
> power to call resolveURI() after resolveSystem() if they choose to or make
> any other sequence of calls they want on the Catalog. Have you considered
> asking the Ant developers to modify the behaviour of the <schemavalidate>
> task or provide some way to tune it?

Maybe the patch should go into the classes in the tools area
rather than Catalog itself.  That is, the classes that actually implement
the EntityResolver and URIResolver interfaces should be updated
to do a URI-map lookup.

This way, Catalog semantics are preserved, but the resolving
classes are fixed.
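
A rough sketch of that alternative, with the fallback living in the EntityResolver rather than in the catalog. CatalogLookup and UriFallbackEntityResolver are hypothetical names introduced here for illustration; they are not the resolver's actual tools classes, and the sketch assumes both lookups return null on a miss.

```java
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical view of the two lookups discussed in this thread.
interface CatalogLookup {
    String resolveSystem(String systemId);

    String resolveURI(String uri);

    /** Builds a map-backed lookup, for demonstration only. */
    static CatalogLookup of(Map<String, String> systemEntries,
                            Map<String, String> uriEntries) {
        return new CatalogLookup() {
            @Override
            public String resolveSystem(String systemId) {
                return systemEntries.get(systemId);
            }

            @Override
            public String resolveURI(String uri) {
                return uriEntries.get(uri);
            }
        };
    }
}

// Hypothetical adapter: the catalog's own semantics stay unchanged.
final class UriFallbackEntityResolver implements EntityResolver {
    private final CatalogLookup catalog;

    UriFallbackEntityResolver(CatalogLookup catalog) {
        this.catalog = catalog;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId == null) {
            return null;
        }
        // First the system lookup, exactly as before.
        String mapped = catalog.resolveSystem(systemId);
        if (mapped == null) {
            // Only the resolving class adds the URI-map lookup.
            mapped = catalog.resolveURI(systemId);
        }
        // Returning null lets the parser fall back to its default handling.
        return mapped == null ? null : new InputSource(mapped);
    }
}
```

An application would register such an adapter with the parser in place of the plain resolver, so callers of the catalog itself see no change in behaviour.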

Thoughts?

--ewh