URL decoding of plus (aka space) isn't done

by Doug Donohoe

I'm tracking down a bug in Wicket that appears in Tomcat (also Jetty).  The issue is this:

I'm submitting a URL with spaces and pluses in it.  It is URL encoded (using java.net.URLEncoder) as follows:

http://localhost:8080/bugs/home/message/message+with+spaces+and%2Bsome%2Bpluses/

On both Tomcat and Jetty, the request.getServletPath() call returns this:

/home/message/message+with+spaces+and+some+plusses=bug/

Only the %2B (+) encodings are URL-decoded.  The + (space) encodings are left as-is.  As a result, information is lost: the decoded path no longer shows which pluses were literal and which stood for spaces.
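To make the encoding step concrete, here is a minimal sketch of what java.net.URLEncoder does with such a segment.  The raw text "message with spaces and+some+pluses" is my assumption, inferred from the encoded URL above, and the class and method names are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

final class FormEncodingDemo {
    // URLEncoder produces application/x-www-form-urlencoded output:
    // a space becomes '+', and a literal '+' becomes "%2B".
    static String formEncode(String raw) {
        try {
            return URLEncoder.encode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // URLDecoder is the matching form decoder: '+' back to space, "%2B" back to '+'.
    static String formDecode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "message with spaces and+some+pluses"; // assumed original text
        String encoded = formEncode(raw);
        System.out.println(encoded); // message+with+spaces+and%2Bsome%2Bpluses
        System.out.println(formDecode(encoded).equals(raw)); // true
    }
}
```

The round trip is lossless only when the receiver also applies form decoding, which is not what happens to the path on the server side.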

This happens in Tomcat in CoyoteAdapter:393:

req.getURLDecoder().convert(decodedURI, false);

The 'false' is the "query" parameter; when it is false, the decoder skips the conversion of + to a space.
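The behavior described above can be sketched as follows.  This is not Tomcat's actual code, just a hypothetical decoder with a similar boolean "query" flag, assuming ASCII input and UTF-8 for the decoded bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class PercentDecoderSketch {
    // Decodes %XX escapes. '+' becomes a space only when query == true.
    // Assumes the encoded input is plain ASCII, as a valid URL would be.
    static String decode(String s, boolean query) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%') {
                if (i + 2 >= s.length()) {
                    throw new IllegalArgumentException("Truncated escape at index " + i);
                }
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("Bad escape at index " + i);
                }
                out.write((hi << 4) | lo);
                i += 2; // skip the two hex digits just consumed
            } else if (c == '+' && query) {
                out.write(' '); // form-style decoding, query strings only
            } else {
                out.write(c); // in path mode '+' is copied through unchanged
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String path = "/home/message/message+with+spaces+and%2Bsome%2Bpluses/";
        System.out.println(decode(path, false)); // /home/message/message+with+spaces+and+some+pluses/
        System.out.println(decode(path, true));  // /home/message/message with spaces and+some+pluses/
    }
}
```

With query set to false, the literal pluses and the space pluses become indistinguishable, which is the information loss described earlier.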

Jetty's code is similar - it just ignores '+' altogether.

My question is this: if it is correct to URL-encode spaces as +, why doesn't Tomcat (or Jetty) decode them?  Or is java.net.URLEncoder doing this incorrectly?  Is + as a space only allowed in the query string portion?  There seems to be a mismatch between what a client will allow in a URL and what the server expects.  Does anyone have any insight into this?

Thanks,

-Doug

Re: URL decoding of plus (aka space) isn't done

by Doug Donohoe

To answer my own question:

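I looked at the RFC (http://www.ietf.org/rfc/rfc2396.txt), and it appears that '+' is reserved only within the query component.  In the path, '+' is an ordinary character, so the servers are right to leave it alone.  java.net.URLEncoder produces HTML form encoding (application/x-www-form-urlencoded), so using it for the path portion of a URL (or for an entire URL) is incorrect.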
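For the path portion, one option that seems to work is the multi-argument java.net.URI constructor, which percent-encodes characters that are illegal in a path (a space becomes %20) and leaves '+' alone.  A minimal sketch, again assuming the raw message text from the example URL; the class and method names are made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class PathEncodingSketch {
    // Builds a URI from an unencoded path. The multi-argument constructor
    // quotes illegal path characters such as spaces, but not '+'.
    static URI buildUri(String rawPath) {
        try {
            return new URI("http", null, "localhost", 8080, rawPath, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI uri = buildUri("/bugs/home/message/message with spaces and+some+pluses/");
        // http://localhost:8080/bugs/home/message/message%20with%20spaces%20and+some+pluses/
        System.out.println(uri.toASCIIString());
        // getPath() returns the decoded path, with spaces and pluses intact
        System.out.println(uri.getPath());
    }
}
```

A server that decodes only %XX escapes in the path then recovers the original text.  One caveat: this constructor does not escape '/' inside a segment, so a segment that itself contains slashes would need separate handling.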

-Doug
