Possible bug when parsing an XML document when JVM is using Turkish locale


Possible bug when parsing an XML document when JVM is using Turkish locale

by Ali Seaton

Hi,

I'm new to the list; I just thought this issue should be raised. I was receiving an error from Tomcat when it was parsing an XML document.

The XML document was using a standard Latin encoding, ISO-8859-1, declared in this way:

<?xml version="1.0" encoding="iso-8859-1"?>

Error received:

org.xml.sax.SAXParseException: Invalid encoding name "iso-8859-1".

I knew iso-8859-1 was not an invalid encoding, so I checked the code to see what was going on. Inside the method org.apache.xerces.impl.XMLEntityManager.createReader, an upper-case representation of the encoding is created with toUpperCase(). In a Turkish locale a small 'i' becomes a Turkish 'İ' (capital I with a dot above, U+0130), so the subsequent check of the encoding against the predefined lists of valid names fails.
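The effect can be reproduced outside the parser with a minimal sketch. The class name, the KNOWN set and the isKnown helper are hypothetical stand-ins for illustration only; this is not the Xerces code, just the same kind of upper-case-then-look-up check.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class TurkishUpperCaseDemo {
    // Hypothetical stand-in for a parser's table of valid encoding names.
    static final Set<String> KNOWN =
            new HashSet<>(Arrays.asList("ISO-8859-1", "UTF-8"));

    // Upper-cases the declared name using the given locale, then looks it up.
    static boolean isKnown(String declared, Locale locale) {
        return KNOWN.contains(declared.toUpperCase(locale));
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        String upper = "iso-8859-1".toUpperCase(turkish);

        // The first character is U+0130 (dotted capital I), not ASCII 'I'.
        System.out.println(Integer.toHexString(upper.charAt(0))); // prints 130

        // So the lookup against the ASCII table misses.
        System.out.println(isKnown("iso-8859-1", turkish)); // prints false
    }
}
```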

My suggestion would be to call the toUpperCase overload that takes a Locale, specifying an English locale, so that the correct 'I' is produced.
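A rough sketch of that suggestion, assuming a hypothetical helper named normalize (the class and method names are illustrative, not taken from Xerces). It temporarily switches the JVM default locale to Turkish to simulate the environment described above, then restores it.

```java
import java.util.Locale;

public class EncodingNameNormalizer {
    // Upper-cases with a fixed locale so the result does not depend on
    // the JVM default. Locale.ROOT would serve equally well here.
    static String normalize(String encodingName) {
        return encodingName.toUpperCase(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        Locale saved = Locale.getDefault();
        try {
            // Simulate a JVM running with a Turkish default locale.
            Locale.setDefault(Locale.forLanguageTag("tr-TR"));

            // The no-argument overload follows the default locale.
            System.out.println("iso-8859-1".toUpperCase().equals("ISO-8859-1")); // prints false

            // The explicit-locale overload is unaffected.
            System.out.println(normalize("iso-8859-1").equals("ISO-8859-1")); // prints true
        } finally {
            Locale.setDefault(saved);
        }
    }
}
```

The same concern applies to toLowerCase(), where an upper-case 'I' becomes a dotless 'ı' (U+0131) under a Turkish locale.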

Thanks

Alistair

Re: Possible bug when parsing an XML document when JVM is using Turkish locale

by Michael Glavassevich-3

Hi Alistair,

You must be using an ancient version of Xerces. This particular problem was fixed way back in 2002.

Try using the latest release (2.9.1) available here: http://xerces.apache.org/xerces2-j/download.cgi.
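If it is unclear which parser a given application actually loads, a small sketch like the following may help. It only uses standard JAXP and reflection calls; the ParserInfo class and describe method are hypothetical names, and the version string is only available when the jar's manifest declares one, so a null there proves nothing.

```java
import java.security.CodeSource;
import javax.xml.parsers.SAXParserFactory;

public class ParserInfo {
    // Reports the class name, manifest version (may be null) and the
    // location the class was loaded from (null for JDK-internal classes).
    static String describe(Class<?> c) {
        Package p = c.getPackage();
        String version = (p == null) ? null : p.getImplementationVersion();
        CodeSource src = c.getProtectionDomain().getCodeSource();
        Object location = (src == null) ? null : src.getLocation();
        return c.getName() + " version=" + version + " from=" + location;
    }

    public static void main(String[] args) {
        // The factory class shows which SAX implementation JAXP selected.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        System.out.println(describe(factory.getClass()));
    }
}
```

Running this inside the same container and classpath as the failing application matters, since an embedded parser jar can shadow the one that would otherwise be picked up.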

Thanks.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@...

"Ali Seaton" <ali.seaton@...> wrote on 06/24/2008 12:33:55 PM:

> Hi,
>
> I'm new to the list; I just thought this issue should be raised. I
> was receiving an error from Tomcat when it was parsing an XML document.
>
> The XML document was using a standard Latin encoding, ISO-8859-1,
> declared in this way:
>
> <?xml version="1.0" encoding="iso-8859-1"?>
>
> Error received:
>
> org.xml.sax.SAXParseException: Invalid encoding name "iso-8859-1".
>
> I knew iso-8859-1 was not an invalid encoding, so I checked the code
> to see what was going on. Inside the method
> org.apache.xerces.impl.XMLEntityManager.createReader, an upper-case
> representation of the encoding is created with toUpperCase(). In a
> Turkish locale a small 'i' becomes a Turkish 'İ' (capital I with a dot
> above, U+0130), so the subsequent check of the encoding against the
> predefined lists of valid names fails.
>
> My suggestion would be to call the toUpperCase overload that takes a
> Locale, specifying an English locale, so that the correct 'I' is
> produced.
>
> Thanks
>
> Alistair


Re: Possible bug when parsing an XML document when JVM is using Turkish locale

by Ali Seaton

Hi Michael,

Thanks for the quick reply. I had downloaded the code for version 2.0.2, as this is what was embedded within the application I was debugging. When I searched on Google I didn't find any reference to this problem, so I never thought to check the latest version for the fix - doh!

Thanks all the same

Alistair


