utf-8 characters in XML encoded HL7 messages


utf-8 characters in XML encoded HL7 messages

by jimkski


Hey All-

I'm having an issue with XMLSerializer's handling of character encodings.
I'm sending antibiotic sensitivity results via OBX segments in ORU^R01
messages. The OBX segment needs to communicate the units used in the
measurement being reported, and in one case the unit requires the micro
sign. Hopefully the email will render this correctly, but the following
XML snippet shows what I'd like to send, which uses the HTML character
entity for the micro sign:


<OBX.6>
    <CE.1>&micro;/mL</CE.1>
</OBX.6>

Unfortunately, once the XMLSerializer is done with it, I end up with the
following (again, in case the characters are mangled: the ampersand of the
micro sign entity has been turned into the entity for an ampersand,
followed by the text string "micro;"):

<OBX.6>
    <CE.1>&amp;micro;/mL</CE.1>
</OBX.6>

If you look at the source code for XMLSerializer, you'll see that it doesn't
recognize or respect HTML character entities. As soon as it sees an
ampersand, regardless of the context, it replaces it with its entity
equivalent, &amp;.
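For what it's worth, escaping `&` in text content is what any conforming XML serializer does. XML itself predefines only five entities (`&lt;`, `&gt;`, `&amp;`, `&quot;`, `&apos;`), so `&micro;` is not even well-formed unless a DTD declares it. The minimal sketch below reproduces the behaviour with the JDK's built-in JAXP identity transformer rather than XMLSerializer (an assumption for illustration; it is not the code path HAPI uses). `AmpersandEscapingDemo` and `serializeUnits` are hypothetical names.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class AmpersandEscapingDemo {

    /** Builds <OBX.6><CE.1>text</CE.1></OBX.6> and serializes it with the JDK identity transformer. */
    static String serializeUnits(String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element obx6 = doc.createElement("OBX.6");
        Element ce1 = doc.createElement("CE.1");
        // The string is stored as character data: "&micro;" is just seven literal characters here.
        ce1.setTextContent(text);
        obx6.appendChild(ce1);
        doc.appendChild(obx6);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serializeUnits("&micro;/mL"));
    }
}
```

With `&micro;/mL` as the text, the output contains `<CE.1>&amp;micro;/mL</CE.1>`, the same result Jim describes.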

If I simply send the character itself (or its Unicode escape), the
serializer passes it through untouched, but the system I'm sending to
can't cope with it and rejects the message. It only accepts the HTML
character entity.

Does anyone have a workaround for this that doesn't involve post-processing
the XML representation of the message with some kind of search and replace?

Thanks.
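If post-processing turns out to be acceptable after all, a more portable target than `&micro;` is a numeric character reference such as `&#181;` (U+00B5). Every XML parser accepts it without a DTD, and because it is pure ASCII it survives any output encoding. Whether the receiving system accepts numeric references is an assumption to confirm with the vendor. Below is a minimal sketch of a hypothetical helper (`AsciiXml.toAsciiWithCharRefs`, not part of HAPI) that runs over the already-serialized XML string. It assumes non-ASCII text appears only in character data or attribute values, which holds for HL7 XML element names.

```java
class AsciiXml {

    /**
     * Rewrites every non-ASCII code point in already-serialized XML as a
     * decimal numeric character reference (e.g. U+00B5 becomes &#181;).
     * Assumes no non-ASCII characters in element names, comments, or CDATA.
     */
    static String toAsciiWithCharRefs(String xml) {
        StringBuilder sb = new StringBuilder(xml.length());
        // Iterate by code point so a character outside the BMP (a surrogate pair)
        // becomes one reference rather than two invalid ones.
        xml.codePoints().forEach(cp -> {
            if (cp > 0x7F) {
                sb.append("&#").append(cp).append(';');
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String serialized = "<OBX.6><CE.1>\u00B5/mL</CE.1></OBX.6>";
        // prints <OBX.6><CE.1>&#181;/mL</CE.1></OBX.6>
        System.out.println(toAsciiWithCharRefs(serialized));
    }
}
```

Because this runs after serialization, the `&` it emits is not escaped again. Passing `&#181;` into the serializer as text would instead produce `&amp;#181;`, just like `&micro;`.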




_______________________________________________
Hl7api-devel mailing list
Hl7api-devel@...
https://lists.sourceforge.net/lists/listinfo/hl7api-devel

Re: utf-8 characters in XML encoded HL7 messages

by nicovn


Hi Jim,
 
I'm afraid the XML parser used in HAPI doesn't support character entities.
 
But for an XML document I don't see an immediate reason to use character entities: you can simply put the character itself in your XML document, since it's UTF-8 encoded.
 
If my explanation is not clear, please send me your XML file and I'll update it.
 
Thanks
 
Regards
 
Nico
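A small sketch of Nico's point, using the JDK's built-in DOM parser (an assumption; HAPI's own parser may behave differently): a document with no XML declaration is read as UTF-8, so the raw micro sign and the numeric reference `&#181;` parse to the same text, while `&micro;` is rejected because no DTD declares it. `ParseMicroSign` and `rootText` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class ParseMicroSign {

    /** Parses the given bytes as XML and returns the root element's text content. */
    static String rootText(byte[] xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // No XML declaration: the parser falls back to UTF-8.
        byte[] raw = "<CE.1>\u00B5/mL</CE.1>".getBytes(StandardCharsets.UTF_8);
        // Numeric character reference: pure ASCII bytes, same resulting text.
        byte[] ref = "<CE.1>&#181;/mL</CE.1>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(rootText(raw).equals(rootText(ref))); // true

        // &micro; is not one of XML's five predefined entities, so without a DTD it is not well-formed.
        byte[] named = "<CE.1>&micro;/mL</CE.1>".getBytes(StandardCharsets.US_ASCII);
        try {
            rootText(named);
        } catch (SAXException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The JDK parser typically also prints a `[Fatal Error]` line to standard error for the rejected document; that comes from its default error handler.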






Re: utf-8 characters in XML encoded HL7 messages

by jimkski


Hi Nico-

Thanks for the response. You're suggesting that I make sure to include the
encoding attribute in the XML prolog and set it to UTF-8, right?

<?xml version="1.0" encoding="UTF-8"?>

Unfortunately, the vendor we're working with won't accept an XML file unless
it can be opened in IE (which boggles my mind, since the system is written
in Java and deployed on WebLogic).

If you create an XML file, give it a proper prolog, and then embed the µ
character in it (the Java character \u00B5), you'll find that IE won't open
it no matter what.
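One likely explanation, sketched at the byte level: in UTF-8 the micro sign is two bytes (`0xC2 0xB5`), while in the usual Western Windows default (windows-1252) and in ISO-8859-1 it is the single byte `0xB5`. A file that declares, or defaults to, UTF-8 but contains that lone byte is malformed, and a strict parser will refuse it. The sketch uses ISO-8859-1 because Java guarantees that charset is available. `MicroSignBytes` and `isValidUtf8` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class MicroSignBytes {

    /** Returns true if the bytes are valid UTF-8, using a decoder that reports errors instead of substituting. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String units = "\u00B5/mL";
        byte[] utf8 = units.getBytes(StandardCharsets.UTF_8);        // C2 B5 2F 6D 4C
        byte[] latin1 = units.getBytes(StandardCharsets.ISO_8859_1); // B5 2F 6D 4C
        System.out.println(isValidUtf8(utf8));   // true
        System.out.println(isValidUtf8(latin1)); // false: a lone 0xB5 is not valid UTF-8
    }
}
```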

At this point, it really boils down to an issue with the system we're
messaging to and not HAPI.


On another topic, how is progress on HAPI v0.5 coming along?




>From: nicovn@...
>To: jimkski@..., hl7api-devel@...
>Subject: RE:  [HAPI-devel] utf-8 characters in XML encoded HL7 messages
>Date: Fri, 14 Jul 2006 12:57:18 +0200

Re: utf-8 characters in XML encoded HL7 messages

by jimkski


Hi Nico-

I've got it sorted out now. The Java documentation says the default encoding
is determined at runtime from the underlying operating system's settings.
I'm running on Windows, so mine wasn't UTF-8. Explicitly encoding the message
string as UTF-8 before writing it to the file was the fix.
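Here is a minimal sketch of that fix. It assumes the message has already been encoded to an XML string; the HAPI call that produces the string is omitted. It uses NIO APIs from Java 7 and later. On older JDKs, `new OutputStreamWriter(new FileOutputStream(file), "UTF-8")` does the same job. `Utf8MessageWriter` and `writeUtf8` are hypothetical names.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf8MessageWriter {

    /** Writes an already-encoded XML message string to a file as UTF-8, whatever the JVM default charset is. */
    static void writeUtf8(Path file, String xmlMessage) throws IOException {
        try (Writer out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            out.write(xmlMessage);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Default charset: " + Charset.defaultCharset());
        Path file = Files.createTempFile("oru_r01", ".xml");
        writeUtf8(file, "<OBX.6><CE.1>\u00B5/mL</CE.1></OBX.6>");
        byte[] bytes = Files.readAllBytes(file);
        // The micro sign starts at byte 13 and occupies two bytes in UTF-8.
        System.out.printf("%02X %02X%n", bytes[13], bytes[14]); // prints C2 B5
        Files.delete(file);
    }
}
```

The key design choice is never to rely on `FileWriter` or `String.getBytes()` without an explicit charset, because both use the JVM default. Since Java 18 that default is UTF-8 on all platforms, but older JVMs on Western-locale Windows typically use windows-1252.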

Thanks for the help.

Jim

>From: nicovn@...
>To: jimkski@..., hl7api-devel@...
>Subject: RE:  Re: [HAPI-devel] utf-8 characters in XML encoded HL7 messages
>Date: Tue, 18 Jul 2006 12:46:59 +0200
>
>Hi Jim,
>
>It seems to me that something is wrong with your xml message/file.
>
>The vendor has a point in only accepting an XML file that can be opened
>in Internet Explorer. If the XML message is properly encoded, it should
>open without a problem in IE.
>
>You don't have to include the "utf-8" encoding declaration in the document
>header, since UTF-8 is the default encoding.
>
>I suspect the problem is that your file isn't UTF-8 encoded.
>
>If you're running Windows 2000/XP, you can easily verify this by opening the
>file in Notepad, choosing the "Save As" menu option, and selecting "UTF-8" in
>the encoding combo box. After saving, you should be able to open the XML
>file in Internet Explorer.
>
>Please find attached an example file containing the character that you're
>struggling with.
>
>If you can't find the reason why your message can't be opened in IE, please
>send me the file.
>
>Best Regards
>
>Nico
><< adt_a01.zip >>





Re: utf-8 characters in XML encoded HL7 messages

by bnagakishore


Hi All,
  I am not able to add encoding="UTF-8" to the XML declaration <?xml version="1.0"?> using HAPI. Also, when Japanese characters are included in the HL7 XML message, they are displayed as '????'.
  Could anyone who has overcome this problem share the solution?

Thanks & Regards,
NagaKishore.
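Below is a sketch of one way to get both the declaration and intact Japanese text, assuming you can obtain the message as a DOM `Document`. Some HAPI versions expose this through `XMLParser.encodeDocument(Message)`; check the Javadoc for yours. The JDK's JAXP transformer writes `encoding="UTF-8"` into the declaration and encodes the bytes to match. The `?` substitution typically happens when a string is converted to bytes, or printed, using a charset that cannot represent the characters, so the sketch decodes explicitly as UTF-8. `Utf8DeclarationDemo` and `toUtf8Bytes` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class Utf8DeclarationDemo {

    /** Serializes a DOM to UTF-8 bytes with an explicit encoding="UTF-8" declaration. */
    static byte[] toUtf8Bytes(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        // This value is written into the XML declaration and also used to encode the output bytes.
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        // Writing to an OutputStream (not a Writer) lets the transformer control the byte encoding.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element ce1 = doc.createElement("CE.1");
        // Japanese text written as Unicode escapes so the source file's encoding does not matter.
        ce1.setTextContent("\u65E5\u672C\u8A9E");
        doc.appendChild(ce1);
        byte[] bytes = toUtf8Bytes(doc);
        // Decode explicitly as UTF-8; new String(bytes) would use the platform default charset.
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

If the console itself cannot display Japanese, the printed text may still show `?` even though the bytes are correct, so checking the written file in an editor set to UTF-8 is a more reliable test.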


Re: utf-8 characters in XML encoded HL7 messages

by bnagakishore


Hi All,
  I am not able to add encoding="UTF-8" to the prolog <?xml version="1.0"?> using HAPI. Also, when Japanese characters are included in the HL7 XML message, they are displayed as '????'.
  Could anyone who has overcome this problem share the solution?

Regards,
Naga Kishore.

