Preserving the doctype and entity references

I thought I'd have a go at this as it comes up quite often, the goal being that people can use it from the command line without needing to write any Java.

I've written an XMLReader replacement that generates PIs for the doctype and entities. It works as expected from Java, but not from the command line using the -x switch.

The test XML is:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>lexical example</title>
      </head>
      <body>
        <p>hello world</p>
      </body>
    </html>

The transform is:

    <xsl:template match="/">
      PIs <xsl:value-of select="count(//processing-instruction())"/>
    </xsl:template>

The class itself is below. The output when run from Java is "PIs 3", so it's showing some PIs in the output. The code running the transform is:

    XMLReader customXMLReader = new CustomXMLReader();
    SAXTransformerFactory stf = (SAXTransformerFactory)TransformerFactory.newInstance();
    TransformerHandler handler = stf.newTransformerHandler(new StreamSource("C:\\users\\andrew\\documents\\test.xsl"));
    xmlReader.setProperty("http://xml.org/sax/properties/lexical-handler", new CustomLexicalHandler(handler));
    handler.setResult(new StreamResult(System.out));
    customXMLReader.setContentHandler(handler);
    customXMLReader.parse("C:\\users\\andrew\\documents\\test.xml");

From the command line I'm using:

    java -cp blah/CustomXMLReader.jar;blah\saxon9.jar net.sf.saxon.Transform -x:com.andrewjwelch.CustomXMLReader test.xml test.xsl

and getting the output "PIs 0"...

The class is below. If I litter it with System.out's I can see that it is used, that parse(String systemId) is called, but none of the lexical methods. Any ideas?
    package com.andrewjwelch;

    import java.io.IOException;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;
    import org.xml.sax.XMLReader;
    import org.xml.sax.ext.LexicalHandler;
    import org.xml.sax.helpers.XMLFilterImpl;
    import org.xml.sax.helpers.XMLReaderFactory;

    public class CustomXMLReader extends XMLFilterImpl implements LexicalHandler {

        private boolean isProcessingDTD;
        private XMLReader xmlReader;

        public CustomXMLReader() throws Exception {
            super();
            xmlReader = XMLReaderFactory.createXMLReader();
            xmlReader.setProperty("http://xml.org/sax/properties/lexical-handler", this);
            super.setParent(xmlReader);
        }

        @Override
        public void parse(InputSource input) throws SAXException, IOException {
            super.parse(input);
        }

        @Override
        public void parse(String systemId) throws SAXException, IOException {
            super.parse(systemId);
        }

        public void startDTD(String name, String publicId, String systemId) throws SAXException {
            super.processingInstruction("doctype-public", publicId);
            super.processingInstruction("doctype-system", systemId);
            isProcessingDTD = true;
        }

        public void endDTD() throws SAXException {
            isProcessingDTD = false;
        }

        public void startEntity(String name) throws SAXException {
            if (!isProcessingDTD) {
                super.processingInstruction("entity", name);
            }
        }

        public void endEntity(String name) throws SAXException { }

        public void startCDATA() throws SAXException { }

        public void endCDATA() throws SAXException { }

        public void comment(char[] ch, int start, int length) throws SAXException { }

    }

--
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
saxon-help mailing list archived at http://saxon.markmail.org/
saxon-help@...
https://lists.sourceforge.net/lists/listinfo/saxon-help
Re: Preserving the doctype and entity references

Saxon nominates itself as the lexical handler by calling

    parser.setProperty("...lexical-handler", ce)

(see Sender line 378).

"parser" here is your CustomXMLReader, which doesn't implement setProperty(), so the base class does parent.setProperty(), causing the lexical events to be sent straight from Xerces to Saxon's ReceivingContentHandler (which ignores most of them) rather than to your filter.

You simply need to implement setProperty() to intercept this call.

If you want to do things properly you should pass all the lexical events on to Saxon after dealing with them yourself. Saxon needs to know about comments, and it needs to know about the start and end of the DTD so that it can ignore comments and PIs occurring therein. It also likes to be told about unparsed entities.

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: saxon-help-bounces@... [mailto:saxon-help-bounces@...] On Behalf Of Andrew Welch
> Sent: 18 July 2008 12:50
> To: Mailing list for the SAXON XSLT and XQuery processor
> Subject: [saxon] Preserving the doctype and entity references
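For readers following along, the setProperty() interception described in this reply might be sketched roughly as below. This is not the class from the thread: LexicalForwardingFilter and LexicalFilterDemo are hypothetical names, the consumer is a plain DefaultHandler2 standing in for Saxon, and emitting the doctype PIs before forwarding startDTD is an assumption based on the remark that PIs inside the DTD are ignored.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;
import org.xml.sax.ext.LexicalHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: intercepts the lexical-handler property and forwards
// every lexical event to whichever handler the consumer registered.
class LexicalForwardingFilter extends XMLFilterImpl implements LexicalHandler {
    static final String LEXICAL = "http://xml.org/sax/properties/lexical-handler";
    private LexicalHandler downstream = new DefaultHandler2(); // no-op until a consumer registers
    private boolean inDtd;

    LexicalForwardingFilter(XMLReader parent) throws SAXException {
        super(parent);
        parent.setProperty(LEXICAL, this); // the underlying parser reports to this filter
    }

    @Override
    public void setProperty(String name, Object value)
            throws SAXNotRecognizedException, SAXNotSupportedException {
        if (LEXICAL.equals(name)) {
            // Keep the consumer's handler here instead of passing it to the parent parser.
            downstream = (value == null) ? new DefaultHandler2() : (LexicalHandler) value;
        } else {
            super.setProperty(name, value);
        }
    }

    @Override public void startDTD(String name, String publicId, String systemId) throws SAXException {
        // Assumption: emit the PIs before forwarding startDTD so they are not treated as DTD content.
        if (publicId != null) processingInstruction("doctype-public", publicId);
        if (systemId != null) processingInstruction("doctype-system", systemId);
        inDtd = true;
        downstream.startDTD(name, publicId, systemId);
    }
    @Override public void endDTD() throws SAXException { inDtd = false; downstream.endDTD(); }
    @Override public void startEntity(String name) throws SAXException {
        if (!inDtd) processingInstruction("entity", name);
        downstream.startEntity(name);
    }
    @Override public void endEntity(String name) throws SAXException { downstream.endEntity(name); }
    @Override public void startCDATA() throws SAXException { downstream.startCDATA(); }
    @Override public void endCDATA() throws SAXException { downstream.endCDATA(); }
    @Override public void comment(char[] ch, int start, int length) throws SAXException {
        downstream.comment(ch, start, length);
    }
}

class LexicalFilterDemo {
    // Parses the given XML through the filter and records what the consumer receives.
    static List<String> run(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler2 consumer = new DefaultHandler2() {
            @Override public void processingInstruction(String target, String data) {
                log.add("pi " + target + " " + data);
            }
            @Override public void startDTD(String n, String p, String s) { log.add("startDTD " + n); }
            @Override public void endDTD() { log.add("endDTD"); }
            @Override public void startEntity(String name) { log.add("startEntity " + name); }
        };
        try {
            XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            LexicalForwardingFilter filter = new LexicalForwardingFilter(parser);
            filter.setContentHandler(consumer);
            filter.setProperty(LexicalForwardingFilter.LEXICAL, consumer); // what the consumer would do
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        run("<!DOCTYPE node [<!ENTITY mdash \"-\">]><node>hello&mdash;world</node>")
            .forEach(System.out::println);
    }
}
```

The demo uses an internal DTD subset so nothing is fetched over the network; with the XHTML doctype from the original post the parser would also try to resolve the external DTD.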
Re: Preserving the doctype and entity references

By the way, it occurred to me that it would be nice to send the DTD information to Saxon in the same format that the saxon:doctype extension uses for output.

http://www.saxonica.com/documentation/extensions/instructions/doctype.html

If you do that, then the application could easily copy the internal DTD to the output if it chose to, or it could do so selectively, for example copying only the entity declarations.

There must be some way of bringing saxon:entity-ref into the picture as well.

Michael Kay
http://www.saxonica.com/
Re: Preserving the doctype and entity references

2008/7/18 Michael Kay <mike@...>:
> Saxon nominates itself as the lexical handler by calling
> parser.setProperty("...lexical-handler", ce)
>
> (see Sender line 378).
>
> "parser" here is your CustomXMLReader, which doesn't implement setProperty(),
> so the base class does parent.setProperty(), causing the lexical events to
> be sent straight from Xerces to Saxon's ReceivingContentHandler (which
> ignores most of them) rather than to your filter.
>
> You simply need to implement setProperty() to intercept this call.

Ahh great, thanks.

> If you want to do things properly you should pass all the lexical events on
> to Saxon after dealing with them yourself. Saxon needs to know about
> comments, and it needs to know about the start and end of the DTD so that it
> can ignore comments and PIs occurring therein. It also likes to be told
> about unparsed entities.

Ok...

--
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/
Re: Preserving the doctype and entity references

2008/7/18 Michael Kay <mike@...>:
> By the way, it occurred to me that it would be nice to send the DTD
> information to Saxon in the same format that the saxon:doctype extension
> uses for output.
>
> http://www.saxonica.com/documentation/extensions/instructions/doctype.html
>
> If you do that, then the application could easily copy the internal DTD to
> the output if it chose to, or it could do so selectively, for example
> copying only the entity declarations.
>
> There must be some way of bringing saxon:entity-ref into the picture as
> well.

Ok, sounds like it could be potentially useful. I was going to maybe convert CDATA sections to markup, wrapped in <x:cdata> or something...

I'll take a look (workload permitting).

--
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/
Re: Preserving the doctype and entity references

>> Saxon nominates itself as the lexical handler by calling
>> parser.setProperty("...lexical-handler", ce)
>>
>> (see Sender line 378).
>>
>> "parser" here is your CustomXMLReader, which doesn't implement setProperty(),
>> so the base class does parent.setProperty(), causing the lexical events to
>> be sent straight from Xerces to Saxon's ReceivingContentHandler (which
>> ignores most of them) rather than to your filter.
>>
>> You simply need to implement setProperty() to intercept this call.
>
> Ahh great, thanks.
>
>> If you want to do things properly you should pass all the lexical events on
>> to Saxon after dealing with them yourself. Saxon needs to know about
>> comments, and it needs to know about the start and end of the DTD so that it
>> can ignore comments and PIs occurring therein. It also likes to be told
>> about unparsed entities.
>
> Ok...

I'm working on this now but the entity events aren't making sense at the moment... (I've never really used DTDs.)

Given:

    <node>hello&mdash;world</node>

I get the events:

    characters "hello"
    startEntity "mdash"
    endEntity "mdash"
    characters "-world"

I guess the question is, what output is likely to be most useful:

- A PI such as <?entity mdash?> without the entity expanded
- An element <entity name="mdash">-</entity> with the entity expansion as the contents
- something else...?

And then given the answer to that:

- What's the point of having two events, startEntity and endEntity, if they fire one after the other?
- How can I prevent entity expansion, or better still separate the characters() call for the expansion from other characters() calls (in the example, get the dash on its own without 'world' being included)?

thanks

--
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/
Re: Preserving the doctype and entity references

2008/7/23 Andrew Welch <andrew.j.welch@...>:
> Given:
>
> <node>hello&mdash;world</node>
>
> I get the events:
>
> characters "hello"
> startEntity "mdash"
> endEntity "mdash"
> characters "-world"
>
> I guess the question is, what output is likely to be most useful:
>
> - A pi such as <?entity mdash?> without the entity expanded
> - An element <entity name="mdash">-</entity> with the entity expansion
> as the contents
> - something else...?

Safer would be an element, since an entity expansion can contain anything (including markup).

Perhaps with a really odd name or namespaced?

    <xxx:entity>Entity expansion text</xxx:entity>

Mike, would the saxon namespace be appropriate?

> And then given the answer to that:
>
> - What's the point of having two events startEntity and endEntity if
> they fire one after the other?

Like text, you may not get all the content of the entity expansion in one hit. Stack it up until you get the end event. Then process it.

> - How can I prevent entity expansion, or better still separate the
> characters() from the expansion from other characters() calls (in the
> example get the dash on its own without 'world' being included)

Stack. Bottom contains the element/entity you're dealing with last.

HTH

--
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk
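The "stack it up until you get the end event" advice could be sketched along these lines. EntityCollector is a hypothetical class, not code from the thread, and it assumes the parser delivers the expansion text between startEntity and endEntity, which is the very point still under discussion here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical handler: text seen while an entity is open is buffered on a
// stack and reported as that entity's expansion when endEntity arrives.
class EntityCollector extends DefaultHandler2 {
    private final Deque<StringBuilder> open = new ArrayDeque<>(); // one buffer per open entity
    private final StringBuilder text = new StringBuilder();       // ordinary text outside entities
    final List<String> out = new ArrayList<>();

    // DTD-level entities (parameter entities, the external subset) are skipped.
    private static boolean isDtdEntity(String name) {
        return name.startsWith("%") || name.equals("[dtd]");
    }

    @Override public void startEntity(String name) {
        if (isDtdEntity(name)) return;
        flushText();
        open.push(new StringBuilder());
    }

    @Override public void characters(char[] ch, int start, int length) {
        // Append to the innermost open entity, or to ordinary text if none is open.
        StringBuilder target = open.isEmpty() ? text : open.peek();
        target.append(ch, start, length);
    }

    @Override public void endEntity(String name) {
        if (isDtdEntity(name)) return;
        String expansion = open.pop().toString();
        out.add("entity " + name + "=" + expansion);
        // A nested entity's text is also part of the enclosing entity's expansion.
        if (!open.isEmpty()) open.peek().append(expansion);
    }

    @Override public void startElement(String uri, String localName, String qName, Attributes atts) {
        flushText();
    }

    @Override public void endElement(String uri, String localName, String qName) {
        flushText();
    }

    // Emits any pending ordinary text as a single record.
    void flushText() {
        if (text.length() > 0) {
            out.add("text " + text);
            text.setLength(0);
        }
    }

    public static void main(String[] args) {
        EntityCollector c = new EntityCollector();
        c.characters("hello".toCharArray(), 0, 5);
        c.startEntity("mdash");
        c.characters("-".toCharArray(), 0, 1);
        c.endEntity("mdash");
        c.characters("world".toCharArray(), 0, 5);
        c.flushText();
        c.out.forEach(System.out::println);
    }
}
```

The main method feeds the events by hand in the order one would hope for; with a real parser the handler would be registered both as the content handler and as the lexical-handler property.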
Re: Preserving the doctype and entity references

> Safer would be an element, since an entity expansion can contain
> anything (including markup)

Yes, I think having an element with the expanded entity as its contents is quite nice...

> Perhaps with a really odd name or namespaced?
> <xxx:entity>Entity expansion text</xxx:entity>

It's namespaced at the moment - as is the marked-up CDATA section.

> Mike, would the saxon namespace be appropriate?

I'm half tempted to make this some form of... ahem, commercial software, so I may keep it in my namespace (it's a typical big company problem). It would still be freely available to everyone, just require a license for commercial use.

>> - How can I prevent entity expansion, or better still separate the
>> characters() from the expansion from other characters() calls (in the
>> example get the dash on its own without 'world' being included)
>
> stack. bottom contains the element/entity you're dealing with last.

Ok, given:

    hello&mdash;world

the events are:

    characters "hello"
    startEntity
    endEntity
    characters "-world"

so as you can see the endEntity event fires before the event with the expanded contents of that entity. Even then, the expanded contents are coming through in the same characters event as the text "world". There doesn't appear to be a way of determining what is the expanded entity and what isn't...

--
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/
Re: Preserving the doctype and entity references

Aren't you supposed to see:

    startEntity "mdash"
    characters "-"
    endEntity "mdash"
    characters "world"

________________________________________________________________________
The Numerical Algorithms Group Ltd is a company registered in England
and Wales with company number 1249803. The registered office is:
Wilkinson House, Jordan Hill Road, Oxford OX2 8DR, United Kingdom.

This e-mail has been scanned for all viruses by Star. The service is
powered by MessageLabs.
________________________________________________________________________
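One way to check the event order on a given parser is to log the raw events, bypassing any filter. The sketch below assumes the JDK's default SAX parser and an internal entity declaration; SaxEventTrace is a hypothetical name, and the order it prints may differ between parsers and between internally and externally declared entities, so no particular output is claimed here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical tracing helper: records characters and entity boundary events
// exactly as the parser reports them.
class SaxEventTrace {
    static List<String> trace(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override public void characters(char[] ch, int start, int length) {
                log.add("characters \"" + new String(ch, start, length) + "\"");
            }
            @Override public void startEntity(String name) {
                log.add("startEntity \"" + name + "\"");
            }
            @Override public void endEntity(String name) {
                log.add("endEntity \"" + name + "\"");
            }
        };
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            // Lexical events are only delivered if this property is set.
            reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE node [<!ENTITY mdash \"-\">]><node>hello&mdash;world</node>";
        trace(xml).forEach(System.out::println);
    }
}
```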
Re: Preserving the doctype and entity references

> aren't you supposed to see
>
> startEntity "mdash"
> characters "-"
> endEntity "mdash"
> characters "world"

That was my expectation... I'll investigate a bit more now.

--
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/
Re: Preserving the doctype and entity references

> I guess the question is, what output is likely to be most useful:

It depends what use one wants... people asking for unexpanded entities are often doing some kind of "modified identity transform" and they want entities to go back as they were.

In which case

    - A pi such as <?entity mdash?>

would (perhaps) be fine (except entities in attribute values, but I'm not sure SAX reliably reports those anyway?) as that is easy to pick up in the XSL and write back as an entity ref. This is more or less equivalent to what you can do now anyway: just do a global sed or perl replace of & to [[[amp]]], do a transform, and then replace [[[amp]]] back to &.

But if you want a modified identity transform that "preserves" entities, but where predicates such as

    test="mo[.='& #2013;']"

or, harder,

    test="mo[text()='& #2013;']"

test as true whether or not the original document uses an entity, then the options are more limited.

    <mo><entity name="mdash">-</entity></mo>

would work for the first form, but not the second.

    <mo><?entityStart name="mdash"?>-<?entityEnd name="ndash"?></mo>

would work for both, and be closer to the SAX events (possibly).

Of course neither would work for

    test="mo[node()[1]='& #2013;']"

but if you do that you probably shouldn't expect this kind of filter to work...

David
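The placeholder round trip mentioned in this message (replace & before the transform, put it back afterwards) can be written in a few lines. This is only a sketch: EntityPlaceholders is a hypothetical name, the transform step itself is left out, and it assumes the marker string never occurs in the real document.

```java
// Hypothetical helper for the "[[[amp]]]" trick: hide every ampersand from the
// XML parser before the transform, then restore it in the serialized result.
class EntityPlaceholders {
    static final String MARK = "[[[amp]]]"; // assumed not to appear in the source document

    // Run on the source text before parsing, so no entity reference is ever expanded.
    static String protect(String xml) {
        return xml.replace("&", MARK);
    }

    // Run on the serialized output after the transform.
    static String restore(String xml) {
        return xml.replace(MARK, "&");
    }

    public static void main(String[] args) {
        String source = "<node>hello&mdash;world</node>";
        String hidden = protect(source);
        System.out.println(hidden);          // what the parser and stylesheet would see
        System.out.println(restore(hidden)); // what gets written back out
    }
}
```

Because every & is hidden, built-in references such as &amp; and &lt; become plain text during the transform as well, so string comparisons in the stylesheet see the placeholder rather than the character, which is the limitation this message goes on to describe.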
Re: Preserving the doctype and entity references

> people asking for unexpanded entities are often doing some kind of
> "modified identity transform" and they want entities to go back as they
> were.
>
> In which case
> - A pi such as <?entity mdash?>
> would (perhaps) be fine

Except that you also need to stop the entity being expanded...

> <mo><entity name="mdash">-</entity></mo>
>
> would work for the first form, but not the second.
>
> <mo><?entityStart name="mdash"?>-<?entityEnd name="ndash"?></mo>
>
> would work for both, and be closer to the sax events (possibly)

I've moved it on to xml-dev now as it's non Saxon specific, but I think for unparsed entities you get both events at the same time as a kind of hack to reuse them from parsed entities, instead of having an event of its own... which makes determining the expanded content difficult.

> Of course neither would work for
>
> test="mo[node()[1]='& #2013;']"
>
> but if you do that you probably shouldn't expect this kind of filter to
> work....

Good point... there's probably something that could be done, but for now there are more basic problems.

thanks

--
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/
Re: Preserving the doctype and entity references

> Given:
>
> <node>hello&mdash;world</node>
>
> I get the events:
>
> characters "hello"
> startEntity "mdash"
> endEntity "mdash"
> characters "-world"

That's not what I would have expected, but it's not something I have ever tried to do.

> I guess the question is, what output is likely to be most useful:
>
> - A pi such as <?entity mdash?> without the entity expanded
> - An element <entity name="mdash">-</entity> with the entity
> expansion as the contents
> - something else...?

An element is going to be easier to process than a pair of PIs, but on the other hand it is very likely to stop existing code working if the code isn't expecting it. Your call!

Another option (if you can get the info from the parser) is to output a PI giving the entity name only, leaving the user to find the content from the entity definition in the DTD if they need it - on the theory that they probably just want to copy the entity reference to the output rather than processing its expansion.

Michael Kay
http://www.saxonica.com/
Re: Preserving the doctype and entity references

2008/7/23 Andrew Welch <andrew.j.welch@...>:
> Ok, given:
>
> hello&mdash;world
>
> the events are:
>
> characters "hello"
> startEntity
> endEntity
> characters "-world"
>
> so as you can see the end entity event fires before the event with the
> expanded contents of that entity.

No, that seems wrong? Should be hello[xxxxx]world, where xxxxx is the entity expansion?

What is the text value of the entity, Andrew?

> Even then, the expanded contents
> are coming through in the same characters event as text "world".

Ah! Seems like the parser is expanding entities before generating events? That's wrong somehow? Configuration perhaps?

I'll have a look in SAX2 and get back to you if I find anything.

regards

--
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk