The need for multiple passes over a document

I was looking at the TOC macro and I feel what it's doing is wrong insofar as requiring a second pass to get the structure of the document.

There are definitely cases where you need to make multiple passes, and the TOC macro is clearly one of them. Having to pass in the whole source document and the parser to make the TOC macro work seems extreme to me.

I think that we should declaratively say, or determine, that the structure of the document is required by something in the page. We should then preprocess the page in a general way and not require passing in the whole document and parser again, as that's pretty cumbersome for the implementor of a parser.

I also noticed that the parsers are not threadsafe. I don't believe this was always the case, and we should make them threadsafe again if it's true they aren't. I just looked at the APT parser and it doesn't look threadsafe to me, but it wouldn't take much to make it threadsafe.

I would like to take a pass at making the document structure requirement more general to avoid things like what we're doing in the TOC macro. I would also like to take a pass at making the parsers threadsafe.

I think we should also just release 1.0 for the sake of the site plugin and then move on with the next version of Doxia. We need to remove the coupling of Doxia to the site plugin and move the core back to a simple set of parsers and sinks.

Thanks,

Jason

----------------------------------------------------------
Jason van Zyl
Founder, Apache Maven
jason at sonatype dot com
----------------------------------------------------------

A language that doesn't affect the way you think about programming is not worth knowing.

-- Alan Perlis
Re: The need for multiple passes over a document

On Apr 28, 2008, at 3:49 PM, Jason van Zyl wrote:

> I was looking at the TOC macro and I feel what it's doing is wrong insofar as requiring a second pass to get the structure of the document.
>
> There are definitely cases where you need to make multiple passes and the TOC macro is clearly one of them. Having to pass in the whole source document and the parser to make the TOC macro work seems extreme to me.
>
> I think that we should declaratively say, or determine, that the structure of the document is required by something in the page. Preprocess the page in a general way and not require passing in the whole document and parser again as that's pretty cumbersome for the implementor of a parser.

In XWiki land, I parse the document once, which generates an AST of Blocks. Then this AST is passed to macros.

Note that one reason we build an in-memory AST is to cache it in order to serve pages faster (no need to reparse static elements - only macros are reparsed).

-Vincent
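The parse-once, cache-the-AST idea described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `Block`, `TextBlock`, `MacroBlock`, `AstCache` and the `{{name}}` macro syntax are hypothetical names made up for the sketch and are not the actual XWiki rendering classes. The point it shows is that static blocks are parsed once while macro blocks are re-evaluated on every render.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch; these are not the actual XWiki rendering classes.

/** A node of the parsed document. */
interface Block {
    String render();
}

/** Static text: rendered the same way every time. */
final class TextBlock implements Block {
    private final String text;
    TextBlock(String text) { this.text = text; }
    public String render() { return text; }
}

/** A macro call: re-evaluated on every render, even when the AST is cached. */
final class MacroBlock implements Block {
    private final Supplier<String> body;
    MacroBlock(Supplier<String> body) { this.body = body; }
    public String render() { return body.get(); }
}

final class AstCache {
    private final Map<String, List<Block>> cache = new HashMap<>();
    private final Map<String, Supplier<String>> macros;
    int parseCount = 0; // exposed only so the sketch can show the cache working

    AstCache(Map<String, Supplier<String>> macros) { this.macros = macros; }

    /** Toy parser: a token like {{name}} is a macro call if "name" is registered. */
    private List<Block> parse(String source) {
        parseCount++;
        List<Block> blocks = new ArrayList<>();
        for (String token : source.split(" ")) {
            Supplier<String> macro = null;
            if (token.length() >= 4 && token.startsWith("{{") && token.endsWith("}}")) {
                macro = macros.get(token.substring(2, token.length() - 2));
            }
            blocks.add(macro != null ? new MacroBlock(macro) : new TextBlock(token));
        }
        return blocks;
    }

    /** Parses only on a cache miss; macros still run on every call. */
    String render(String source) {
        List<Block> blocks = cache.computeIfAbsent(source, this::parse);
        StringBuilder out = new StringBuilder();
        for (Block b : blocks) {
            if (out.length() > 0) out.append(' ');
            out.append(b.render());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        int[] calls = {0};
        Map<String, Supplier<String>> macros = new HashMap<>();
        macros.put("count", () -> String.valueOf(++calls[0]));
        AstCache cache = new AstCache(macros);
        System.out.println(cache.render("visits: {{count}}"));
        System.out.println(cache.render("visits: {{count}}"));
        System.out.println("parses: " + cache.parseCount);
    }
}
```

Rendering the same source twice parses it once, but the macro body runs both times.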
Re: The need for multiple passes over a document

On 28-Apr-08, at 12:18 PM, Vincent Massol wrote:

> On Apr 28, 2008, at 3:49 PM, Jason van Zyl wrote:
>
>> I was looking at the TOC macro and I feel what it's doing is wrong insofar as requiring a second pass to get the structure of the document.
>>
>> There are definitely cases where you need to make multiple passes and the TOC macro is clearly one of them. Having to pass in the whole source document and the parser to make the TOC macro work seems extreme to me.
>>
>> I think that we should declaratively say, or determine, that the structure of the document is required by something in the page. Preprocess the page in a general way and not require passing in the whole document and parser again as that's pretty cumbersome for the implementor of a parser.
>
> In XWiki land, I parse the document once, which generates an AST of Blocks. Then this AST is passed to macros.

Right, this would be the same as parsing the document into a StructureSink, but this is still not necessary most of the time. A macro could say it requires the StructureSink to operate on, or we could just always parse into the StructureSink, modify it, and then render it.

> Note that one reason we build an in memory AST is to cache it in order to serve pages faster (no need to reparse static elements - only macros are reparsed).

Sure, but I think caching is orthogonal to this. But passing in the parser and the document again to a macro is not good.

Thanks,

Jason
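One possible reading of "a macro could say it requires the structure", as a minimal sketch rather than anything Doxia actually provides: every name below (`SketchMacro`, `requiresStructure`, `Heading`, `TocMacro`, `SketchRenderer`) is hypothetical, and the `=`-prefixed heading syntax is a toy stand-in for a real parser feeding something like a StructureSink. The renderer runs one general structure pass only when the macro declares it needs one, so the macro never receives the parser or the raw source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration only; none of these are Doxia classes.

/** A heading found during the structure pass: nesting level plus title text. */
final class Heading {
    final int level;
    final String title;

    Heading(int level, String title) {
        this.level = level;
        this.title = title;
    }
}

/** A macro declares up front whether it needs the document structure. */
interface SketchMacro {
    boolean requiresStructure();

    /** The list is empty when requiresStructure() returns false. */
    String execute(List<Heading> structure);
}

/** TOC macro: works from the collected headings, never from the raw source. */
final class TocMacro implements SketchMacro {
    public boolean requiresStructure() {
        return true;
    }

    public String execute(List<Heading> structure) {
        StringBuilder out = new StringBuilder();
        for (Heading h : structure) {
            for (int i = 1; i < h.level; i++) {
                out.append("  "); // indent by nesting level
            }
            out.append("* ").append(h.title).append('\n');
        }
        return out.toString();
    }
}

final class SketchRenderer {
    /** Toy structure pass: a line starting with N '=' characters is a level-N heading. */
    static List<Heading> collectStructure(String source) {
        List<Heading> headings = new ArrayList<>();
        for (String line : source.split("\n")) {
            int level = 0;
            while (level < line.length() && line.charAt(level) == '=') {
                level++;
            }
            if (level > 0) {
                headings.add(new Heading(level, line.substring(level).trim()));
            }
        }
        return headings;
    }

    /** The structure pass runs only if the macro declared that it needs it. */
    static String runMacro(SketchMacro macro, String source) {
        List<Heading> structure = macro.requiresStructure()
                ? collectStructure(source)
                : new ArrayList<Heading>();
        return macro.execute(structure);
    }

    public static void main(String[] args) {
        String source = "= Intro\nsome text\n== Details\n= End";
        System.out.print(runMacro(new TocMacro(), source));
    }
}
```

The alternative mentioned in the message, always parsing into the structure and then rendering from it, would simply drop the `requiresStructure()` check.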
Re: The need for multiple passes over a document

2008/4/28 Jason van Zyl <jason@...>:

> I was looking at the TOC macro and I feel what it's doing is wrong insofar as requiring a second pass to get the structure of the document.

Agreed, but we did it as best we could :)

> There are definitely cases where you need to make multiple passes and the TOC macro is clearly one of them. Having to pass in the whole source document and the parser to make the TOC macro work seems extreme to me.
>
> I think that we should declaratively say, or determine, that the structure of the document is required by something in the page. Preprocess the page in a general way and not require passing in the whole document and parser again as that's pretty cumbersome for the implementor of a parser.
>
> I also noticed that the parsers are not threadsafe, I don't believe this was always the case and we should make them threadsafe again if it's true they aren't. I just looked at the APT parser and it doesn't look threadsafe to me but wouldn't take much to make it threadsafe.

DefaultDoxia has a comment about thread safety...

> I would like to take a pass at making the document structure requirement more general to avoid things like we're doing in the TOC macro. I would also like to take a pass at making the parsers threadsafe.
>
> I think we should also just release 1.0 for the sake of the site plugin and then move on with the next version of Doxia. We need to remove the coupling of doxia to the site plugin and move the core back to a simple set of parsers and sinks.

Sounds like a Doxia 2.0 :) I think Doxia has several limitations, especially for style. DOXIA-204 solved several of them, but I think we could do more.

Cheers,

Vincent
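On the thread-safety point, one common way to make a parser safe to share is to keep all per-document state out of instance fields. The sketch below is a toy under that assumption: `SafeLineParser` and `ParseState` are hypothetical names, and the code does not mirror the real APT parser or whatever the DefaultDoxia comment says. It only shows the general pattern of confining parse state to each call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical toy parser; it does not mirror the real APT parser's code.
final class SafeLineParser {
    // No mutable instance fields: a shared instance carries no per-document state.

    /** Per-parse state lives in its own object, created fresh for each call. */
    private static final class ParseState {
        int lineNumber = 0;
        final List<String> events = new ArrayList<>();
    }

    /** Emits "lineNumber:text" for every non-blank line. */
    List<String> parse(String source) {
        ParseState state = new ParseState(); // confined to this call's thread
        for (String line : source.split("\n")) {
            state.lineNumber++;
            if (!line.trim().isEmpty()) {
                state.events.add(state.lineNumber + ":" + line.trim());
            }
        }
        return state.events;
    }

    public static void main(String[] args) throws Exception {
        SafeLineParser shared = new SafeLineParser();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<List<String>>> results = new ArrayList<>();
            for (int i = 0; i < 8; i++) {
                final String doc = "title " + i + "\n\nbody " + i;
                Callable<List<String>> task = () -> shared.parse(doc);
                results.add(pool.submit(task));
            }
            for (Future<List<String>> f : results) {
                System.out.println(f.get());
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

A non-threadsafe variant would hold `lineNumber` and `events` as fields of the parser itself, so two concurrent `parse` calls on one instance would overwrite each other's state.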
Re: The need for multiple passes over a document

On Apr 29, 2008, at 1:56 AM, Vincent Siveton wrote:

> Sounds like a Doxia 2.0 :) I think Doxia has several limitations, especially for style. DOXIA-204 solved several of them but I think we could do more.

BTW if you're interested in following what I'm doing in XWiki land, it's available here:
http://svn.xwiki.org/svnroot/sandbox/components/xwiki-rendering/

(The architecture/spec is here: http://dev.xwiki.org/xwiki/bin/view/Design/NewRenderingArchitecture)

Basically I have the following main objects:

* Listener (Sink in Doxia speak)
* Parser
* Macro
* Transformation
* Document AST

The process is:

1) Text is transformed into an AST by the Parser.
2) The Transformation manager finds the list of transformation components to execute on the AST.
3) One such transformation is called MacroTransformation and is in charge of looking for all MacroBlock blocks in the AST and executing them until there are no more MacroBlocks (this allows nested macros). Thus a Macro takes an AST as a parameter and generates a list of Blocks.
4) The modified AST is then traversed (traverse()) with a Listener.

Note1: XWiki can use both Doxia and WikiModel transparently since it has a bridge to both. Right now the bridge I have is a Parser bridge where Doxia or WikiModel parsers generate an XWiki Document AST. In this manner I'm reusing Doxia's and WikiModel's parsers. My next step is to have a Sink bridge so that I can use Doxia sinks.

Note2: The events I have are finer grained since I have events at the word level:

    void onWord(String word);
    void onSpace();
    void onSpecialSymbol(SpecialSymbol symbol);

Note3: Since I want to support generation of HTML elements from macros, I have an HTMLBlock element and the following Listener events:

    void beginXMLElement(String name, Map<String, String> attributes);
    void endXMLElement(String name, Map<String, String> attributes);

-Vincent
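Steps 3 and 4 of the process above can be sketched as follows. This is a simplified illustration, not the actual XWiki code: the class shapes, the macro registry keyed by name, and `TextListener` are assumptions made for the sketch; only the names `Listener`, `Macro`, `MacroBlock`, `MacroTransformation`, `traverse()`, `onWord` and `onSpace` come from the message. The loop keeps expanding until no MacroBlock remains, which is what lets one macro emit another.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch of the described pipeline; not the actual XWiki code.

interface Listener { // "Sink" in Doxia speak
    void onWord(String word);
    void onSpace();
}

interface Block {
    void traverse(Listener listener);
}

final class WordBlock implements Block {
    private final String word;
    WordBlock(String word) { this.word = word; }
    public void traverse(Listener listener) { listener.onWord(word); }
}

final class SpaceBlock implements Block {
    public void traverse(Listener listener) { listener.onSpace(); }
}

/** Placeholder for a macro call; it must be expanded before traversal. */
final class MacroBlock implements Block {
    final String name;
    MacroBlock(String name) { this.name = name; }
    public void traverse(Listener listener) {
        throw new IllegalStateException("unexpanded macro: " + name);
    }
}

/** A macro takes the AST as a parameter and generates a list of Blocks. */
interface Macro {
    List<Block> execute(List<Block> ast);
}

final class MacroTransformation {
    private final Map<String, Macro> macros; // assumed registry, keyed by macro name

    MacroTransformation(Map<String, Macro> macros) { this.macros = macros; }

    /** Expands MacroBlocks pass by pass until none remain (no cycle detection here). */
    List<Block> transform(List<Block> ast) {
        List<Block> current = ast;
        boolean expanded = true;
        while (expanded) {
            expanded = false;
            List<Block> next = new ArrayList<>();
            for (Block b : current) {
                if (b instanceof MacroBlock) {
                    String name = ((MacroBlock) b).name;
                    Macro macro = macros.get(name);
                    if (macro == null) {
                        throw new IllegalArgumentException("unknown macro: " + name);
                    }
                    next.addAll(macro.execute(current));
                    expanded = true;
                } else {
                    next.add(b);
                }
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Macro> macros = new HashMap<>();
        // "greeting" emits another macro call, exercising the nested case.
        macros.put("greeting", ast -> Arrays.<Block>asList(
                new WordBlock("hello"), new SpaceBlock(), new MacroBlock("who")));
        macros.put("who", ast -> Arrays.<Block>asList(new WordBlock("world")));

        List<Block> doc = Arrays.<Block>asList(new MacroBlock("greeting"), new WordBlock("!"));
        TextListener listener = new TextListener();
        for (Block b : new MacroTransformation(macros).transform(doc)) {
            b.traverse(listener);
        }
        System.out.println(listener.out);
    }
}

/** Listener that renders events back to plain text. */
final class TextListener implements Listener {
    final StringBuilder out = new StringBuilder();
    public void onWord(String word) { out.append(word); }
    public void onSpace() { out.append(' '); }
}
```

A macro that expands to itself would loop forever in this sketch; a real implementation would presumably need a depth limit or cycle check.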