[ANN] KawaDD Progress Report: Major milestone achieved, both lexer and parser generation done via templates

by Jonathan Revusky-3

(This was earlier posted to kawadd-devel, but I meant to post it here as
well.)

A major milestone has been achieved in KawaDD. In the KawaDD codebase,
the output of both the parser and the lexer is now done via FreeMarker
templates.

This has allowed me to take stock of the situation. Summary:

(1) It was MUCH more work than I anticipated to get to this point. I
have to admit that if I had known in advance how much work it was, I
might not have done it. This was mostly because of how hopelessly
entangled the lexer generation code was. But now that it is done, I
think that some of the gains I anticipated are quite tangible. Once the
code that actually writes Java statements to the output is removed
from the codebase and consolidated in external templates, one can much
more readily see what the algorithmic side of the code is actually
doing. Just as an example, the NfaState.java that is 3000 lines of code
in the JavaCC codebase is down to 1000 lines in KawaDD.

(2) One thing that I had no real idea of before actually doing this was
how much of a performance hit would be involved in using the templates.
Of course, within a reasonable margin, it doesn't matter very much. What
one cares about is the performance of the generated code, and the
generated code is basically the same as before. Anyway, your mileage
will vary, but I think it is about right to say that KawaDD currently
takes 2-3 times as long as JavaCC to process a grammar. As a practical
matter, I think the tradeoff
is well worth it. A 200% slowdown may sound bad, but the fact is that,
on recent hardware, most any grammar you throw at JavaCC is processed in
a second or less. Thus, even with a 100-200% slowdown, KawaDD will only
rarely take more than 2 or 3 seconds. In most projects, the parser
generation part is mostly run as one step in a full clean+build that
takes much longer than that. For example, a full clean build of
FreeMarker takes maybe 10 seconds using JavaCC and 12 seconds using
KawaDD. Between flexibility and the speed of the generator, the tradeoff
pretty clearly favors flexibility here, since a system where the output
is based on templates can be customized fairly easily, while JavaCC,
which uses ostr.println() statements embedded directly in the code,
cannot be customized like this.
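
To make that contrast concrete, here is a minimal sketch of the two styles. It is not KawaDD's actual code and it does not use the FreeMarker API: the class name TemplateSketch, both method names, and the toy ${...} substitution are assumptions made up for illustration, and a real template engine does far more than string replacement. The point is only that in the second style the shape of the generated code lives in a piece of text that can be edited without touching the generator logic.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: contrasts embedded output statements with a template.
class TemplateSketch {

    // Style 1: output statements are interleaved with the generator logic.
    // print("...\n") is used instead of println() so the result does not
    // depend on the platform line separator.
    static String emitWithPrintln(String className, int tokenCount) {
        StringWriter buffer = new StringWriter();
        PrintWriter ostr = new PrintWriter(buffer);
        ostr.print("public class " + className + " {\n");
        ostr.print("    static final int TOKEN_COUNT = " + tokenCount + ";\n");
        ostr.print("}\n");
        ostr.flush();
        return buffer.toString();
    }

    // Style 2: the output text lives in a template; the generator only
    // supplies a data model. In a real setup this would be an external file.
    static final String TEMPLATE =
        "public class ${className} {\n"
        + "    static final int TOKEN_COUNT = ${tokenCount};\n"
        + "}\n";

    // Toy substitution of ${key} placeholders (a stand-in for a template engine).
    static String emitWithTemplate(String template, Map<String, String> model) {
        String result = template;
        for (Map.Entry<String, String> entry : model.entrySet()) {
            result = result.replace("${" + entry.getKey() + "}", entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> model = new LinkedHashMap<>();
        model.put("className", "MyLexer");
        model.put("tokenCount", "42");
        String fromPrintln = emitWithPrintln("MyLexer", 42);
        String fromTemplate = emitWithTemplate(TEMPLATE, model);
        // Both styles produce the same generated source text.
        System.out.println(fromPrintln.equals(fromTemplate));
    }
}
```

Changing the layout of the generated class in the second style means editing TEMPLATE only; in the first style it means editing and recompiling the generator itself.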

Now, after such a huge refactoring of the code, it is more than
reasonable to wonder whether bugs were introduced. To be honest, it's
hard to be absolutely certain. For one thing, the coverage of the test
suite included with JavaCC is really quite poor, so the fact that KawaDD
passes those tests does not say much. OTOH, here are additional
functional tests that KawaDD currently passes:

(1) It can be used to build FreeMarker (both versions 2.3 and 2.4, which
differ significantly), and the resulting build passes all 60-odd unit
tests that are in the FreeMarker distro. This is a fairly significant
functional test, since FreeMarker has a quite large grammar that has
become extremely crufty after 6 years of continuous tweaks and so on.

(2) KawaDD passes the basic bootstrap test. KawaDD can be used to build
itself, and the resulting build passes the aforementioned tests: all the
tests included with JavaCC, plus building and testing FreeMarker
versions 2.3 and 2.4.

(Actually, at points where I broke things (later fixed) in refactoring
the code, the build would frequently pass the JavaCC test suite but
fail one or both of the above two functional tests: building/testing
FreeMarker or bootstrapping KawaDD itself.)

Now, I think the above inspires at least a guarded sense of confidence,
but I would greatly appreciate independent affirmation from JavaCC users
that KawaDD works as a drop-in replacement in their projects. There is
no binary distro yet, but it is easy enough to get your hands on KawaDD.
It is just:

svn co http://svn.kawadd.googlecode.com/svn/trunk/kawadd
cd kawadd
ant

KawaDD can then be launched using the scripts in the bin directory or,
longhand, with something like:

java -classpath <kawadd-root>/kawadd.jar:<kawadd-root>/lib/freemarker.jar KawaDD MyGrammar.jj

Oh, final note: KawaDD does not support either static parsers or
reusable parsers, so if your project uses either of those brain-damaged
ideas, KawaDD won't be a drop-in replacement. This will probably be
manifested by the compiler complaining that the various ReInit(...)
methods are gone. And gone they are. (Good riddance to bad rubbish...
;-)) Anyway, in that case, instead of, say,

MyParser.ReInit(...);
MyParser.rootProduction(...);

you need to rewrite this as:

MyParser myParser = new MyParser(...);
myParser.rootProduction(...);

I had to do that in some of the test/example code that comes with JavaCC
so that it would work with KawaDD.
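
For code that used ReInit(...) to run one parser over several inputs, the same rewrite applies inside the loop. The following is only a sketch under stated assumptions: the MyParser class here is a hypothetical stand-in written by hand (a real one is generated from your grammar and will have different constructor arguments and exceptions), and ParseEachInput and parseAll are made-up names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a generated parser; not real KawaDD output.
class MyParser {
    private final Reader input;

    MyParser(Reader input) {
        this.input = input;
    }

    // Stand-in for the root production: just counts the input characters.
    int rootProduction() {
        try {
            int count = 0;
            while (input.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class ParseEachInput {
    // One fresh parser instance per input, instead of ReInit(...) on a shared one.
    static List<Integer> parseAll(List<String> sources) {
        List<Integer> results = new ArrayList<>();
        for (String source : sources) {
            MyParser myParser = new MyParser(new StringReader(source));
            results.add(myParser.rootProduction());
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(parseAll(Arrays.asList("a + b", "42")));
    }
}
```

Since each instance carries its own state, nothing is left over from one input to the next.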

Best Regards,

Jonathan Revusky
--
lead developer, FreeMarker project, http://freemarker.org/
KawaDD Parser Generator, http://code.google.com/p/kawadd

