Overriding Token type in CommonTokenAction

I have a situation where I'm translating a custom lexer/yacc parser into JavaCC. I have two different tokens, both of which match the same regular expression, but the custom lexer returns a different token type based upon a run-time call to some lookup table.

As far as I know, there's no way for me to insert special code into the lexer part of a JavaCC specification (i.e. no backtracking), so I faked out this behavior by implementing a CommonTokenAction method (TOKEN_MGR_DECLS) to check the lookup table and modify the Token if it needs to be a different type.

My first question is this: how do I get JavaCC to insert another Token type into *Constants so I can use it in parser production rules/CommonTokenAction, but not actually generate any DFA rules about it? Here's what I did so far:

    TOKEN : { < FUBAR_TYPE : ~["\0"-"\377"] > }

which I believe reads "match one character that is outside the range 0x0-0xff." Should work fine, even though it's a kludge, given stock ASCII input (i.e. it should never match anything, which is what I want...I control when that Token is produced based upon my call). But I'm thinking it might go wonky with full Unicode, and it's just plain ugly anyway. I want something like:

    TOKEN: { <FUBAR_TYPE> }

so I can use that name anywhere but the *TokenManager won't ever try to explicitly match for it.

Second question: am I totally nuts? Is there a much better way to get (differing) dynamic token types out of the lexer (i.e. depend on a runtime call) that are based upon the same regular expression match string?

Thanks in advance to any helpers out there...

--
Mike J. Bell on gmail
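For readers following along, the retyping step described above can be modelled in plain Java without running JavaCC at all. This is only a sketch under stated assumptions: the `Token` class here is a stand-in for the one JavaCC generates (which does store the type in a field called `kind` and the matched text in `image`), while `Retyper`, `register`, the constant values, and the lookup table are all hypothetical names made up for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Stand-in for the Token class JavaCC generates; only the two fields used here.
class Token {
    int kind;      // token type (JavaCC's generated Token calls this "kind")
    String image;  // the matched text

    Token(int kind, String image) {
        this.kind = kind;
        this.image = image;
    }
}

class Retyper {
    // Hypothetical values; in a real grammar these come from the generated *Constants interface.
    static final int IDENT = 1;
    static final int FUBAR_TYPE = 2;

    // Hypothetical run-time lookup table of names that should become FUBAR_TYPE.
    private final Set<String> table = new HashSet<>();

    void register(String name) {
        table.add(name);
    }

    // Mirrors the shape of JavaCC's hook, void CommonTokenAction(Token t):
    // the token has already been matched, so only its kind is adjusted.
    void commonTokenAction(Token t) {
        if (t.kind == IDENT && table.contains(t.image)) {
            t.kind = FUBAR_TYPE;
        }
    }
}
```

The point of the sketch is that both token types share one regular expression; the lexer always produces IDENT and the hook decides afterwards whether to relabel it.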

Re: Overriding Token type in CommonTokenAction

Mike J. Bell wrote:
> I have a situation where I'm translating a custom lexer/yacc parser into
> JavaCC. I have two different tokens, both of which match the same
> regular expression, but the custom lexer returns a different token type
> based upon a run-time call to some lookup table.
>
> As far as I know, there's no way for me to insert special code into the
> lexer part of a JavaCC specification (i.e. no backtracking), so I faked
> out this behavior by implementing a CommonTokenAction method
> (TOKEN_MGR_DECLS) to check the lookup table and modify the Token if it
> needs to be a different type.
>
> My first question is this: how do I get JavaCC to insert another Token
> type into *Constants so I can use it in parser production
> rules/CommonTokenAction, but not actually generate any DFA rules about
> it? Here's what I did so far:
>
> TOKEN : { < FUBAR_TYPE : ~["\0"-"\377"] > }
>
> which I believe reads "match one character that is outside the range
> 0x0-0xff." Should work fine, even though it's a kludge, given stock
> ASCII input (i.e. it should never match anything, which is what I
> want...I control when that Token is produced based upon my call). But
> I'm thinking it might go wonky with full Unicode, and it's just plain
> ugly anyway. I want something like:
>
> TOKEN: { <FUBAR_TYPE> }
>
> so I can use that name anywhere but the *TokenManager won't ever try to
> explicitly match for it.

It seems to me that you can get what you want by simply creating a phony lexical state that is never used.

    <PHONY> TOKEN : { < FUBAR_TYPE : "asdf" > }

If you are never actually in the "PHONY" lexical state, then it will never attempt to explicitly match for FUBAR_TYPE. However, FUBAR_TYPE will occur in your XXXConstants.java, and in your CommonTokenAction() method you can, in the appropriate place, do:

    tok.kind = FUBAR_TYPE;

I guess the only drawback is that some code is generated for DFA matching of FUBAR_TYPE, which is effectively dead code that is never called. I don't see that as a very big problem though.

> Second question: am I totally nuts? Is there a much better way to get
> (differing) dynamic token types out of the lexer (i.e. depend on a
> runtime call) that are based upon the same regular expression match string?

Well, I guess the standard way to have the same string (or set of strings) be different tokens at different times is by use of lexical states.

Jonathan Revusky
--
lead developer, FreeMarker project, http://freemarker.org/
KawaDD Parser Generator, http://code.google.com/p/kawadd

> Thanks in advance to any helpers out there...
>
> --
> Mike J. Bell on gmail

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@...
For additional commands, e-mail: users-help@...
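Putting the phony-state suggestion together with the CommonTokenAction hook, a grammar might look roughly like the fragment below. Treat it as an untested sketch rather than a drop-in file: the parser name `Demo`, the `IDENT` token, the `SPECIAL` table and its contents, and the `Start` production are all made up for illustration. It assumes `STATIC = false` so that `CommonTokenAction` can be an instance method, and it relies on the generated Token class storing the token type in its `kind` field.

```javacc
options {
  STATIC = false;
  COMMON_TOKEN_ACTION = true;   // token manager calls CommonTokenAction for every token
}

PARSER_BEGIN(Demo)
public class Demo {}
PARSER_END(Demo)

TOKEN_MGR_DECLS : {
  // Hypothetical run-time lookup table.
  java.util.Set<String> SPECIAL =
      new java.util.HashSet<String>(java.util.Arrays.asList("widget"));

  void CommonTokenAction(Token t) {
    // Relabel an already-matched IDENT when the table says so.
    if (t.kind == IDENT && SPECIAL.contains(t.image)) {
      t.kind = FUBAR_TYPE;
    }
  }
}

SKIP : { " " | "\t" | "\n" | "\r" }

TOKEN : { < IDENT : (["a"-"z","A"-"Z","_"])+ > }

// Never entered: nothing switches to PHONY, so "asdf" is never matched as FUBAR_TYPE,
// but the name still appears in DemoConstants and can be used in productions.
<PHONY> TOKEN : { < FUBAR_TYPE : "asdf" > }

void Start() : {}
{
  ( <IDENT> | <FUBAR_TYPE> )* <EOF>
}
```

The literal in the PHONY state is arbitrary; its only job is to give JavaCC a well-formed definition to attach the name to.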

Re: Re: Overriding Token type in CommonTokenAction

On Tue, Aug 12, 2008 at 8:14 AM, Jonathan Revusky <revusky@...> wrote:
> Mike J. Bell wrote:
>> Second question: am I totally nuts? Is there a much better way to get
>> (differing) dynamic token types out of the lexer (i.e. depend on a runtime
>> call) that are based upon the same regular expression match string?
>
> Well, I guess the standard way to have the same string (or set of strings)
> be different tokens at different times is by use of lexical states.

Which can be switched to from Java code in the productions, so you can have a lookahead piece of code before the token in question to switch as needed, based on what's coming up and whether it's in the table or not.

> Jonathan Revusky

The other alternative is to have a single token that matches that pattern of chars, and give it different semantic meaning based on its context rather than having an actual different token. Then in a following AST pass you can alter things in the tree as needed to reflect the correct meaning.

I made a parser that had alternative constants (these were tree node types rather than token types, but the same idea applies), so I ended up having my base class, which already implemented the interface holding the generated constants, also implement a second interface with my additions (I used negative values to avoid ever clashing).

--
- J.Chris Findlay (c:
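The negative-constants arrangement described above might be sketched like this in plain Java. All names and values here are hypothetical: `GeneratedConstants` stands in for whatever interface the generator emits (JavaCC numbers its constants from 0 upward), and `ExtraConstants` holds the hand-written additions.

```java
// Stand-in for a generated constants interface (hypothetical names and values).
interface GeneratedConstants {
    int EOF = 0;
    int IDENT = 1;
}

// Hand-written additions. Negative values cannot collide with generated ones,
// because the generator only hands out values from 0 upward.
interface ExtraConstants {
    int FUBAR_TYPE = -1;
    int OTHER_TYPE = -2;
}

// A base class implementing both interfaces can use every constant unqualified.
abstract class NodeBase implements GeneratedConstants, ExtraConstants {
    static String describe(int kind) {
        switch (kind) {
            case EOF:        return "EOF";
            case IDENT:      return "IDENT";
            case FUBAR_TYPE: return "FUBAR_TYPE";
            case OTHER_TYPE: return "OTHER_TYPE";
            default:         return "unknown(" + kind + ")";
        }
    }
}
```

One caveat with this design: constants added this way are unknown to the generated parser, so they cannot be referenced as `<FUBAR_TYPE>` in a production, and a negative value cannot index the generated `tokenImage` array. That makes the approach a better fit for relabelling in a pass after parsing, as described above, than for driving parser decisions.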