Overriding Token type in CommonTokenAction


by Mike J. Bell

I have a situation where I'm translating a custom lexer/yacc parser into JavaCC.  I have two different tokens, both of which match the same regular expression, but the custom lexer returns a different token type based upon a run-time call to some lookup table.

As far as I know, there's no way for me to insert special code into the lexer part of a JavaCC specification (i.e. no backtracking), so I faked out this behavior by implementing a CommonTokenAction method (TOKEN_MGR_DECLS) to check the lookup table and modify the Token if it needs to be a different type.
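For readers unfamiliar with the hook, here is a minimal sketch of the CommonTokenAction retyping idea, written as standalone Java so it runs outside a generated parser. It assumes the COMMON_TOKEN_ACTION option is on, in which case the token manager calls CommonTokenAction(Token) on each token it returns. The Token class below mimics only the kind and image fields of JavaCC's generated Token; IDENTIFIER, FUBAR_TYPE and the lookup-table contents are hypothetical.

```java
import java.util.Set;

public class RetypeDemo {
    // Hypothetical token kinds; in a real grammar these come from the generated XXXConstants interface.
    static final int IDENTIFIER = 1;
    static final int FUBAR_TYPE = 2;

    // Stripped-down stand-in for JavaCC's generated Token class (only the fields used here).
    static class Token {
        int kind;
        String image;
        Token(int kind, String image) { this.kind = kind; this.image = image; }
    }

    // Hypothetical run-time lookup table.
    static final Set<String> FUBAR_NAMES = Set.of("foo", "bar");

    // Mirrors a CommonTokenAction body: reassign the kind when the lookup table says so.
    static void CommonTokenAction(Token t) {
        if (t.kind == IDENTIFIER && FUBAR_NAMES.contains(t.image)) {
            t.kind = FUBAR_TYPE;
        }
    }

    public static void main(String[] args) {
        Token a = new Token(IDENTIFIER, "foo");
        Token b = new Token(IDENTIFIER, "baz");
        CommonTokenAction(a);
        CommonTokenAction(b);
        System.out.println(a.kind + " " + b.kind); // prints "2 1"
    }
}
```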

My first question is this:  how do I get JavaCC to insert another Token type into *Constants so I can use it in parser production rules/CommonTokenAction, but not actually generate any DFA rules about it?  Here's what I did so far:

TOKEN : { < FUBAR_TYPE : ~["\0"-"\377"] > }

which I believe reads "match one character that is outside the range 0x00-0xFF" ("\377" is octal for 0xFF).  Even though it's a kludge, it should work fine given stock ASCII input: it should never match anything, which is what I want, since I control when that Token is produced based upon my call.  But I suspect it might go wonky with full Unicode input, and it's just plain ugly anyway.  I want something like:

TOKEN: { <FUBAR_TYPE> }

so I can use that name anywhere but the *TokenManager won't ever try to explicitly match for it.
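The Unicode worry about the character-class kludge can be checked at the character level. This sketch models only the class ~["\0"-"\377"] itself; whether the generated token manager ever sees characters above 0xFF depends on how its input stream is configured, which is not modelled here. The class and method names are hypothetical.

```java
public class RangeDemo {
    // Mirrors the character class ~["\0"-"\377"]: true when c lies outside 0x00-0xFF.
    static boolean matchesFubarClass(char c) {
        return c > '\377'; // '\377' is the octal escape for 0xFF (255)
    }

    public static void main(String[] args) {
        System.out.println((int) '\377');                // 255
        System.out.println(matchesFubarClass('a'));      // false: plain ASCII
        System.out.println(matchesFubarClass('\u00e9')); // false: Latin-1 e with acute accent
        System.out.println(matchesFubarClass('\u20ac')); // true: euro sign, outside 0x00-0xFF
    }
}
```

So with pure ASCII or Latin-1 input the token can never match, but any character above 0xFF would match it.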

Second question:  am I totally nuts?  Is there a much better way to get (differing) dynamic token types out of the lexer (i.e. depend on a runtime call) that are based upon the same regular expression match string?

Thanks in advance to any helpers out there...

--
Mike J. Bell on gmail

Re: Overriding Token type in CommonTokenAction

by Jonathan Revusky

Mike J. Bell wrote:

> I have a situation where I'm translating a custom lexer/yacc parser into
> JavaCC.  I have two different tokens, both of which match the same
> regular expression, but the custom lexer returns a different token type
> based upon a run-time call to some lookup table.
>
> As far as I know, there's no way for me to insert special code into the
> lexer part of a JavaCC specification (i.e. no backtracking), so I faked
> out this behavior by implementing a CommonTokenAction method
> (TOKEN_MGR_DECLS) to check the lookup table and modify the Token if it
> needs to be a different type.
>
> My first question is this:  how do I get JavaCC to insert another Token
> type into *Constants so I can use it in parser production
> rules/CommonTokenAction, but not actually generate any DFA rules about
> it?  Here's what I did so far:
>
> TOKEN : { < FUBAR_TYPE : ~["\0"-"\377"] > }
>
> which I believe reads "match one character that is outside the range
> 0x0-0xff."  Should work fine, even though it's a kludge, given stock
> ASCII input (i.e. it should never match anything, which is what I
> want...I control when that Token is produced based upon my call).  But
> I'm thinking it might go wonky with full Unicode, and it's just plain
> ugly anyway.  I want something like:
>
> TOKEN: { <FUBAR_TYPE> }
>
> so I can use that name anywhere but the *TokenManager won't ever try to
> explicitly match for it.

It seems to me that you can get what you want by simply creating a phony
lexical state that is never used.

<PHONY> TOKEN : { <FUBAR_TYPE : "asdf"> }

If you are never actually in the "PHONY" lexical state, then the token
manager will never attempt to explicitly match FUBAR_TYPE. However,
FUBAR_TYPE will still appear in your XXXConstants.java, so in your
CommonTokenAction() method you can, at the appropriate place, set the
token's kind (the generated Token class stores the token type in its
kind field):

tok.kind = FUBAR_TYPE;

I guess the only drawback is that some code is generated for DFA
matching of FUBAR_TYPE -- effectively dead code that is never called. I
don't see that as a very big problem though.

>
> Second question:  am I totally nuts?  Is there a much better way to get
> (differing) dynamic token types out of the lexer (i.e. depend on a
> runtime call) that are based upon the same regular expression match string?

Well, I guess the standard way to have the same string (or set of
strings) be different tokens at different times is by use of lexical
states.

Jonathan Revusky
--
lead developer, FreeMarker project, http://freemarker.org/
KawaDD Parser Generator, http://code.google.com/p/kawadd

>
> Thanks in advance to any helpers out there...
>
> --
> Mike J. Bell on gmail


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@...
For additional commands, e-mail: users-help@...


Re: Re: Overriding Token type in CommonTokenAction

by J.Chris Findlay

On Tue, Aug 12, 2008 at 8:14 AM, Jonathan Revusky <revusky@...> wrote:
> Mike J. Bell wrote:
>> Second question:  am I totally nuts?  Is there a much better way to get
>> (differing) dynamic token types out of the lexer (i.e. depend on a runtime
>> call) that are based upon the same regular expression match string?
>
> Well, I guess the standard way to have the same string (or set of strings)
> be different tokens at different times is by use of lexical states.

Which can be switched to from Java code in the productions, so you can
put a piece of lookahead code before the token in question that switches
state as needed, based on what's coming up and whether it's in the table.
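A toy model of the lexical-state idea: the same text yields a different token kind depending on which state the lexer is in when it reads it. ToyTokenManager and its switchTo method are hypothetical stand-ins; in generated JavaCC code the rough equivalent is calling the token manager's SwitchTo(int) method, for example through token_source from a parser action.

```java
import java.util.ArrayList;
import java.util.List;

public class StateDemo {
    // Hypothetical lexical states and token kinds.
    static final int DEFAULT = 0, ALT = 1;
    static final int IDENTIFIER = 1, FUBAR_TYPE = 2;

    // Toy token manager: the same pattern [a-z]+ yields a different kind in each state.
    static class ToyTokenManager {
        private final String input;
        private int pos = 0;
        private int curLexState = DEFAULT;

        ToyTokenManager(String input) { this.input = input; }

        // Plays the role of SwitchTo(int) on a generated token manager.
        void switchTo(int state) { curLexState = state; }

        // Returns the kind of the next [a-z]+ word, or -1 at end of input.
        int nextKind() {
            while (pos < input.length() && input.charAt(pos) == ' ') pos++;
            if (pos == input.length()) return -1;
            int start = pos;
            while (pos < input.length() && input.charAt(pos) >= 'a' && input.charAt(pos) <= 'z') pos++;
            if (pos == start) throw new IllegalStateException("unexpected character at " + pos);
            return curLexState == ALT ? FUBAR_TYPE : IDENTIFIER;
        }
    }

    public static void main(String[] args) {
        ToyTokenManager tm = new ToyTokenManager("foo foo");
        List<Integer> kinds = new ArrayList<>();
        kinds.add(tm.nextKind());  // lexed in DEFAULT
        tm.switchTo(ALT);          // e.g. after a run-time table lookup in the parser
        kinds.add(tm.nextKind());  // same text, lexed in ALT
        System.out.println(kinds); // prints [1, 2]
    }
}
```

One thing worth checking against the JavaCC documentation before relying on this: tokens the parser has already fetched for lookahead were lexed in the old state, so a switch only affects tokens not yet read.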

> Jonathan Revusky

The other alternative is to have a single token that matches that
pattern of chars and give it a different semantic meaning based on its
context, rather than having an actual different token.  Then in a
following AST pass you can alter things in the tree as needed to
reflect the correct meaning.
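A minimal sketch of such a post-parse pass, assuming a hand-rolled Node class with a mutable type field. These are not JJTree's generated node classes; Node, NAME, TYPE_NAME and the lookup set are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class RetypePassDemo {
    // Hypothetical node types; the parser builds NAME for every token matching the shared pattern.
    static final int NAME = 1, TYPE_NAME = 2;

    static class Node {
        int type;
        final String image;
        final List<Node> children = new ArrayList<>();
        Node(int type, String image) { this.type = type; this.image = image; }
    }

    // Walks the tree and relabels NAME nodes whose text is in the lookup table.
    static void retype(Node n, Set<String> typeNames) {
        if (n.type == NAME && typeNames.contains(n.image)) n.type = TYPE_NAME;
        for (Node c : n.children) retype(c, typeNames);
    }

    public static void main(String[] args) {
        Node root = new Node(0, "decl");
        root.children.add(new Node(NAME, "size_t"));
        root.children.add(new Node(NAME, "count"));
        retype(root, Set.of("size_t"));
        System.out.println(root.children.get(0).type + " " + root.children.get(1).type); // prints "2 1"
    }
}
```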
I made a parser that needed extra constants (these were tree node
types rather than token types, but the same idea applies), so I ended
up having the base class that implemented the generated constants
interface also implement a second interface with my additions (I used
negative values so they could never clash).
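A sketch of that extra-constants arrangement. MyParserConstants stands in for the interface JavaCC generates (whose token kinds start at 0 with EOF); ExtraConstants, ParserBase, ConstantsDemo and the constant names are hypothetical.

```java
// Hypothetical stand-in for the generated XXXConstants interface.
interface MyParserConstants {
    int EOF = 0;
    int IDENTIFIER = 1;
    int NUMBER = 2;
}

// Hand-written additions; negative values cannot collide with generated kinds, which start at 0.
interface ExtraConstants {
    int FUBAR_TYPE = -1;
    int OTHER_EXTRA = -2;
}

// Base class that sees both sets of names unqualified.
abstract class ParserBase implements MyParserConstants, ExtraConstants {
    static String describe(int kind) {
        return kind < 0 ? "extra" : "generated";
    }
}

public class ConstantsDemo extends ParserBase {
    public static void main(String[] args) {
        System.out.println(describe(IDENTIFIER) + " " + describe(FUBAR_TYPE)); // prints "generated extra"
    }
}
```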

--
 - J.Chris Findlay
 (c: