|
|
Newbie question

I just posted this to comp.compilers.tools.javacc, but maybe this is a
better forum:

I'm just evaluating javacc for a project I'm working on and I've
run into a small problem.

Here's the token section of the simple parser I'm playing with:

TOKEN : {
    <HELP_CMD: "HELP" >
  | <GROUP_CMD: "GROUP" >
  | <ARTICLE_CMD: "ARTICLE" >
  | <POST_CMD: "POST" >
  | <MSG_ID: "<"(["a"-"z","_", ".","@"])+">" >
  | <GROUP: ["a"-"z"](["a"-"z","."])+["a"-"z"] >
}

The problem is that the quoted dots seem to be treated as
match-any-character symbols and not as a literal dot. I'm not used to
this method of specifying regular expressions, so I don't really
understand what I'm specifying. How does one specify a period?

Is there a simple tutorial for javacc regular expressions somewhere?

Thanks,

--
Kenneth P. Turvey <kt-usenet@...>
http://www.electricsenator.net

Ye shall know the truth, and the truth shall make you mad.
        -- Aldous Huxley

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@...
For additional commands, e-mail: users-help@...
|
|
Re: Newbie question

On Aug 14, 2008, at 4:49 AM, Kennneth P. Turvey wrote:

> The problem is that the quoted dots seem to be treated as any
> character symbols and not a true dot. I'm not used to this method of
> specifying regular expressions, so I don't really understand what I'm
> specifying. How does one specify a period?

Hm, can you post a small example of the unexpected behavior you're
seeing? The token definitions above seem to work ok when I run them,
e.g.:

====================================
$ cat dot.jj && javacc dot.jj && javac *.java && java DotTokenManager "HELP a.b"
options {
  BUILD_PARSER=false;
}
PARSER_BEGIN(Dot)
public class Dot {}
PARSER_END(Dot)
TOKEN_MGR_DECLS : {
  public static void main(String[] args) {
    java.io.StringReader sr = new java.io.StringReader(args[0]);
    SimpleCharStream scs = new SimpleCharStream(sr);
    DotTokenManager tokenizer = new DotTokenManager(scs);
    for (Token t = tokenizer.getNextToken(); t.kind != EOF;
         t = tokenizer.getNextToken()) {
      debugStream.println("Found token: " + t.image);
    }
  }
}
SKIP : { " " }
TOKEN : {
    <HELP_CMD: "HELP" >
  | <GROUP_CMD: "GROUP" >
  | <ARTICLE_CMD: "ARTICLE" >
  | <POST_CMD: "POST" >
  | <MSG_ID: "<"(["a"-"z","_", ".","@"])+">" >
  | <GROUP: ["a"-"z"](["a"-"z","."])+["a"-"z"] >
}
Java Compiler Compiler Version 4.1d1 (Parser Generator)
(type "javacc" with no arguments for help)
Reading from file dot.jj . . .
File "TokenMgrError.java" is being rebuilt.
File "ParseException.java" is being rebuilt.
File "Token.java" is being rebuilt.
File "SimpleCharStream.java" is being rebuilt.
Parser generated successfully.
Found token: HELP
Found token: a.b
====================================

> Is there a simple tutorial for javacc regular expressions somewhere?

This developerworks article is pretty good:

http://www.ibm.com/developerworks/db2/library/techarticle/dm-0401brereton/index.html

Although it doesn't go into all the various types of regular
expressions.

Yours,

Tom
http://generatingparserswithjavacc.com/
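For readers more used to java.util.regex, the same point can be cross-checked outside JavaCC: a dot inside a character class is a literal period there too. The sketch below is only an analogue of the GROUP token, not JavaCC code; the class name DotInClassDemo and the helper isGroup are made up for illustration.

```java
import java.util.regex.Pattern;

class DotInClassDemo {
    // Rough java.util.regex analogue of
    //   <GROUP: ["a"-"z"](["a"-"z","."])+["a"-"z"] >
    // Inside [...] the dot is a literal period, not "any character".
    static final Pattern GROUP = Pattern.compile("[a-z][a-z.]+[a-z]");

    // True only if the whole string has the GROUP shape.
    static boolean isGroup(String s) {
        return GROUP.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isGroup("comp.lang.java")); // period accepted
        System.out.println(isGroup("comp9lang"));      // digit rejected
        System.out.println(isGroup("a#b"));            // symbol rejected
    }
}
```

If a pattern like this appears to accept digits or symbols, the cause is more likely a different rule matching first than the dot itself.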
|
|
Re: Newbie question

On Thu, 14 Aug 2008 08:49:10 +0000, Kennneth P. Turvey wrote:

> I just posted this to comp.compilers.tools.javacc, but maybe this is a
> better forum:

Are these two groups intended to have different purposes? What is the
preferred way to get help with javacc?

Again, this was posted on the newsgroup above as well.

I've got another question for you... two actually.

1) Is there a way to catch a ParseException, handle it, skip to the
beginning of the next line and start parsing again?

2) Is there a way to force a regular expression to match at most a
maximum number of characters?

--
Kenneth P. Turvey <kt-usenet@...>
http://www.electricsenator.net

Idiotic but humorous George W. Bush quote elided due to the
author's inability to select one from the vast number of them
available.
|
|
Re: Re: Newbie question

The first of these is covered by the docs/examples.

On 8/15/08, Kennneth P. Turvey <kt-usenet@...> wrote:
> I've got another question for you... two actually.
>
> 1) Is there a way to catch a ParseException, handle it, skip to the
> beginning of the next line and start parsing again?
>
> 2) Is there a way to force a regular expression to match at most a
> maximum number of characters?

--
- J.Chris Findlay
   (c:
|
|
Re: Newbie question

On Thu, 14 Aug 2008 07:12:41 -0400, Tom Copeland wrote:

> Hm, can you post a small example of the unexpected behavior you're
> seeing? The token definitions above seem to work ok when I run them,
> e.g.:

The group token seemed to be matching anything. It looked like the
period in the token was being allowed to match letters, numbers,
symbols. I fixed it by pulling the period out of the brackets, but it
may have been completely by chance that it started working again.

If you say that a quoted period only matches a period, then I guess I'll
trust you for now. I'm just trying to figure this stuff out. I do wish
that every different tool didn't require me to learn an entirely new
regular expression syntax, but I guess that's the world we live in.

--
Kenneth P. Turvey <kt-usenet@...>
http://www.electricsenator.net

Over grown military establishments are under any form of government
inauspicious to liberty, and are to be regarded as particularly
hostile to republican liberty.
        -- George Washington
|
|
Re: Newbie question

On Fri, 15 Aug 2008 09:08:21 +1200, J.Chris Findlay wrote:

> The first of these is covered by the docs/examples.

I'll look through there again.

Thanks.

--
Kenneth P. Turvey <kt-usenet@...>
http://www.electricsenator.net

It behooves every man who values liberty of conscience for himself,
to resist invasions of it in the case of others; or their case may,
by change of circumstances, become his own.
        -- Thomas Jefferson
|
|
Re: Re: Newbie question

Here's a link to the reference for error handling. This document also
covers how to skip ahead and continue parsing.

https://javacc.dev.java.net/doc/errorrecovery.html

Don't worry that the documentation says it covers javacc 0.7.1; it is
still valid with the current version of javacc.

Dale

Kennneth P. Turvey wrote:
> 1) Is there a way to catch a ParseException, handle it, skip to the
> beginning of the next line and start parsing again?
>
> 2) Is there a way to force a regular expression to match at most a
> maximum number of characters?
|
|
Re: Newbie question

On Thu, 14 Aug 2008 16:45:21 -0600, Dale Anson wrote:

> Here's a link to the reference for error handling. This document also
> covers how to skip ahead and continue parsing.
>
> https://javacc.dev.java.net/doc/errorrecovery.html
>
> Don't worry that the documentation says it covers javacc 0.7.1, it is
> still valid with the current version of javacc.

That's great, but it doesn't really talk about scanning errors. I assume
I use the try catch block again, but I'm not sure how to get the token
manager to skip a bad token and move to the end of a line of input.

Basically I want to throw away tokens until I find a good one, END_LINE.

Thanks.

--
Kenneth P. Turvey <kt-usenet@...>
http://www.electricsenator.net

A computer lets you make more mistakes faster than any invention in
human history with the possible exceptions of handguns and tequila.
        -- Mitch Ratliffe, Technology Review, April, 1992
|
|
Re: Re: Newbie question

On Aug 14, 2008, at 5:05 PM, Kennneth P. Turvey wrote:

> Are these two groups intended to have different purposes? What is the
> preferred way to get help with javacc?

I think probably this mailing list is the better place... I'm not sure
about other folks, but I don't monitor the newsgroup, just the mailing
list.

Yours,

Tom
|
|
Re: Re: Newbie question

On Fri, Aug 15, 2008 at 11:21 AM, Kennneth P. Turvey
<kt-usenet@...> wrote:
> That's great, but it doesn't really talk about scanning errors. I assume
> I use the try catch block again, but I'm not sure how to get the token
> manager to skip a bad token and move to the end of a line of input.
>
> Basically I want to throw away tokens until I find a good one, END_LINE.

...which is exactly what that document covers.

--
- J.Chris Findlay
   (c:
|
|
Re: Re: Newbie question

On Aug 14, 2008, at 5:09 PM, Kennneth P. Turvey wrote:

> If you say that a quoted period only matches a period, then I guess
> I'll trust you for now. I'm just trying to figure this stuff out. I do
> wish that every different tool didn't require me to learn an entirely
> new regular expression syntax, but I guess that's the world we live in.

Yup, that's how it should work. True, it's another regex syntax, but I
think you'll recognize lots of familiar stuff in JavaCC's regex
system.... e.g., repetitions:

TOKEN : {
  <FOUR_H : ("h"){4}>
}

Yours,

Tom
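The repetition form also bears on the earlier question about capping a match at a maximum number of characters. JavaCC's grammar is understood to accept a ranged count as well, written (...){n,m}; treat that as an assumption to check against the JavaCC grammar documentation. The sketch below uses java.util.regex, not JavaCC, to show what counted and ranged repetition do; the class name BoundedRepetitionDemo and its helpers are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundedRepetitionDemo {
    // Analogue of <FOUR_H : ("h"){4}>: exactly four h characters.
    static final Pattern FOUR_H = Pattern.compile("h{4}");

    // Ranged repetition: one to eight lowercase letters.
    static final Pattern WORD_MAX_8 = Pattern.compile("[a-z]{1,8}");

    static boolean isFourH(String s) {
        return FOUR_H.matcher(s).matches();
    }

    // Returns the longest prefix of s that fits the bound, or "" if none.
    static String boundedPrefix(String s) {
        Matcher m = WORD_MAX_8.matcher(s);
        return m.lookingAt() ? m.group() : "";
    }

    public static void main(String[] args) {
        System.out.println(isFourH("hhhh"));
        System.out.println(boundedPrefix("abcdefghijkl"));
    }
}
```

Note the design consequence for a tokenizer: a length bound does not reject longer input by itself. The match simply stops at the bound and the remaining characters are offered to the next token, so rejecting over-long input still needs a check in the grammar or in a token action.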
|
|
Re: Newbie question

On Fri, 15 Aug 2008 11:31:14 +1200, J.Chris Findlay wrote:

>> Basically I want to throw away tokens until I find a good one,
>> END_LINE.
>
> ...which is exactly what that document covers.

It covers how to handle ParseExceptions in this manner, but there isn't
any discussion of handling TokenMgrErrors. The code provided will not
work for a TokenMgrError, since getting the next token throws the error
again.

--
Kenneth P. Turvey <kt-usenet@...>
http://www.electricsenator.net

It behooves every man who values liberty of conscience for himself,
to resist invasions of it in the case of others; or their case may,
by change of circumstances, become his own.
        -- Thomas Jefferson
|
|
Re: Newbie question

Kennneth P. Turvey wrote:
> It covers how to handle ParseExceptions in this manner, but there isn't
> any discussion of handling TokenMgrErrors. The code provided will not
> work for a TokenMgrError, since getting the next token throws the error
> again.

Yes, but what you need is for the TokenMgrError to not be a
TokenMgrError but instead to be a ParseException.

The way to do that is to create a catch-all Token that matches anything.
You could put the following after all your other token definitions:

<CATCH_ALL : ~[]>

You have various possibilities from here. You could perhaps handle these
situations in a CommonTokenAction(Token t) method in your
TOKEN_MGR_DECLS, or do a try-catch in your top-level production that, in
the catch block, scans forward until the next END_LINE is reached.

Jonathan Revusky
--
lead developer, FreeMarker project, http://freemarker.org/
KawaDD Parser Generator, http://code.google.com/p/kawadd
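The catch-all idea can be sketched independently of JavaCC: a scanner that has a last-resort rule never fails, it just hands the parser a token the grammar does not accept, which moves the error from the scanner to the parser. The hand-written lexer below is an illustration only; CatchAllLexerDemo, Kind and Tok are hypothetical names and none of this is JavaCC's generated API.

```java
import java.util.ArrayList;
import java.util.List;

class CatchAllLexerDemo {
    enum Kind { WORD, LINE_END, CATCH_ALL }

    static final class Tok {
        final Kind kind;
        final String image;
        Tok(Kind kind, String image) { this.kind = kind; this.image = image; }
    }

    static boolean isLower(char c) { return c >= 'a' && c <= 'z'; }

    // Splits input into tokens. Any character no other rule accepts
    // becomes a one-character CATCH_ALL token instead of an error.
    static List<Tok> tokenize(String input) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t') {          // skipped, like SKIP
                i++;
            } else if (c == '\n') {
                out.add(new Tok(Kind.LINE_END, "\n"));
                i++;
            } else if (isLower(c)) {
                int start = i;
                while (i < input.length() && isLower(input.charAt(i))) i++;
                out.add(new Tok(Kind.WORD, input.substring(start, i)));
            } else {                              // last-resort rule
                out.add(new Tok(Kind.CATCH_ALL, String.valueOf(c)));
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (Tok t : tokenize("ab #c\n")) {
            System.out.println(t.kind);
        }
    }
}
```

Placing the catch-all rule last matters in both settings: it should only fire when every more specific rule has failed.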
|
|
Re: Newbie question

On Thu, 14 Aug 2008 18:58:11 -0700, Jonathan Revusky wrote:

> Yes, but what you need is for the TokenMgrError to not be a
> TokenMgrError but instead to be a ParseException.
>
> The way to do that is to create a catch-all Token that matches anything.
> You could put the following after all your other token definitions:
>
> <CATCH_ALL : ~[]>
>
> You have various possibilities from here. You could perhaps handle these
> situations in a CommonTokenAction(Token t) method in your
> TOKEN_MGR_DECLS.
>
> or do a try-catch in your top-level production that, in the catch block,
> scans forward until the next END_LINE is reached.

Thanks for the info, but I'm still not really sure how to change my code
so that the parser does what I want it to.

The problem is that by the time the parser has determined that there is
an error, it is already at the <EOF> that doesn't match. This means
that it has already exited the command loop. This isn't what I want.

I'll attach my simple grammar below and maybe somebody could help me get
it to work.

Thanks.

options {
  STATIC = false;
  IGNORE_CASE = true;
}
PARSER_BEGIN(NNTPParser)
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.IOException;

public class NNTPParser {

  public static void main(String[] args)
      throws ParseException, IOException {
    BufferedReader reader =
        new BufferedReader(new InputStreamReader(System.in));

    NNTPParser parser = new NNTPParser(reader);
    parser.start();
  }

}
PARSER_END(NNTPParser)

SKIP : { " " | "\t" }

TOKEN : {
    <HELP_CMD: "HELP" >
  | <GROUP_CMD: "GROUP" >
  | <ARTICLE_CMD: "ARTICLE" >
  | <POST_CMD: "POST" >
  | <MSG_ID: "<"((["a"-"z","_", "@"])*".")*(["a"-"z"])*">" >
  | <GROUP: ((["a"-"z"])*".")*(["a"-"z"])+ >
  | <LINE_END: "\n" | "\r\n" >
}

void start() throws IOException :
{
  Token t;
}
{
  (
    try {
        <HELP_CMD> <LINE_END>
        { System.out.println("HELP"); }
      | <POST_CMD> <LINE_END>
        { System.out.println("POST"); }
      | <GROUP_CMD> t = <GROUP> <LINE_END>
        { System.out.println("GROUP " + t.image); }
      | <ARTICLE_CMD> t = <MSG_ID> <LINE_END>
        { System.out.println("ARTICLE " + t.image.substring(1,
              t.image.length() - 1)); }
    }
    catch (ParseException ex) {
      System.out.println("Syntax Error!");
      do {
        t = getNextToken();
      } while (t.kind != LINE_END);
    }
  )*
  <EOF>
}

--
Kenneth P. Turvey <kt-usenet@...>
http://www.electricsenator.net

Ye shall know the truth, and the truth shall make you mad.
        -- Aldous Huxley
|
|
Re: Newbie question

Kennneth P. Turvey wrote:
> Thanks for the info, but I'm still not really sure how to change my code
> so that the parser does what I want it to.
>
> The problem is that by the time the parser has determined that there is
> an error, it is already at the <EOF> that doesn't match. This means
> that it has already exited the command loop. This isn't what I want.
>
> I'll attach my simple grammar below and maybe somebody could help me get
> it to work.

It took me a while to figure out why your code doesn't work. It really
looks like it *should* work...

I am wondering whether this could be classified as a bug in JavaCC (and
KawaDD, since KawaDD behaves the same in this instance.) If this isn't a
bug, it's an extremely unintuitive behavior that should be documented as
a gotcha in whatever FAQ list.

Anyway, it really seemed to me that the example you've posted below
should work. The reason it doesn't is that in the following piece:

  (
    try {
        <HELP_CMD> <LINE_END>
        { System.out.println("HELP"); }
      | <POST_CMD> <LINE_END>
        { System.out.println("POST"); }
      | <GROUP_CMD> t = <GROUP> <LINE_END>
        { System.out.println("GROUP " + t.image); }
      | <ARTICLE_CMD> t = <MSG_ID> <LINE_END>
        { System.out.println("ARTICLE " + t.image.substring(1,
              t.image.length() - 1)); }
    }
    catch (ParseException ex) {
      System.out.println("Syntax Error!");
      do {
        t = getNextToken();
      } while (t.kind != LINE_END);
    }
  )*

if the block within the try doesn't match, it simply jumps out of the
(...)* loop and thus never hits the catch block. So it jumps out of the
loop and tries to match the EOF at the bottom and, of course, fails, and
that's what is throwing the exception.

The way to get this to work is to make sure that the expansion within
the try block is always matched. The simplest way to do that is to add a
JAVACODE production as a final choice at the bottom, and a JAVACODE
production is a black box that is considered to always match. So,
somewhere put in:

JAVACODE void problem() {
  throw new ParseException("We have a problem here.");
}

and then add that as a choice in the expansion in the try block. Also,
since the block always matches, the EOF at the bottom becomes
unreachable, so we can get rid of that.

So basically, your start() production ends up looking like:

void start() throws IOException :
{
  Token t;
}
{
  (
    try {
        <HELP_CMD> <LINE_END>
        { System.out.println("HELP"); }
      | <POST_CMD> <LINE_END>
        { System.out.println("POST"); }
      | <GROUP_CMD> t = <GROUP> <LINE_END>
        { System.out.println("GROUP " + t.image); }
      | <ARTICLE_CMD> t = <MSG_ID> <LINE_END>
        { System.out.println("ARTICLE " + t.image.substring(1,
              t.image.length() - 1)); }
      | problem()
    }
    catch (ParseException ex) {
      System.out.println("Syntax Error!");
      // Skip to the end of the line, but also stop at end of input.
      do {
        t = getNextToken();
      } while (t.kind != LINE_END && t.kind != EOF);
      if (t.kind == EOF) {
        return;
      }
    }
  )*
}

And this really does seem to work.

Like I say, I am wondering whether this is really not a bug in JavaCC
and whether what I'm giving you here is really just a workaround. To me,
it really seems like if you put the expansion in a try block and it
fails to match, it should hit the catch block. IOW, I mean to say, the
code you posted *should* work IMHO.

Well, I'm going to think about this a bit more. I may patch up KawaDD so
that this works more intuitively, and thus, your original code posted
does what is expected -- though, OTOH, after further thought, I may
decide that the way it works, though anti-intuitive, is actually
correct. Doesn't seem that way though.

Anyway, meanwhile, you can get what you want by adjusting the expansion
in the try block such that it always matches.

I hope that's helpful. I spent more time looking into this than I ever
would have anticipated. But the side-effect of that is that it brought
this rather unintuitive behavior to my attention. And I may well fix
that in KawaDD. (Of course, it's safe to say that it will always work
this way in JavaCC. :-))

Regards,

Jonathan Revusky
--
lead developer, FreeMarker project, http://freemarker.org/
KawaDD Parser Generator, http://code.google.com/p/kawadd
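One detail of the recovery loop deserves care: it has to stop at end of input as well as at LINE_END, because a token source that has run out keeps handing back EOF and a loop that only waits for LINE_END would then never finish. A minimal plain-Java sketch of that stopping condition follows; the integer token kinds and the Iterator standing in for the token source are assumptions made for illustration, not JavaCC's generated API.

```java
import java.util.Arrays;
import java.util.Iterator;

class SkipToLineEndDemo {
    // Stand-in token kinds; JavaCC generates its own constants.
    static final int EOF = 0;
    static final int WORD = 1;
    static final int LINE_END = 2;

    // Discards tokens until a LINE_END or EOF is consumed and
    // returns the kind that ended the skip. An exhausted iterator
    // is treated as an endless supply of EOF tokens.
    static int skipToLineEnd(Iterator<Integer> tokens) {
        int kind;
        do {
            kind = tokens.hasNext() ? tokens.next() : EOF;
        } while (kind != LINE_END && kind != EOF);
        return kind;
    }

    public static void main(String[] args) {
        Iterator<Integer> line =
            Arrays.asList(WORD, WORD, LINE_END, WORD).iterator();
        System.out.println(skipToLineEnd(line) == LINE_END);

        Iterator<Integer> truncated = Arrays.asList(WORD, WORD).iterator();
        System.out.println(skipToLineEnd(truncated) == EOF);
    }
}
```

The caller can then branch on the returned kind: resume the command loop after LINE_END, or leave it after EOF.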
|
|
Re: Newbie question |