Why RE are singleline?

Why can't I write an expression like ";[ \t\r\n]*;"?

The syntax I want to parse is a sequence of quoted strings, separated by spaces, tabs and newlines, prefixed by a special char. For example
  $"abc" "def" "ghi";
or
  @"123" "456" "789";
And for different prefixes I want different schemes for the strings.

Without newlines it's simple:

  block:
    start: $\"
    end: \"
    scheme = S1
  S1:
    regexp: \"[ \t]*\"   //just eat that sequence
    //and of course other rules, concerning content of strings

A block can't be used instead of the regexp, because its start will be matched with any quote.

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
colorer-talks mailing list
colorer-talks@...
https://lists.sourceforge.net/lists/listinfo/colorer-talks
Re: Why RE are singleline?

Not sure why you need this in this particular case.

The HRC model is 'positive' parsing (in contrast with BNF and other forms of grammar description): you don't have to eat all the characters, just the things you need. If your syntax doesn't allow anything except 'spaces' between your strings, you can tell the user so. Just put
  <regexp match="/\S/" region="def:Error"/>
at the end of your top-level scheme, and the user will see all the bad characters "between" your strings.

I believe that in your case you don't need [ \t]* to be tokenized. You can just leave your 'S1' scheme with the rules concerning the string's content.

As for the regexp single-line limitation, it has its own roots. Indeed, it sometimes limits expressive power, but it allows HRC to parse code rather fast. Otherwise many of the "free" HRC constructions would take exponential time over the whole file content.

On 3/1/07, Dmitry Ivankov <divanorama@...> wrote:
> Why can't I write expression like ";[ \t\r\n]*;"?
>
> Syntax I want to parse
> is sequence of quoted strings, separated by spaces, tabs and newlines,
> prefixed by a special char.
> For example $"abc" "def" "ghi"; or @"123" "456" "789";
> And for different prefixes I want different schemes for strings.
>
> Without newlines it's simple
> block:
> start: $\"
> end: \"
> scheme = S1
> S1:
> regexp: \"[ \t]*\" //just eat that sequence
> //and of course other rules, concerning content of strings
>
> Block can't be
> used instead of regexp, because it's start will be matched with any quote.
>

--
Igor
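As a rough illustration of the single-line restriction discussed above, here is a small sketch in Python rather than HRC. It assumes a line-oriented engine that hands each line to the regexp separately, so a pattern that mentions \r\n never gets to see the line break. The function names are made up for this example and are not part of Colorer.

```python
import re

# The pattern from the original question: two semicolons separated by
# any mix of spaces, tabs and line breaks.
SEPARATOR = re.compile(r";[ \t\r\n]*;")


def matches_per_line(text):
    """Apply the pattern to each line on its own (assumed single-line model)."""
    return [SEPARATOR.search(line) is not None for line in text.split("\n")]


def matches_whole_text(text):
    """Apply the pattern to the whole text at once (multiline model)."""
    return SEPARATOR.search(text) is not None


if __name__ == "__main__":
    sample = ";\n;"
    print(matches_per_line(sample))    # neither line holds both semicolons
    print(matches_whole_text(sample))  # here the newline is visible to the pattern
```

In the per-line model the cost of a match attempt is bounded by the line length, which is the speed argument made above; the whole-text model has no such bound.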
Re: Why RE are singleline?

> HRC model is 'positive' parsing (in contrast with BNF and other forms

Yes, I know. Its positiveness is a good thing, because looking at HRC I see the parsing algorithm.
But now I see an algorithm which is in HRC style, but can't express it in HRC ;)

> In case your syntax doesn't allow anything,

Yes and no. The syntax of these sequences doesn't, but the top-level syntax allows many things. For example
  $"ab" "c"; "def"; @"ghi"
should be parsed as S1(both strings); S0; S2

> Believe in your case you don't need [ \t]* to be tokenized. You can
> just leave your 'S1' scheme with rules, concerning that string's
> content.

In that case S1 will end at the first ", and $"abc" "def" will be S1 S0, not S1(2 strings). By tokenizing the separator I extend S1 to the next string in the sequence, if there is a next string. I still haven't found another way to extend S1: if it is some hacky block, its start is surely " and its end is " too, but once " is matched it can't roll back if the end is not found (or there were bad symbols before the end).

> As for the regexp single line limitation, it has its own roots.
> Indeed, sometimes it limits expression power, however it allows HRC to
> parse code rather fast - in other case many of "free" HRC
> constructions would give exponential time in scope of overall file
> content.

Speed is critical, but limited usage of "good" multiline regexps won't affect it. (In my case those REs won't analyze any symbol more than once.) I think that just twoline (\n isn't used within * or + operators, and appears only once) will cover most of the constructions that require multiline RE, and won't give a significant slowdown (it's just as if we had lines twice as long as usual). In my case twoline will be enough (there is no need to separate the sequence by blank lines).

Or is there some other way? :)
Re: Why RE are singleline?

> > In case your syntax doesn't allow anything,
> > except 'spaces' between your strings...
> Yes and no. Syntax of these sequences doesn't, but top level syntax allows
> many things, for example $"ab" "c"; "def"; @"ghi" should be parsed as
> S1(both strings); S0; S2

OK, beginning to understand what you are trying to express ;-)

> By tokenizing separator i extend S1 to next string in sequence if there is
> next string.

And possibly this is the wrong thing: tokenizing the separator. Let's see what we can do here.

> I still haven't found another way to extend S1. Because if it is some hacky
> block, it's start is surely " and end is " too, but once " is matched it
> can't rollback if end is not found (or there were bad symbols before end).

Got it.

> In my case twoline will be enough. (there is no need in separating sequence
> by blank lines)

I really can't understand why 'two line' would be enough. Do you mean your syntax allows
-------------
@"foo"
"bar";
-------------
but doesn't allow
-------------
@"foo"

"bar";
-------------
? I believe not.

> Or there is some other way? :)

HRC is still a context-free-like grammar language. To express things you have to find recursive constructions which don't need to be rolled back. I see that all prefixed strings end with ';'. Can it be used to match the end of your string?

  scheme top
    block \@ \; scheme at_string

  scheme at_string
    block " " scheme string_content
    re /./ def:Error priority=low

Can the above fit your needs? If not, I definitely need your language name and some concrete source code samples ;)

The thing I often see with HRC is that if a language is difficult or impossible to express in HRC, then the language's grammar is poorly designed and it requires heavy resources even to compile/interpret it...

--
Igor
Re: Why RE are singleline?

Because a sequence of strings is mostly used to split a long string into lines, just like consecutive string literals are concatenated in C/C++. So $"long line" can be written as $"long"\n"line". Twoline is enough, or better to say acceptable, for most texts. $"q"\n\n"w" is legal and is S1(2 strings), but it will be OK if it's parsed as S1\n\nS0.

No, it can end with any other separator, maybe [;,.)+-/........], but it's a bad idea.

> If no, I definitely need your language name and some concrete source

The language is Nemerle. The syntax is similar to C/C++/C#. You can imagine the syntax as C++ with an additional construction (it's not true, but it describes the problem):
- $"the value of x is $x\n" "and y is $y."\n "and of course z = $z"
  (a sequence of strings prefixed with "$" has a separate grammar, i.e. $x, $y, $z are highlighted, at least it would be cool :) )
  (if it's not prefixed then it is just many strings, with escape sequences highlighted in each)

But with paired quotes. And in the case of "str1"\n "str2" the 1st and 4th quotes are paired, while the 2nd and 3rd are just quotes, highlighted differently from the strings' content.
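To make the requirement above a bit more concrete, here is a small Python sketch (not HRC) of what a "separate grammar" for prefixed sequences could mean: $name splices are reported only when the sequence started with "$". The splice pattern is an assumption that covers only simple identifiers such as $x from the example; the full Nemerle splice syntax is not described in this thread, and the function name is hypothetical.

```python
import re

# Assumed shape of a splice: '$' followed by a simple identifier.
SPLICE = re.compile(r"\$([A-Za-z_]\w*)")


def splices(strings, prefixed):
    """Return the spliced names found in a sequence of string bodies.

    `strings` holds the text between the quotes of each string;
    `prefixed` says whether the sequence started with '$'.
    Unprefixed sequences are plain strings, so nothing is reported.
    """
    if not prefixed:
        return []
    return [name for body in strings for name in SPLICE.findall(body)]


if __name__ == "__main__":
    bodies = ["the value of x is $x\\n", "and y is $y.", "and of course z = $z"]
    print(splices(bodies, prefixed=True))
    print(splices(bodies, prefixed=False))
```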
Re: Why RE are singleline?

> I see that all prefixed string ends with ';'. Can it be used to match
> end of your string?

I've just realized that the sequence ends with..... whatever :))
So the following seems to work:

  in top scheme:
    block:
      start: \$\M\"
      end: \M.
      scheme: strings

  scheme strings:
    re: [ \t]
    block:
      start: \"
      end: \"
      scheme: inside_string

Hacks rock :)
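The hack above can be modelled in a few lines of Python, as a sketch only and not as a description of how Colorer is implemented. The assumptions are: lines are fed one at a time and the block state survives the line end; the rules of the inner scheme are tried before the block's end; the start \$\M\" consumes only the "$"; the end \M. closes the block in front of any character the inner rules did not take. Strings are assumed to close on the same line and to contain no escaped quotes. The name group_strings is made up for this example.

```python
def group_strings(text):
    """Group quoted strings the way the two nested blocks would.

    Returns a list of (kind, [string bodies]) pairs, where kind is
    "prefixed" for a $-sequence and "plain" for a lone string.
    """
    groups = []
    in_sequence = False  # True while the outer "strings" block is open

    def read_string(line, start):
        # `start` is the index of the opening quote; no escape handling.
        end = line.find('"', start + 1)
        if end == -1:
            end = len(line)
        return line[start + 1:end], end + 1

    for line in text.split("\n"):  # one line at a time; state survives
        i = 0
        while i < len(line):
            ch = line[i]
            if in_sequence:
                if ch in " \t":              # re: [ \t]
                    i += 1
                elif ch == '"':              # inner block: \" ... \"
                    body, i = read_string(line, i)
                    groups[-1][1].append(body)
                else:                        # end: \M.  (closes before ch)
                    in_sequence = False
            elif line.startswith('$"', i):   # start: \$\M\"  (eats only '$')
                groups.append(("prefixed", []))
                in_sequence = True
                i += 1
            elif ch == '"':                  # an unprefixed string
                body, i = read_string(line, i)
                groups.append(("plain", [body]))
            else:
                i += 1
    return groups
```

Under these assumptions no newline ever has to appear in a regexp: the sequence continues onto the next line simply because the outer block is still open when that line starts, and it closes at the first character that is neither a space, a tab, nor a quote.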
Re: Why RE are singleline?

Wow, that's really cool.
The only minor possible drawback here is that this block will last until that 'whatever'. E.g. if you want to highlight the background color of such strings, you'll see it extend not just to the last " quote but a little bit further.

On 3/2/07, Dmitry Ivankov <divanorama@...> wrote:
> > I see that all prefixed string ends with ';'. Can it be used to match
> > end of your string?
>
> I've just realized that sequence ends with..... whatever :))
> So following seems to work:
> in top scheme:
> block:
> start: \$\M\"
> end: \M.
> scheme: strings
>
> scheme strings:
> re: [ \t]
> block:
> start: \"
> end: \"
> scheme: inside_string
>
> Hacks rock :)
>

--
Igor