Why RE are singleline?


by Dmitry Ivankov


Why can't I write an expression like ";[ \t\r\n]*;"?

The syntax I want to parse is a sequence of quoted strings, separated by spaces, tabs and newlines, prefixed by a special char.
For example $"abc" "def" "ghi";  or @"123" "456" "789";
And for different prefixes I want different schemes for the strings.

Without newlines it's simple
block:
 start: $\"
 end: \"
 scheme = S1
S1:
 regexp: \"[ \t]*\" //just eat that sequence
 //and of course other rules, concerning content of strings

A block can't be used instead of the regexp, because its start would match any quote.
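
To make the goal concrete, here is a minimal Python sketch of the grouping being asked for. It is an illustration only, not HRC: it relies on an ordinary regular expression that is allowed to cross line breaks, which is exactly what HRC regexps cannot do. The names `SEQUENCE`, `STRING` and `find_sequences` are made up for this example, and escape sequences inside strings are ignored.

```python
import re

# A prefix char ($ or @), then one or more quoted strings separated
# by spaces, tabs or newlines. Escapes inside strings are not handled.
SEQUENCE = re.compile(r'([$@])("[^"\n]*"(?:[ \t\r\n]*"[^"\n]*")*)')
STRING = re.compile(r'"([^"\n]*)"')


def find_sequences(text):
    """Return (prefix, [string contents]) for every prefixed sequence."""
    return [(m.group(1), STRING.findall(m.group(2)))
            for m in SEQUENCE.finditer(text)]


if __name__ == "__main__":
    print(find_sequences('$"abc" "def"\n"ghi"; @"123" "456";'))
```

Unprefixed strings are deliberately left out of the result, since they would belong to the default scheme.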



-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
colorer-talks mailing list
colorer-talks@...
https://lists.sourceforge.net/lists/listinfo/colorer-talks

Re: Why RE are singleline?

by Igor Russkih


Not sure why you need this in this particular case.

The HRC model is 'positive' parsing (in contrast with BNF and other forms
of grammar description). You don't have to eat all the characters,
just the things you need. In case your syntax doesn't allow anything
except 'spaces' between your strings, you can 'tell' the user this:
just use
<regexp match="/\S/" region="def:Error"/>
at the end of your top-level scheme,

and the user will see all the bad characters "between" your strings.
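
For illustration only, the effect of that `\S` rule can be mimicked in Python. This is not Colorer code; `TOKEN` and `stray_characters` are names made up for the sketch. Each line is scanned on its own, quoted strings are skipped, and any other non-whitespace character is reported as an error:

```python
import re

# Two "rules": a complete quoted string, or any stray non-space character.
TOKEN = re.compile(r'(?P<string>"[^"]*")|(?P<stray>\S)')


def stray_characters(lines):
    """Yield (line_no, column, char) for characters outside quoted strings."""
    for line_no, line in enumerate(lines, start=1):
        # Every match is confined to a single line.
        for m in TOKEN.finditer(line):
            if m.lastgroup == "stray":
                yield line_no, m.start(), m.group()


if __name__ == "__main__":
    for hit in stray_characters(['"abc" x "def"', '  "ghi";']):
        print(hit)
```

Nothing is "eaten" between the strings; whitespace is simply never matched by any rule.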

I believe that in your case you don't need [ \t]* to be tokenized. You can
just leave your 'S1' scheme with the rules concerning that string's
content.

As for the regexp single-line limitation, it has its reasons.
Indeed, it sometimes limits expressive power, but it allows HRC to
parse code rather fast. Otherwise many "free" HRC
constructions would take exponential time over the whole file
content.




--
  Igor


Re: Why RE are singleline?

by Dmitry Ivankov



> HRC model is 'positive' parsing (in contrast with BNF and other forms
> of grammar description). you don't have to eat all the characters -
> just the things you need.
Yes, I know. Its positiveness is a good thing, because looking at HRC I see the parsing algorithm.
But now I see an algorithm which is in HRC style, but I can't express it in HRC ;)

> In case your syntax doesn't allow anything,
> except 'spaces' between your strings...
Yes and no. The syntax of these sequences doesn't, but the top-level syntax allows many things. For example, $"ab" "c"; "def"; @"ghi" should be parsed as S1(both strings); S0; S2
 
>Believe in your case you don't need [ \t]* to be tokenized. You can
>just leave your 'S1' scheme with rules, concerning that string's
>content.
In that case S1 will end at the first ", and $"abc" "def" will be S1 S0, not S1(2 strings).
By tokenizing the separator I extend S1 to the next string in the sequence, if there is one.
I still haven't found another way to extend S1. If it is some hacky block, its start is surely " and its end is " too, but once " is matched it can't roll back if the end is not found (or if there were bad symbols before the end).

>As for the regexp single line limitation, it has its own roots.
>Indeed, sometimes it limits expression power, however it allows HRC to
>parse code rather fast - in other case many of "free" HRC
>constructions would give exponential time in scope of overall file
>content.
Speed is critical, but limited use of "good" multiline regexps won't affect it. (In my case those REs won't analyze any symbol more than once.) I think that just two-line REs (\n isn't used within * or + operators, and appears only once) will cover most constructions that require multiline REs, and won't give a significant slowdown. (It would be just as if we had lines twice as long as usual.)
In my case two-line will be enough. (There is no need to separate a sequence by blank lines.)
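
As a rough Python illustration of such a "two-line" expression (not HRC; `TWO_LINE_SEP` and `joins_next_string` are hypothetical names), the separator between two strings may cross at most one line break, and the newline is optional and never sits under `*` or `+`:

```python
import re

# Closing quote, optional blanks, at most ONE line break, optional blanks,
# opening quote of the next string. The newline is not repeated.
TWO_LINE_SEP = re.compile(r'"[ \t]*(?:\r?\n)?[ \t]*"')


def joins_next_string(text, pos):
    """True if the closing quote at text[pos] is directly followed by
    another string on the same line or on the very next line."""
    return TWO_LINE_SEP.match(text, pos) is not None


if __name__ == "__main__":
    print(joins_next_string('"a"\n"b"', 2))    # one line break
    print(joins_next_string('"a"\n\n"b"', 2))  # blank line in between
```

With a blank line between the strings the separator does not match, so the second string would fall back to the default scheme.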

Or there is some other way? :)


Re: Why RE are singleline?

by Igor Russkih


> > In case your syntax doesn't allow anything,
> > except 'spaces' between your strings...
> Yes and no. Syntax of these sequences doesn't, but top level syntax allows
> many things, for example $"ab" "c"; "def"; @"ghi" should be parsed as
> S1(both strings); S0; S2
OK, beginning to understand what you are trying to express ;-)

> By tokenizing separator i extend S1 to next string in sequence if there is
> next string.
And possibly this is the wrong thing: tokenizing the separator. Let's see
what we can do here.

> I still haven't found another way to extend S1. Because if it is some hacky
> block, it's start is surely " and end is " too, but once " is matched it
> can't rollback if end is not found (or there were bad symbols before end).
got it.

> In my case twoline will be enough. (there is no need in separating sequence
> by blank lines)
I really can't understand now why 'two line' will be enough. Do you
mean your syntax allows
-------------
@"foo"
"bar";
-------------
but doesn't allow
-------------
@"foo"

"bar";
-------------
?
I don't think so.

>
> Or there is some other way? :)
HRC is still a context-free-like grammar language. To express these things
you have to find recursive constructions which don't need to be
rolled back.
I see that all prefixed strings end with ';'. Can it be used to match
the end of your string?

scheme top
    block
        \@
        \;
        scheme at_string

scheme at_string
    block
         "
         "
         scheme string_content
    re /./
         def:Error
         priority=low

Can the above fit your needs?

If not, I definitely need your language name and some concrete source
code samples ;)
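
The nesting in the `top` / `at_string` outline above can be imitated with a small Python state machine. This is only an analogue of the idea, not how Colorer works internally, and `highlight` with its region names is made up for the sketch. One deliberate deviation, labeled as an assumption: whitespace between strings is not reported as an error, whereas a literal /./ rule would flag it too.

```python
def highlight(lines):
    """Classify characters line by line; the state survives line breaks.

    Regions: 'prefix' (@), 'quote', 'string', 'error', 'end' (;).
    Characters outside any @...; block are left unclassified.
    """
    state = "top"  # top -> at_string -> string_content
    out = []
    for line_no, line in enumerate(lines, start=1):
        for col, ch in enumerate(line):
            if state == "top":
                if ch == "@":
                    state = "at_string"
                    out.append((line_no, col, "prefix"))
            elif state == "at_string":
                if ch == '"':
                    state = "string_content"
                    out.append((line_no, col, "quote"))
                elif ch == ";":
                    state = "top"
                    out.append((line_no, col, "end"))
                elif not ch.isspace():  # assumption: blanks are not errors
                    out.append((line_no, col, "error"))
            else:  # string_content
                if ch == '"':
                    state = "at_string"
                    out.append((line_no, col, "quote"))
                else:
                    out.append((line_no, col, "string"))
    return out


if __name__ == "__main__":
    for item in highlight(['@"a"', '"b" x;']):
        print(item)
```

No rollback is ever needed: every character is classified once, and the ';' alone decides where the sequence ends.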

The thing I often see with HRC is that if a language is difficult or
impossible to express in HRC, then this language's
grammar is poorly designed and requires heavy resources even to
compile/interpret...

--
  Igor


Re: Why RE are singleline?

by Dmitry Ivankov



> I really can't understand now why 'two line' will be enough?
Because a sequence of strings is used mostly to split a long string into lines,
just like consecutive string literals are concatenated in C/C++.
So $"long line" can be written as $"long"\n"line"
Two-line is enough, or better to say acceptable, for most texts. $"q"\n\n"w" is legal and is S1(2 strings), but it will be OK if it's parsed as S1\n\nS0.


> I see that all prefixed string ends with ';'. Can it be used to match
> end of your string?
No, it can end with any other separator, maybe [;,.)+-/........], but enumerating them is a bad idea
 
> If no, I definitely need your language name and some concrete source
> code samples ;)
The language is Nemerle. Its syntax is similar to C/C++/C#.
You can imagine the syntax as C++ with an additional construction (that's not quite true, but it describes the problem):
- $"the value of x is $x\n" "and y is $y."\n "and of course z = $z"
(a sequence of strings prefixed with "$" has a separate grammar, i.e. $x, $y, $z are highlighted; at least it would be cool :) )
(if it's not prefixed then it is just many strings, with escape sequences highlighted in each)
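
What "highlighting $x, $y, $z" means inside a $-prefixed string can be sketched in Python as follows. This is an illustration under a simplifying assumption: only the plain `$name` form from the example is recognized, and `SPLICE` and `splices` are names invented for the sketch.

```python
import re

# Assumption: a splice is '$' followed by a plain identifier.
SPLICE = re.compile(r'\$[A-Za-z_]\w*')


def splices(string_content):
    """Return (offset, text) for each $name inside one string's content."""
    return [(m.start(), m.group()) for m in SPLICE.finditer(string_content)]


if __name__ == "__main__":
    print(splices(r'the value of x is $x\n'))
    print(splices('and y is $y.'))
```

The same function would simply not be applied to strings that lack the "$" prefix, which is why the prefix has to select a different scheme for the whole sequence.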

Or an even simpler syntax: just C++,
but with paired quotes. In the case of "str1"\n "str2" the 1st and 4th quotes are paired, and the 2nd and 3rd are just quotes, highlighted differently from the strings' content.


Re: Why RE are singleline?

by Dmitry Ivankov



> I see that all prefixed string ends with ';'. Can it be used to match
> end of your string?
I've just realized that the sequence ends with..... whatever :))
So the following seems to work:
in the top scheme:
block:
 start: \$\M\"
 end: \M.
 scheme: strings

scheme strings:
 re: [ \t]
 block:
  start: \"
  end: \"
  scheme: inside_string

Hacks rock :)
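
The behaviour of this trick can be imitated in Python. The sketch assumes that `\M` makes whatever follows it a look-ahead only (that reading of `\M` is an assumption here), so the outer block opens at a `$` that is followed by a quote and closes in front of the first character that is neither a blank nor part of a string. `dollar_sequences` is a hypothetical name, and this is not Colorer's actual algorithm.

```python
def dollar_sequences(lines):
    """Return [(start, end)] spans of $-sequences as (line, col) pairs.

    `end` is the position of the first character that is neither a
    blank nor part of a string: the "whatever" that closes the block.
    """
    spans, start, in_string = [], None, False
    for line_no, line in enumerate(lines, start=1):
        col = 0
        while col < len(line):
            ch = line[col]
            if start is None:
                # start: '$' immediately followed by a quote (look-ahead only)
                if ch == "$" and line[col + 1:col + 2] == '"':
                    start = (line_no, col)
            elif in_string:
                if ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch not in " \t":
                # end: any other character closes the block, unconsumed
                spans.append((start, (line_no, col)))
                start = None
                continue  # re-examine this character at top level
            col += 1
    if start is not None:
        spans.append((start, None))  # sequence still open at end of input
    return spans


if __name__ == "__main__":
    print(dollar_sequences(['$"ab"', '  "c"; "def"']))
```

Line breaks need no rule of their own: the block simply stays open from one line to the next until some other character shows up.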


Re: Why RE are singleline?

by Igor Russkih


Wow, that's really cool.

The only possible minor drawback here is that this block will last
until that 'whatever'. E.g. if you want to highlight the background
color of such strings, it will extend not just to the last " quote but
a little bit further.
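
The size of that overshoot can be shown with a short Python comparison (an illustration only; `BLOCK`, `TIGHT` and `overshoot` are invented names, and the input is assumed to start with a $-sequence). One pattern covers what such a block spans, the other stops right after the last closing quote:

```python
import re

# What the block covers: strings and blanks, up to the next other character.
BLOCK = re.compile(r'\$(?:"[^"]*"|[ \t\n])*')
# What one would like to colour: stop right after the last closing quote.
TIGHT = re.compile(r'\$"[^"]*"(?:[ \t\n]*"[^"]*")*')


def overshoot(text):
    """Text covered by the block beyond the last closing quote.

    Assumes `text` begins with a $-prefixed string sequence.
    """
    block, tight = BLOCK.match(text), TIGHT.match(text)
    return text[tight.end():block.end()]


if __name__ == "__main__":
    print(repr(overshoot('$"a" "b"   + x')))
    print(repr(overshoot('$"a";')))
```

When the terminating character directly follows the last quote, the two extents coincide and nothing extra is coloured.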



--
  Igor
