Jsoftware
High-Performance Development Platform

canonical pathname

canonical pathname

by bill lam-2

Suppose I have a filename (assumed always to start, but never end, with '/'), like
  '/a/b/c/d/../../e/./f/../../f'

How can I remove all /.. and /. so that it is reduced to
   '/a/b/f'

regards,

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: canonical pathname

by Raul Miller-4

On 7/2/08, bill lam <bbill.lam@...> wrote:
> suppose I have a filename (assuming always starting but not ending with
> '/'), like
>  '/a/b/c/d/../../e/./f/../../f'
>
> how to remove all /.. and /. so that it reduced to
>  '/a/b/f'

require'regex'
clean=: ('//+|/\./|(^|//*)([^/.]|[^/][^/.]|[^/.][^/])+/\.\.(/|$)';'/')&rxrplc^:_

   clean '/a/b/c/d/../../e/./f/../../f'
/a/b/f

--
Raul

RE: canonical pathname

by R.E. Boss

If I remove all /.. and /. (or all '/..' and '/.') from
'/a/b/c/d/../../e/./f/../../f', I am left with '/a/b/c/d/e/f/f'
What is it I do not understand?


R.E. Boss




Re: canonical pathname

by Raul Miller-4

On 7/2/08, I wrote:
> clean=: ('//+|/\./|(^|//*)([^/.]|[^/][^/.]|[^/.][^/])+/\.\.(/|$)';'/')&rxrplc^:_

That was not correct
clean=: ('//+|/\./|(^|//*)([^/.]|[^/][^/.]|[^/.][^/])+/\.\.(/|$)';'/')&rxrplc^:_

   clean 'abc/../..'
/..

The answer should have been '..'

This should function properly (but watch out for email induced
line wrap):

cln=: ('//+|/\./';'/')&rxrplc@(('(^|(?<=/))([^/.]|[^/][^/.]|[^/.][^/])+/\.\.(/|$)';'')&rxrplc)^:_


--
Raul

Re: canonical pathname

by Raul Miller-4

On 7/2/08, R.E. Boss <r.e.boss@...> wrote:
> If I remove all /.. and /. (or all '/..' and '/.') from
> '/a/b/c/d/../../e/./f/../../f', I am left with '/a/b/c/d/e/f/f'
> What is it I do not understand?

These strings represent file system paths, where .
represents the current directory and .. represents
the current directory's parent directory.

FYI,

--
Raul

Re: canonical pathname

by Oleg Kobchenko

   load'regex'

   A=: '/a/b/c/d/../../e/./f/../../f'

   ('/[^/]+/\.\.';'') rxrplc^:_ ('/\./';'/') rxrplc A
/a/b/f
   



Re: canonical pathname

by Raul Miller-4

On 7/2/08, Oleg Kobchenko <olegykj@...> wrote:
>   load'regex'
>   A=: '/a/b/c/d/../../e/./f/../../f'
>   ('/[^/]+/\.\.';'') rxrplc^:_ ('/\./';'/') rxrplc A
> /a/b/f

That does work, but beware that there are cases it does not
handle.  In particular, repeated slashes and non-absolute
paths may cause problems.

Of course, those were not part of the specification,
so maybe they should be ignored?

Also, .. as the prefix of some longer name will cause
problems, and a trailing /. will not be removed.

--
Raul

Re: canonical pathname

by Oleg Kobchenko

I know. It's just an idea. However, I believe canonicalization
should be possible with E. split and reassembling.



Re: canonical pathname

by Raul Miller-4

On 7/2/08, Raul Miller <rauldmiller@...> wrote:
> But also .. as prefix of some longer name will also
> cause problems, and also a trailing /. will not be removed.

   ('/[^/]+/\.\.';'') rxrplc^:_ ('/\./';'/') rxrplc '/a/.../././././.'
./././.

Since my earlier proposals had problems here, I propose:

require'regex'
cln1=: ('(?<=.)//+';'/')&rxrplc
cln2=: ('(^|(?<=/))\./|/\.((?=/)|$)';'')&rxrplc
cln3=: ('(^|(?<=/))([^/.]|[^/][^/.]|[^/.][^/])+/\.\.((?=/)|$)';'.')&rxrplc
cleaner=: cln3@cln2@cln1^:_

   cleaner '/a/b/c/d/../../e/./f/../../f'
/a/b/f
   cleaner 'abc/../..'
..
   cleaner '/a/.../././././.'
/a/...

--
Raul




Re: canonical pathname

by Raul Miller-4

On 7/2/08, Oleg Kobchenko <olegykj@...> wrote:
> I know. It's just an idea. However, I believe canonicalization
> should be possible with E. split and reassembling.

That's a good point.

And my rxrplc solutions would mess up UNC paths.

So, perhaps:

pfx=: '/' #~ 2 <. i.&0@:=&'/'
trim1=: (<,'.') -.~ a: -.~ '/' <;._1@, ]
trim2=: (#~ 0 1 (*. _1&|.)@:-.@E. (<'..')&=)^:_
trim=: pfx, [:}.@; '/',L:0 trim2@trim1

   trim '/a/b/c/d/../../e/./f/../../f'
/a/b/f
   trim 'abc/../..'
..
   trim '/a/.../././././.'
/a/...
   trim '//unc/paths/should/../must/remain/valid'
//unc/paths/must/remain/valid
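
The split-and-reassemble idea can also be sketched in Python. This is a re-expression, not a transliteration of the J verbs: it keeps at most two leading slashes (as pfx does), drops empty and '.' components (as trim1 does), and cancels each name that is directly followed by '..' using a list as a stack, where trim2 removes such pairs repeatedly. The function name is reused from the post; everything else is an assumption of this sketch.

```python
def trim(path: str) -> str:
    """Lexically simplify a path, keeping up to two leading slashes."""
    # Count the leading slashes and keep at most two of them (UNC-style prefix).
    leading = len(path) - len(path.lstrip('/'))
    prefix = '/' * min(2, leading)

    # Split on '/', discarding empty components and '.'.
    parts = [p for p in path.split('/') if p not in ('', '.')]

    # Cancel every ordinary name that is immediately followed by '..'.
    kept = []
    for part in parts:
        if part == '..' and kept and kept[-1] != '..':
            kept.pop()
        else:
            kept.append(part)

    return prefix + '/'.join(kept)


for example in ('/a/b/c/d/../../e/./f/../../f',
                'abc/../..',
                '/a/.../././././.',
                '//unc/paths/should/../must/remain/valid'):
    print(example, '->', trim(example))
```

A '..' with nothing left to cancel is kept, so 'abc/../..' still comes out as '..'.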

--
Raul

RE: canonical pathname

by R.E. Boss

Thanks, but even with this interpretation, I still do not understand how
removing all /.. and /. from '/a/b/c/d/../../e/./f/../../f' results in
'/a/b/f'


R.E. Boss



Re: canonical pathname

by Robert Raschke-5

Please also be aware that file systems are not trees. They're graphs,
due to hard and soft links. Thus /a/b/c/../d is not necessarily the
same as /a/b/d, because c may have redirected you to a completely
different part of your file system.

Path rewriting like this is pretty much impossible to get right in the
general case. But if you know your application, then it may work for
you.

See also http://plan9.bell-labs.com/sys/doc/lexnames.pdf for more
detailed and technical info.
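
The difference can be seen in a small Python sketch, assuming a POSIX system where soft links can be created. It builds a scratch tree in which a/b/c is a link to x/y and then compares purely textual rewriting (os.path.normpath) with resolution against the file system (os.path.realpath). The directory names and the function name are made up for illustration.

```python
import os
import tempfile


def lexical_vs_physical():
    """Return (lexical, physical) readings of a/b/c/../d, relative to a scratch root."""
    with tempfile.TemporaryDirectory() as root:
        root = os.path.realpath(root)            # the temp dir itself may sit behind a link
        os.makedirs(os.path.join(root, 'a', 'b'))
        os.makedirs(os.path.join(root, 'x', 'y'))
        # a/b/c is a soft link into a different part of the tree
        os.symlink(os.path.join(root, 'x', 'y'), os.path.join(root, 'a', 'b', 'c'))

        path = os.path.join(root, 'a', 'b', 'c', '..', 'd')
        lexical = os.path.normpath(path)         # text rewriting only
        physical = os.path.realpath(path)        # follows the link, then applies '..'
        return os.path.relpath(lexical, root), os.path.relpath(physical, root)


print(lexical_vs_physical())
```

Here the textual rewrite lands on a/b/d while the file system lands on x/d, which is the kind of disagreement described above.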

Robby

Re: canonical pathname

by Rob Hodgkinson

From Raul ... (and being aware of the redirection problem Robert Raschke
raised) ...

..  means the parent directory (one 'above' the current directory)
.   means the current directory, so you can basically ignore these ones ...

Thus, stepwise, to follow the filename parsing ...

'/a/b/c/d/../../e/./f/../../f'
                 ^^             ignore the /. so this becomes ...
'/a/b/c/d/../../e/f/../../f'
       ^^^^^                    the /.. moves back up from d to the /c
                                level, so ...
'/a/b/c/../e/f/../../f'
     ^^^^^                      ditto for /.. so this then becomes ...
'/a/b/e/f/../../f'
       ^^^^^                    ditto for /.. so this then becomes ...
'/a/b/e/../f'
     ^^^^^                      ditto for /.. so this then becomes ...
'/a/b/f'

So you don't "remove" them, but "parse" them to determine the ultimate
filename.

The 'levels' of the directory are more obvious if you do ...
   ]boxes=:(<'/.')-.~ <;.1 '/a/b/c/d/../../e/./f/../../f'
+--+--+--+--+---+---+--+--+---+---+--+
|/a|/b|/c|/d|/..|/..|/e|/f|/..|/..|/f|
+--+--+--+--+---+---+--+--+---+---+--+
   (<'/..')=boxes
0 0 0 0 1 1 0 0 1 1 0
   1+_2*(<'/..')=boxes
1 1 1 1 _1 _1 1 1 _1 _1 1
   boxes,:<"0 +/\1+_2*(<'/..')=boxes
+--+--+--+--+---+---+--+--+---+---+--+
|/a|/b|/c|/d|/..|/..|/e|/f|/..|/..|/f|
+--+--+--+--+---+---+--+--+---+---+--+
|1 |2 |3 |4 |3  |2  |3 |4 |3  |2  |3 |
+--+--+--+--+---+---+--+--+---+---+--+

The last (key) step would be to keep the first 'unclosed' occurrence at each
depth level, which here would return the first, second and last to produce
the result '/a/b/f'.  A little tricky perhaps, so the regex solution is more
'ready made'.
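
That last step can be sketched in Python under the same assumptions as the table above (an absolute path, no repeated slashes, and never more '..' than there are names to close). A token counts as 'unclosed' if it is not '..' and the running depth never drops below its own depth afterwards. The function names are invented for this sketch.

```python
from itertools import accumulate


def depth_levels(path):
    """Tokens (without '.') and the running depth after each one."""
    tokens = [t for t in path.split('/')[1:] if t != '.']
    steps = [-1 if t == '..' else 1 for t in tokens]   # like 1+_2*mask in the J session
    return tokens, list(accumulate(steps))             # like +/\ in the J session


def keep_unclosed(path):
    """Keep each name whose depth level is never undercut later on."""
    tokens, depths = depth_levels(path)
    kept = []
    for i, (token, depth) in enumerate(zip(tokens, depths)):
        later = depths[i + 1:]
        if token != '..' and all(d >= depth for d in later):
            kept.append(token)
    return '/' + '/'.join(kept)


A = '/a/b/c/d/../../e/./f/../../f'
print(depth_levels(A)[1])
print(keep_unclosed(A))
```

The check over all later depths makes this quadratic in the number of tokens, which is harmless for path-sized inputs.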

Hope that makes it clearer (and that I didn't goof!).

Rob Hodgkinson



RE: canonical pathname

by R.E. Boss

Thanks very much, much clearer indeed.

So if I understand it correctly it is:
 remove /. for all occurrences of /.
 remove x/.. for all occurrences of /..
 
And if x is empty, just remove /..?


R.E. Boss



Re: canonical pathname

by Raul Miller-4

On 7/3/08, R.E. Boss <r.e.boss@...> wrote:
> And if x is empty, just remove /..?

If x is empty then /.. should probably be replaced with /

However, .. without a preceding slash should be left
alone.
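
These two rules can be sketched in Python as follows. The function name resolve is hypothetical, and returning '.' for a relative path that cancels out completely is an assumption of this sketch, not something stated in the thread.

```python
def resolve(path: str) -> str:
    """Lexical path reduction with the two edge rules described above."""
    absolute = path.startswith('/')
    stack = []
    for part in path.split('/'):
        if part in ('', '.'):
            continue                       # nothing to do for empty or '.' components
        if part != '..':
            stack.append(part)             # ordinary name
        elif stack and stack[-1] != '..':
            stack.pop()                    # 'x/..' cancels x
        elif not absolute:
            stack.append('..')             # leading '..' of a relative path is left alone
        # otherwise: '..' directly under the root is dropped, so '/..' becomes '/'
    body = '/'.join(stack)
    return '/' + body if absolute else (body or '.')


for example in ('/..', '/../a', '../a', 'abc/../..'):
    print(example, '->', resolve(example))
```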

--
Raul

Re: canonical pathname

by Raul Miller-4

On 7/3/08, Robert Raschke <rtrlists@...> wrote:
> Please also be aware that file systems are not trees. They're graphs,
> due to hard and soft links. Thus /a/b/c/../d is not necessarily the
> same as /a/b/d, because c may have redirected you to a completely
> different part of your file system.

This is true, but if you want that functionality you should probably
not reduce the path to canonical form.

--
Raul

Re: canonical pathname

by bill lam-2


Thanks to all for the suggestions. It's good to see how it can be done in
other ways.  My approach was to tokenise with <;.1, then push every token
onto a stack, except:
a. discard the token '/.'
b. pop the stack for the token '/..'
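
The stack approach described above can be sketched in Python. The tokens keep their leading slash, as <;.1 would produce them; the input is assumed to be absolute with no repeated slashes, and the function name is invented here.

```python
import re


def canonical(path: str) -> str:
    """Tokenise into '/name' pieces and run them through a stack."""
    tokens = re.findall(r'/[^/]*', path)   # '/a', '/b', '/..', ...
    stack = []
    for token in tokens:
        if token == '/.':
            continue                       # a. discard the token '/.'
        if token == '/..':
            if stack:
                stack.pop()                # b. pop the stack for the token '/..'
        else:
            stack.append(token)
    return ''.join(stack) or '/'           # an empty stack means the root itself


print(canonical('/a/b/c/d/../../e/./f/../../f'))
```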

On Linux, /.. or /. is just /. The name 'abspath' is used in Perl or Python.
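
For comparison, here is how Python's standard library treats the examples from this thread. posixpath.normpath does the purely textual reduction; os.path.abspath is normpath applied after joining a relative path onto the current directory, and neither of them looks at soft links. The list name EXAMPLES is made up for this sketch.

```python
import posixpath

EXAMPLES = [
    '/a/b/c/d/../../e/./f/../../f',
    'abc/../..',
    '/a/.../././././.',
    '//unc/paths/should/../must/remain/valid',
    '/..',
    '/.',
]

# Map each example to its lexically normalised form.
normalized = {path: posixpath.normpath(path) for path in EXAMPLES}

for path, result in normalized.items():
    print(path, '->', result)
```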

regards,

Re: canonical pathname

by Oleg Kobchenko

The stack idea is interesting. Here's a tacit stack version:

   A=: '/a/b/c/d/../../e/./f/../../f'

   if=: @.
   else=: `
   eq=: 1 : '(<,m) = ['
   
   ; '/'&,each ,else(}.@])if('..'eq)else]if('.'eq)/ &.|. <;._1 A
/a/b/f
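
A rough Python counterpart of this fold uses functools.reduce, with a list standing in for the stack. It is a sketch of the same idea rather than a translation of the tacit expression: it starts from an empty stack instead of seeding with the first token, and the names step and canonical_fold are invented here.

```python
from functools import reduce


def step(stack, token):
    """One fold step: skip '.', pop on '..', otherwise push."""
    if token == '.':
        return stack
    if token == '..':
        return stack[:-1]          # dropping from an empty stack leaves it empty
    return stack + [token]


def canonical_fold(path):
    tokens = path.split('/')[1:]   # assumes the path starts with '/'
    return ''.join('/' + t for t in reduce(step, tokens, []))


A = '/a/b/c/d/../../e/./f/../../f'
print(canonical_fold(A))
```

If every name is cancelled the result is the empty string, so a caller who wants '/' in that case has to add it.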



Re: canonical pathname

by Oleg Kobchenko