The unifying generalization of derivatives

by Tracy Harms-3

I found a series of blog postings by Conal Elliott on the
generalization of derivatives.  Others may also enjoy them, so here's
the URI:

http://conal.net/blog/

Perhaps most interesting is the one entitled "Higher-dimensional,
higher-order derivatives, functionally".  In that posting the chain
rule is given by the following expression:

   deriv (f . g) x = deriv f (g x) . deriv g x

I'm confused by the use of composition on the right-hand side, as I
expect it to involve two functions. I can't see a composition of two
functions here; in J terms, I see nouns where I expect verbs.
Guidance in correcting my misinterpretation is welcome.

Tracy
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: The unifying generalization of derivatives

by John Randall-3

Tracy Harms wrote:
>  In that posting the chain
> rule is given by the following expression:
>
>    deriv (f . g) x = deriv f (g x) . deriv g x
>
> I'm confused by the use of composition on the right-hand side, as I
> expect it to involve two functions. I can't see a composition of two
> functions here; in J terms, I see nouns where I expect verbs.
> Guidance in correcting my misinterpretation is welcome.

The problem is that if f is a function of one variable, f'(17) is usually
interpreted as a number.  To generalize it, you have to think about it as
a linear map.  When we write f'(17)=4, we really mean f'(17) is the linear
map L:R->R such that L(x)=4x.  With this interpretation, composition in
the 1-dimensional case is the same as multiplication.

More generally, if f:R^n -> R^m, its derivative at a point is a linear map
from R^n->R^m, and can be interpreted as an m x n matrix.
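
A minimal sketch of this reading, written in Python rather than J. The helper names `linear_map` and `compose` are illustrative only. It treats f'(17)=4 as the map L(x)=4x and checks that composing two such maps gives the map whose slope is the product of the two slopes.

```python
def linear_map(slope):
    """Return the linear map L: R -> R with L(x) = slope * x."""
    return lambda x: slope * x

def compose(outer, inner):
    """Function composition: (outer . inner)(x) = outer(inner(x))."""
    return lambda x: outer(inner(x))

# f'(17) = 4 read as a linear map, plus a second derivative value 3.
L4 = linear_map(4)
L3 = linear_map(3)

composed = compose(L4, L3)     # composition of the two linear maps
product = linear_map(4 * 3)    # the map whose slope is the product

# The two maps agree at every sample point.
samples = [-2.0, 0.0, 0.5, 7.0]
agree = all(composed(x) == product(x) for x in samples)
print(agree)
```

In one dimension the representation of a linear map is a single number, which is why the composition looks like ordinary multiplication.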

Best wishes,

John


----------------------------------------------------------------------

Re: The unifying generalization of derivatives

by Oleg Kobchenko

----- Original Message ----

> From: Tracy Harms <kaleidic@...>
> To: chat@...
> Sent: Tuesday, September 16, 2008 11:23:24 AM
> Subject: [Jchat] The unifying generalization of derivatives
>
> I found a series of blog postings by Conal Elliott on the
> generalization of derivatives.  Others may also enjoy them, so here's
> the URI:
>
> http://conal.net/blog/
>
> Perhaps most interesting is the one entitled "Higher-dimensional,
> higher-order derivatives, functionally".  In that posting the chain
> rule is given by the following expression:
>
>    deriv (f . g) x = deriv f (g x) . deriv g x
>

Incorrect form: the first "." is composition, the second
should be multiplication. The composition on the right
side should be between f' and first g.

> I'm confused by the use of composition on the right-hand side, as I
> expect it to involve two functions. I can't see a composition of two
> functions here; in J terms, I see nouns where I expect verbs.
> Guidance in correcting my misinterpretation is welcome.
----------------------------------------------------------------------

Re: The unifying generalization of derivatives

by Raul Miller-4

On Tue, Sep 16, 2008 at 11:23 AM, Tracy Harms <kaleidic@...> wrote:
>   deriv (f . g) x = deriv f (g x) . deriv g x
>
> I'm confused by the use of composition on the right-hand side, as I
> expect it to involve two functions. I can't see a composition of two
> functions here; in J terms, I see nouns where I expect verbs.
> Guidance in correcting my misinterpretation is welcome.

In J terms, dot would be a conjunction (though you would
need to rearrange that sentence so J could parse it
properly):

When dot's arguments are both nouns, dot produces a new
noun using multiplication.  (Hypothetically speaking, if neither
of those nouns was rank zero, dot would probably use
+/ .*).

When dot's arguments are both verbs, dot produces a new
verb using function composition.

Note that this approach implies that the result of deriv is
a noun.  (However, properly speaking, I think that it should
be a limit condition.)

At least, that's how I would interpret that notation.  (But I did
not properly understand John Randall's contribution to this thread).
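
The interpretation above can be sketched in Python (not J). The function `dot` here is hypothetical, written only to illustrate an operator that multiplies when given two numbers and composes when given two functions; it is not how the blog post or J defines anything.

```python
def dot(a, b):
    """Hypothetical overloaded dot: compose callables, multiply numbers."""
    if callable(a) and callable(b):
        return lambda x: a(b(x))     # both "verbs": function composition
    if not callable(a) and not callable(b):
        return a * b                 # both "nouns": multiplication
    raise TypeError("dot expects two functions or two numbers")

double = lambda x: 2 * x
inc = lambda x: x + 1

print(dot(3, 4))              # 12
print(dot(double, inc)(5))    # 12, since double(inc(5)) = 2 * 6
```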

FYI,

--
Raul
----------------------------------------------------------------------

Re: The unifying generalization of derivatives

by Dan Bron

Tracy wrote:
>  the chain rule is given by the following expression:
>     deriv (f . g) x = deriv f (g x) . deriv g x
 
Oleg corrected the blog author:
>  Incorrect form: the first "." is composition, the second
>  should be multiplication. The composition on the right
>  side should be between f' and first g.

To clarify using J notation:

           chain =: 2 : 'u@:v D.1 -: u D.1 @: v +/ .* v D.1'
           
The conjunction  chain  takes two verb (function) arguments and produces a tautology (i.e. a verb that produces  1  for any input).  At least in theory.

           1&o. chain (3 2&p.) _78 0.9 1p1 1j5
        1
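
For readers without J at hand, here is a rough Python analogue of the same check. It is a sketch under stated assumptions: `derivative` is a plain central-difference approximation (not J's D.), `math.sin` stands in for `1&o.`, and `3 + 2*x` stands in for `3 2&p.`. Only real inputs are tried, so the complex test point from the J example is left out.

```python
import math

def derivative(fn, x, h=1e-5):
    """Central-difference approximation to fn'(x)."""
    return (fn(x + h) - fn(x - h)) / (2 * h)

f = math.sin                      # plays the role of 1&o.
g = lambda x: 3 + 2 * x           # plays the role of 3 2&p.
f_after_g = lambda x: f(g(x))     # plays the role of u@:v

def chain_holds(x, tol=1e-6):
    """Compare (f o g)'(x) with f'(g(x)) * g'(x) up to a tolerance."""
    lhs = derivative(f_after_g, x)
    rhs = derivative(f, g(x)) * derivative(g, x)
    return abs(lhs - rhs) < tol

print(all(chain_holds(x) for x in (-78.0, 0.9, math.pi)))
```

Because both sides are approximations, the comparison uses a tolerance rather than an exact match.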

-Dan

PS:  Using J4 notation, before the trimming of Section II.F:


           chain  =:  @:D.1 -: ([.D.1 @:]. (+/ .*) (].D.1))
           
I'm particularly enamored of the  @:D.1  part.  This is pure definition:  the first derivative of the composition.  The function arguments have been abstracted away.  This is analogous to how tacit verbs abstract away noun arguments (e.g.  +/%#  means "the sum divided by the tally"; no arguments are mentioned).

----------------------------------------------------------------------

Re: The unifying generalization of derivatives

by Tracy Harms-3

Oleg,

Between what John Randall wrote and the elaboration by the author in
the original, I think your attempt at correction is unsuccessful.

Here is what Conal Elliott wrote to clarify the equation I reproduced:

[begin quotation]

The composition on the right hand side is on linear maps
(derivatives). You may be used to seeing the chain rule in one or more
of its specialized forms, using some form of product (scalar/scalar,
scalar/vector, vector/vector dot, matrix/vector) instead of
composition. Those forms all mean the same as this general case, but
are defined on various *representations* of linear maps, instead of
linear maps themselves.

[end quotation]

John's comments have helped me understand how to think of this in
terms of linear maps.  I'm not entirely "there", yet, but I'm a whole
lot closer than I was.


Tracy

On Tue, Sep 16, 2008 at 9:33 AM, Oleg Kobchenko <olegykj@...> wrote:

>
>...
>>
>> http://conal.net/blog/
>>
>> Perhaps most interesting is the one entitled "Higher-dimensional,
>> higher-order derivatives, functionally".  In that posting the chain
>> rule is given by the following expression:
>>
>>    deriv (f . g) x = deriv f (g x) . deriv g x
>>
>
> Incorrect form: the first "." is composition, the second
> should be multiplication. The composition on the right
> side should be between f' and first g.
>
----------------------------------------------------------------------

Re: The unifying generalization of derivatives

by John Randall-3

Tracy Harms wrote:

> The composition on the right hand side is on linear maps
> (derivatives).

Representations of linear maps come from choosing bases, thereby
giving coordinates.  If we do that, it's reasonably straightforward
except that we are misled by the 1-dimensional case.

Here's a simple 2-dimensional case.

g(x,y) = (x^2,y^2)

f(x,y) = (x^2+y,y)

(f @ g)(x,y)=(x^4+y^2,y^2)

D(f @ g)(x,y)= [4x^3 2y]
               [0    2y]

Df(g(x,y)) Dg(x,y) = [2x^2 1][2x 0 ] = [4x^3 2y]
                     [0    1][0  2y]   [0    2y]
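
The same 2-dimensional case can be checked numerically. This is a sketch in plain Python with hand-written Jacobians; the Jacobian of f(u,v) = (u^2+v, v) is taken to be [[2u, 1], [0, 1]], and the helper names are illustrative.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def Dg(x, y):
    """Jacobian of g(x, y) = (x^2, y^2)."""
    return [[2 * x, 0],
            [0, 2 * y]]

def Df(u, v):
    """Jacobian of f(u, v) = (u^2 + v, v)."""
    return [[2 * u, 1],
            [0, 1]]

def D_f_after_g(x, y):
    """Jacobian of (f @ g)(x, y) = (x^4 + y^2, y^2), differentiated by hand."""
    return [[4 * x**3, 2 * y],
            [0, 2 * y]]

x, y = 3, 5
gx, gy = x**2, y**2                      # the point g(x, y)
product = matmul(Df(gx, gy), Dg(x, y))   # Df(g(x,y)) Dg(x,y)
print(product == D_f_after_g(x, y))      # True
```

The matrix product is the coordinate representation of composing the two linear maps, which is the point of the general form of the chain rule.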

Best wishes,

John


----------------------------------------------------------------------

Re: The unifying generalization of derivatives

by Raul Miller-4

Looking at Dan Bron's example, I ran into something that
I do not understand:

Here, I interpret the result as representing
the original dimension of the argument
vector in the diagonal of the result.
    0~:1&o. D. 1 (_78 0.9 1p1 1j1)
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1

But I do not understand the result I get here.
    0~:1&o. D. 2 (_78 0.9 1p1 1j1)

I was expecting the result that I get from
    0~:1&o. D. 1 D. 1(_78 0.9 1p1 1j1)

Am I looking at a bug, or a real issue?

Thanks,

--
Raul
----------------------------------------------------------------------

Re: The unifying generalization of derivatives

by Raul Miller-4

On Tue, Sep 16, 2008 at 3:09 PM, John Randall
<randall@...> wrote:
> Representations of linear maps come from choosing bases, thereby
> giving coordinates.  If we do that, it's reasonably straightforward
> except that we are misled by the 1-dimensional case.

The Wikipedia page on linear maps expresses some qualities of
linear maps in terms of addition and multiplication.

This implies, to me, that in the context of any linear map,
there must be some value that corresponds to 0, and some
other value which corresponds to 1.  (If this is not the case,
then there must be some way of defining addition and
multiplication which does not conform to the Peano postulates?)

That Wikipedia page also asserts:
   Differentiation is a linear map from the space of all differentiable
   functions to the space of all functions.

I think I can use the constant zero function for 0, but I am having
a problem figuring out what 1 would be.  Does anyone know?

(By constant zero function, I mean J's 0: more or less -- there might
be an issue with J rank which corresponds roughly to the space over
which the differentiable functions are defined?  I suspect that the
Wikipedia assertion about differentiation should be constrained to
functions over a given space, since I cannot see how to support
linear combinations of functions which were defined over arbitrarily
different spaces.)

Thanks,

--
Raul
----------------------------------------------------------------------

Re: The unifying generalization of derivatives

by John Randall-3

Raul Miller wrote:

> This implies, to me, that in the context of any linear map,
> there must be some value that corresponds to 0, and some
> other value which corresponds to 1.  (If this is not the case,
> then there must be some way of defining addition and
> multiplication which does not conform to the peano postulates?)
>
> That wikipedia page also asserts:
>    Differentiation is a linear map from the space of all differentiable
>    functions to the space of all functions.
>
> I think I can use the constant zero function for 0, but I am having
> a problem figuring out what 1 would be.  Does anyone know?
>

You are absolutely right that the zero element in the vector space V of
linear maps is the zero function.  The element corresponding to 1 is not
in V, but in the field (R) over which V is a vector space.  You have to be
able to add in V, and be able to multiply something in R by something in
V.

Saying that differentiation is a linear map D:V->V then means things like

D(f+g)=(Df)+(Dg), D(kf)=k D(f).

It says nothing about multiplying functions together.
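
A small illustration of these two identities, restricted to polynomials so that differentiation is exact. This is a Python sketch; representing a polynomial as a coefficient list [c0, c1, c2, ...] and the helper names `D`, `add` and `scale` are assumptions made for the example.

```python
def D(p):
    """Differentiate a polynomial given as coefficients [c0, c1, c2, ...]."""
    return [k * c for k, c in enumerate(p)][1:]

def add(p, q):
    """Sum of two polynomials (the shorter one is padded with zeros)."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def scale(k, p):
    """Scalar multiple k * p, with the scalar k taken from the field R."""
    return [k * c for c in p]

f = [1, 0, 3]        # 1 + 3x^2
g = [0, 2, 0, 5]     # 2x + 5x^3
k = 7

print(D(add(f, g)) == add(D(f), D(g)))    # D(f+g) = (Df)+(Dg): True
print(D(scale(k, f)) == scale(k, D(f)))   # D(kf)  = k D(f):   True
print(scale(1, f) == f)                   # the 1 acts from R, not from V: True
```

Nothing here multiplies two polynomials together; the only products are a scalar times a polynomial.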

Best wishes,

John


----------------------------------------------------------------------

Re: The unifying generalization of derivatives

by John Randall-3

Raul Miller wrote:
>
> I was expecting the result that I get from
>     0~:1&o. D. 1 D. 1(_78 0.9 1p1 1j1)
>

VD=: 1 : 'u"1 D.1'
  0~:1&o.VD VD (_78 0.9 1p1 1j1)
1 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0

0 0 0 0
0 1 0 0
0 0 0 0
0 0 0 0

0 0 0 0
0 0 0 0
0 0 1 0
0 0 0 0

0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 1

Best wishes,

John


----------------------------------------------------------------------

Re: The unifying generalization of derivatives

by Dan Bron

Oleg wrote:
>  Incorrect form: the first "." is composition, the second
>  should be multiplication. The composition on the right
>  side should be between f' and first g.

Tracy responded:
>  Between what John Randall wrote and the elaboration by the
>  author in the original, I think your attempt at correction
>  is unsuccessful.

I think he got it right.  Here's another page on the "chain rule" (composition rule for derivatives):

  http://www.math.uncc.edu/~bjwichno/fall2004-math1242-006/Review_Calc_I/lec_composition_rule.htm

In particular, check out the graphic at the very top:

  http://www.math.uncc.edu/~bjwichno/fall2004-math1242-006/Images/image12.gif

Transcribing that into ASCII, we get:

   (f o g)'(x) = f'(g(x)) * g'(x)

Here, I've substituted ' for the mathematical "prime" symbol [1],  o  for the mathematical "composition" symbol [2] and  *  for the mathematical "dot product" symbol [3].

Incidentally, I think poor ASCII transliteration was what caused this confusion in the first place.  Conal wanted to use the symbol "." for both function composition and dot product.  Not coincidentally, problems like these are why J symbols are all ASCII-based.  

Nor is the original math notation itself without flaw.  I notice that the left-hand side of the equation is written using function composition:

   (f o g)'

but the right-hand side is written with nested function calls:

   f'(g(x))

which is inconsistent.  I might've written the equation thus:

   (f o g)'(x) = (f' o g)(x) * g'(x)

Having done so, it is easy to get a reasonable approximation of the equation in J, with the additional benefit that it's executable:

           o      =:  @:       NB.  Composition
           D      =:  D.1      NB.  Derivative (i.e. prime-symbol)
           x      =:  +/ . *   NB.  Dot product (even though it looks like a cross product, or the variable "x").

           NB.             (f o g)'(x) = (f'  o g)(x) *  g'   (x)
           chain  =:  2 : '(f o g) D   = (f D o g)    x (g D)'

In J4, we could've gotten even closer:

           f      =:  [.
           g      =:  ].
           NB.        (f o g)'(x) =  (f'   o g)(x) *   g'   (x)
           chain  =:  (f o g D)   = ((f D) o g     x  (g D))

-Dan

[1]  ′ aka U+2032 (i.e.  u: 16b2032 ) aka "prime" aka "feet" aka "minutes"
     http://www.fileformat.info/info/unicode/char/2032/index.htm

[2]  ∘ aka U+2218 (i.e.  u: 16b2218 ) aka "composite function" aka "APL jot"
     http://www.fileformat.info/info/unicode/char/2218/index.htm

[3]  ⋅ aka U+22C5 (i.e.  u: 16b22C5 ) aka "dot operator"
     http://www.fileformat.info/info/unicode/char/22C5/index.htm

So if my system can compose Unicode properly, and yours can render it, then the formula could be written:

  (f ∘ g)′(x) = f′(g(x)) ⋅ g′(x)

and, if a future version of J supports Unicode (rather than just ASCII) names, we could assign these characters appropriately:

   ∘      =:  @:
   ′      =:  D.1
   ⋅      =:  +/ . *

and come that much closer to having "executable math".  But that's a long way off.  As this post proves (probably), Unicode has a way to go before we can transmit it reliably.  But it would make the APL guys happy.

----------------------------------------------------------------------

Re: The unifying generalization of derivatives

by John Randall-3

Dan Bron wrote:

> Nor is the original math notation itself without flaw.  I notice that the
> left-hand side of the equation is written using function composition:
>
>    (f o g)'
>
> but the right-hand side is written with nested function calls:
>
>    f'(g(x))
>
> which is inconsistent.  I might've written the equation thus:
>
>    (f o g)'(x) = (f' o g)(x) * g'(x)

The prime notation is not really the best mathematical notation for
transparently expressing the chain rule: Leibniz notation is better:

dy    dy  du
-- =  --  --
dx    du  dx


Best wishes,

John


----------------------------------------------------------------------

Re: The unifying generalization of derivatives

by Raul Miller-4

On Tue, Sep 16, 2008 at 3:52 PM, John Randall
<randall@...> wrote:

> Raul Miller wrote:
>>
>> I was expecting the result that I get from
>>     0~:1&o. D. 1 D. 1(_78 0.9 1p1 1j1)
>>
>
> VD=: 1 : 'u"1 D.1'
>  0~:1&o.VD VD (_78 0.9 1p1 1j1)
> 1 0 0 0
> 0 0 0 0
...

Exactly.

    (<"2) 0~:1&o. D. 2 (_78 0.9 1p1 1j1)
+-------+-------+-------+-------+
|1 0 0 0|1 0 0 0|1 0 0 0|1 0 0 0|
|0 1 0 0|0 1 0 0|0 1 0 0|0 1 0 0|
|0 0 1 0|0 0 1 0|0 0 1 0|0 0 1 0|
|0 0 0 1|0 0 0 1|0 0 0 1|0 0 0 1|
+-------+-------+-------+-------+
    (<"2) 0~:1&o. D.1 D.1 (_78 0.9 1p1 1j1)
+-------+-------+-------+-------+
|1 0 0 0|0 0 0 0|0 0 0 0|0 0 0 0|
|0 0 0 0|0 1 0 0|0 0 0 0|0 0 0 0|
|0 0 0 0|0 0 0 0|0 0 1 0|0 0 0 0|
|0 0 0 0|0 0 0 0|0 0 0 0|0 0 0 1|
+-------+-------+-------+-------+

I am having a problem understanding why
  (1&o.D.1 D.1 -: 1&o.D.2) (_78 0.9 1p1 1j1)
0
   9!:14''
j602/2008-03-03/16:45

--
Raul
----------------------------------------------------------------------

Re: The unifying generalization of derivatives

by Oleg Kobchenko

I think a good treatment, echoing John Randall, is at

  http://en.wikipedia.org/wiki/Chain_rule#The_fundamental_chain_rule

> From: Tracy Harms <kaleidic@...>
>
> higher-order derivatives, functionally".  In that posting the chain
> rule is given by the following expression:
>
>    deriv (f . g) x = deriv f (g x) . deriv g x
>

To dispel the confusion, the above should be understood as

  D_x (f o g) = D_{g(x)} f o D_x g


----------------------------------------------------------------------

Re: The unifying generalization of derivatives

by John Randall-3

Raul Miller wrote:

> I am having a problem understanding why
>   (1&o.D.1 D.1 -: 1&o.D.2) (_78 0.9 1p1 1j1)
> 0

I agree there is a problem.  I do not know if the algorithm is flawed, or
if it is just an inaccurate approximation (as has been noted before).
Here is a very simple example.

   f=:*:"1
   f D. 1 D. 1 (1 2)
2 0
0 0

0 0
0 2
   f D. 2 (1 2)
  1.9984        0
       0 _1.11022

0.488498        0
       0   1.9984
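
As a reference point for what the second derivative should be mathematically, here is a Python sketch (not J, and not a reconstruction of how D. works). For the elementwise square, the second partial of component i with respect to variables j and k is 2 when i, j and k coincide and 0 otherwise, which matches the `f D. 1 D. 1` result above. The helper `second_partial` is an illustrative four-point central-difference estimate.

```python
def f(v):
    """Elementwise square, the analogue of *:"1 ."""
    return [t * t for t in v]

def second_partial(fn, i, j, k, v, h=1e-3):
    """Central-difference estimate of d^2 f_i / (dx_j dx_k) at v."""
    def shifted(dj, dk):
        w = list(v)
        w[j] += dj * h
        w[k] += dk * h
        return fn(w)[i]
    return (shifted(1, 1) - shifted(1, -1)
            - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)

point = [1.0, 2.0]
n = len(point)

# Exact answer: 2 where all three indices coincide, 0 elsewhere.
expected = [[[2.0 if i == j == k else 0.0 for k in range(n)]
             for j in range(n)] for i in range(n)]

estimate = [[[second_partial(f, i, j, k, point) for k in range(n)]
             for j in range(n)] for i in range(n)]

close = all(abs(estimate[i][j][k] - expected[i][j][k]) < 1e-6
            for i in range(n) for j in range(n) for k in range(n))
print(close)
```

This only shows which of the two J results agrees with the analytic answer; it does not say whether the `D. 2` discrepancy comes from the algorithm or from approximation error.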

Best wishes,

John


----------------------------------------------------------------------

Re: The unifying generalization of derivatives

by Tracy Harms-3

Conal Elliott has responded to our discussion at his website:

http://conal.net/blog/posts/higher-dimensional-higher-order-derivatives-functionally/#comment-9078

He has requested that any follow-up questions to his response be
posted at that site so that he may readily see them and reply.
----------------------------------------------------------------------

Re: The unifying generalization of derivatives

by Raul Miller-4