compound operators

View: New views
7 Messages — Rating Filter:   Alert me  

compound operators

by Jaroslav Hajek-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

hello,

I've made first attempt to elaborate on the idea John W. Eaton gave on
this list a while ago: having Octave's parser to recognize expressions
like `expr1' * expr2' as special, to allow more efficient mapping of
operations onto BLAS routines.

The initial commit is in my public repo:
https://tw-math.de/highegg

on my laptop (x86/Linux/gcc-4.2.1) this compiles and "make check"'s
without a problem. Unfortunately, our server with Intel C++ is down,
so I can't test anywhere else.

In this initial commit, the following expressions are recognized as special:
A.'*B (trans_mul)
A*B.' (mul_trans)
A'*B (herm_mul)
A*B' (mul_herm)
these are contracted into compound operators provided by octave_value,
which are mapped directly to xGEMM/xGEMV/xDOTx BLAS calls if possible
(double-double or complex-complex), otherwise "decomposed" to the
original evaluation sequence.

for certain matrix sizes, significant speedups can be achieved, as can
be demonstrated with this simple benchmark:
n = 50; m = 505000; a = rand(m,n); b = rand(m,n); tic; c = a'*b; toc ;
clear a b c
which can be done directly via DGEMM('T','N',...) instead of
transposing, then doing DGEMM('N','N',...)

on my laptop (Octave linked with GotoBLAS (no mthreading)), with
current Octave I get
Elapsed time is 4.19386 seconds.
and with the new changeset employed
Elapsed time is 1.79197 seconds.

which is a 234% procent speedup (of course, it took me few tries time
to find dims giving such impressive results :)
obviously, one also saves memory by saving the transpose.

I want to also map A.'*A, A*A' etc to xSYRK and xHERK - this is going
to be simple, and possibly more.

regards,

--
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

Re: compound operators

by Jaroslav Hajek-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Another changeset uploaded; the symmetric cases A'*A, A*A', A.'*A and
A*A.' are now mapped to xSYRK and ZHERK. With these, the benchmark
becomes more impressive:

n = 50; m = 505000; a = rand(m,n); tic; c = a'*a; toc; clear

with current Octave:

Elapsed time is 4.24332 seconds.

with the new changeset:

Elapsed time is 0.916971 seconds.

i.e. a 462% speed-up (this is, of course, caused by the fact that
DSYRK not only avoids transposing and operates more cache coherently,
just like DGEMM('T','N',...), but also calculates only half of the
matrix)

what next?

My top candidates are expressions diag (v)*A, A*diag (v), diag (v) \
A, A / diag (v).
It would be possible to choose the same approach, and implement
mul_diag, diag_mul, div_diag, diag_ldiv.
But it seems that there are already special objects for diagonal
matrices and the special operations DiagMatrix * Matrix are already
there, it's just that octave_value is not (I think) specialized to
hold a DiagMatrix object. It seems that would be a better approach.

--
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

Re: compound operators

by Fredrik Lingvall :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Jaroslav Hajek wrote:

> Another changeset uploaded; the symmetric cases A'*A, A*A', A.'*A and
> A*A.' are now mapped to xSYRK and ZHERK. With these, the benchmark
> becomes more impressive:
>
> n = 50; m = 505000; a = rand(m,n); tic; c = a'*a; toc; clear
>
> with current Octave:
>
> Elapsed time is 4.24332 seconds.
>
> with the new changeset:
>
> Elapsed time is 0.916971 seconds.
>
> i.e. a 462% speed-up (this is, of course, caused by the fact that
> DSYRK not only avoids transposing and operates more cache coherently,
> just like DGEMM('T','N',...), but also calculates only half of the
> matrix)
>  

Super! I have been waiting for this one - no more need for my
specialized oct-functions to do this.

/Fredrik

--
Fredrik Lingvall, PhD                                                              
E-mail: fl@..., fredrik.lingvall@...
Web:    http://folk.uio.no/fl/


compound operators

by John W. Eaton :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On  7-May-2008, Jaroslav Hajek wrote:

| I've made first attempt to elaborate on the idea John W. Eaton gave on
| this list a while ago: having Octave's parser to recognize expressions
| like `expr1' * expr2' as special, to allow more efficient mapping of
| operations onto BLAS routines.
|
| The initial commit is in my public repo:
| https://tw-math.de/highegg

OK, this looks like an excellent start.

In functions like

  static octave_value::unary_op
  strip_trans_herm (tree_expression *&exp)
  {
    if (exp->is_unary_expression ())
      {
        tree_unary_expression *uexp =
          dynamic_cast<tree_unary_expression *> (exp);
        octave_value::unary_op op = uexp->op_type ();
        if (op == octave_value::op_transpose
            || op == octave_value::op_hermitian)
          {
            exp = uexp->operand ();
          }
        else
          {
            op = octave_value::unknown_unary_op;
          }
        return op;
      }
    else
      return octave_value::unknown_unary_op;
  }

it looks like there will be a memory leak if you replace exp (which
was dynamically allocated by the parser) with another part of the
parse tree.  So maybe this should be

  static octave_value::unary_op
  strip_trans_herm (tree_expression *&exp)
  {
    octave_value::unary_op retval = octave_value::unknown_unary_op;

    if (exp->is_unary_expression ())
      {
        tree_unary_expression *uexp =
          dynamic_cast<tree_unary_expression *> (exp);

        octave_value::unary_op op = uexp->op_type ();

        if (op == octave_value::op_transpose
            || op == octave_value::op_hermitian)
          {
            tree_expression *tmp = exp;
            exp = uexp->operand ();
            delete tmp;

            retval = op;
          }
      }

    return retval
  }

?

Also, if you add a new parse tree class, you need to add it to the
tree_walker class in pt-walk.h and then create corresponding visitor
functions in all the classes that can walk the tree.  Currently, those
are in the pt-pr-code, pt-bp, and pt-check files.

jwe

Re: compound operators

by Jaroslav Hajek-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Thu, May 8, 2008 at 5:46 PM, John W. Eaton <jwe@...> wrote:

> On  7-May-2008, Jaroslav Hajek wrote:
>
> | I've made first attempt to elaborate on the idea John W. Eaton gave on
> | this list a while ago: having Octave's parser to recognize expressions
> | like `expr1' * expr2' as special, to allow more efficient mapping of
> | operations onto BLAS routines.
> |
> | The initial commit is in my public repo:
> | https://tw-math.de/highegg
>
> OK, this looks like an excellent start.
>
> In functions like
>
>  static octave_value::unary_op
>  strip_trans_herm (tree_expression *&exp)
>  {
>    if (exp->is_unary_expression ())
>      {
>        tree_unary_expression *uexp =
>          dynamic_cast<tree_unary_expression *> (exp);
>        octave_value::unary_op op = uexp->op_type ();
>        if (op == octave_value::op_transpose
>            || op == octave_value::op_hermitian)
>          {
>            exp = uexp->operand ();
>          }
>        else
>          {
>            op = octave_value::unknown_unary_op;
>          }
>        return op;
>      }
>    else
>      return octave_value::unknown_unary_op;
>  }
>
> it looks like there will be a memory leak if you replace exp (which
> was dynamically allocated by the parser) with another part of the
> parse tree.  So maybe this should be
>
>  static octave_value::unary_op
>  strip_trans_herm (tree_expression *&exp)
>  {
>    octave_value::unary_op retval = octave_value::unknown_unary_op;
>
>    if (exp->is_unary_expression ())
>      {
>        tree_unary_expression *uexp =
>          dynamic_cast<tree_unary_expression *> (exp);
>
>        octave_value::unary_op op = uexp->op_type ();
>
>        if (op == octave_value::op_transpose
>            || op == octave_value::op_hermitian)
>          {
>            tree_expression *tmp = exp;
>            exp = uexp->operand ();
>            delete tmp;
>
>            retval = op;
>          }
>      }
>
>    return retval
>  }
>
> ?
>

I hope not. The tree_compound_binary_expression class has a slightly
different philosophy:
it does not "own" the stripped operands, just stores pointers to them
(note that it has no destructor). The orginal operands is still owned
by the parent tree_binary_expression object.

I thought this would be the best design - the compound operator needs
not be visible to printing code (we want the code to print normally
like `A'*B') or breakpoints, so tree-print-code and tree-breakpoint
need not care. The original expression is retained, because it may
still be useful. For example, we may later want to implement a
simplification that
trace (A*B) evaluates as sum((A.*B)(:)) (this is probably not of much
use, but demonstrates the matter). With my approach, trace (A'*B) will
be properly simplified, because the A'*B expression can still be
identified as a multiplication.

This is also the reason why compound_binary_op is a separate enum. In
octave_value and octave_value_typeinfo, I have used overloading of the
existing xxx_binary_op functions if possible, because that allows to
reuse some of the registration macros (and some of the overloaded
functions are simply duplicated - perhaps templates may later be
employed instead, although it didn't seem worth the trouble).

> Also, if you add a new parse tree class, you need to add it to the
> tree_walker class in pt-walk.h and then create corresponding visitor
> functions in all the classes that can walk the tree.  Currently, those
> are in the pt-pr-code, pt-bp, and pt-check files.
>
> jwe
>

Explained above - I didn't consider it necessary, as no other code
needs to know about the simplified expression other than the "virtual
constructor" - maybe_compound_binary_expression in pt-cbinop.h.

Do you see any potential problems with this approach?

regards,

--
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

Re: compound operators

by John W. Eaton :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On  8-May-2008, Jaroslav Hajek wrote:

| I hope not. The tree_compound_binary_expression class has a slightly
| different philosophy:
| it does not "own" the stripped operands, just stores pointers to them
| (note that it has no destructor). The orginal operands is still owned
| by the parent tree_binary_expression object.

OK, I see now.

| I thought this would be the best design - the compound operator needs
| not be visible to printing code (we want the code to print normally
| like `A'*B') or breakpoints, so tree-print-code and tree-breakpoint
| need not care. The original expression is retained, because it may
| still be useful. For example, we may later want to implement a
| simplification that
| trace (A*B) evaluates as sum((A.*B)(:)) (this is probably not of much
| use, but demonstrates the matter). With my approach, trace (A'*B) will
| be properly simplified, because the A'*B expression can still be
| identified as a multiplication.

Will this always work as we want, or could there be problems if we
perform additional transformations on the parse tree?

jwe

Re: compound operators

by Jaroslav Hajek-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Tue, May 13, 2008 at 5:59 AM, John W. Eaton <jwe@...> wrote:

> On  8-May-2008, Jaroslav Hajek wrote:
>
>
> | I hope not. The tree_compound_binary_expression class has a slightly
>  | different philosophy:
>  | it does not "own" the stripped operands, just stores pointers to them
>  | (note that it has no destructor). The orginal operands is still owned
>  | by the parent tree_binary_expression object.
>
>  OK, I see now.
>
>
>  | I thought this would be the best design - the compound operator needs
>  | not be visible to printing code (we want the code to print normally
>  | like `A'*B') or breakpoints, so tree-print-code and tree-breakpoint
>  | need not care. The original expression is retained, because it may
>  | still be useful. For example, we may later want to implement a
>  | simplification that
>  | trace (A*B) evaluates as sum((A.*B)(:)) (this is probably not of much
>  | use, but demonstrates the matter). With my approach, trace (A'*B) will
>  | be properly simplified, because the A'*B expression can still be
>  | identified as a multiplication.
>
>  Will this always work as we want, or could there be problems if we
>  perform additional transformations on the parse tree?
>
>  jwe
>

Currently, the answer is yes, because once a node is constructed, its
operands (subtrees) seem to be immutable. That's why I favored
"evaluation shortcut" to modifying the true expression tree.
If we want some transformations in the future, it should be OK if
they're invoked via virtual function(s) - we simply override to retry
the simplification afterwards, or just invalidate it.

--
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz
LightInTheBox - Buy quality products at wholesale price