|
View:
New views
7 Messages
—
Rating Filter:
Alert me
|
|
|
compound operatorshello,
I've made first attempt to elaborate on the idea John W. Eaton gave on this list a while ago: having Octave's parser to recognize expressions like `expr1' * expr2' as special, to allow more efficient mapping of operations onto BLAS routines. The initial commit is in my public repo: https://tw-math.de/highegg on my laptop (x86/Linux/gcc-4.2.1) this compiles and "make check"'s without a problem. Unfortunately, our server with Intel C++ is down, so I can't test anywhere else. In this initial commit, the following expressions are recognized as special: A.'*B (trans_mul) A*B.' (mul_trans) A'*B (herm_mul) A*B' (mul_herm) these are contracted into compound operators provided by octave_value, which are mapped directly to xGEMM/xGEMV/xDOTx BLAS calls if possible (double-double or complex-complex), otherwise "decomposed" to the original evaluation sequence. for certain matrix sizes, significant speedups can be achieved, as can be demonstrated with this simple benchmark: n = 50; m = 505000; a = rand(m,n); b = rand(m,n); tic; c = a'*b; toc ; clear a b c which can be done directly via DGEMM('T','N',...) instead of transposing, then doing DGEMM('N','N',...) on my laptop (Octave linked with GotoBLAS (no mthreading)), with current Octave I get Elapsed time is 4.19386 seconds. and with the new changeset employed Elapsed time is 1.79197 seconds. which is a 234% procent speedup (of course, it took me few tries time to find dims giving such impressive results :) obviously, one also saves memory by saving the transpose. I want to also map A.'*A, A*A' etc to xSYRK and xHERK - this is going to be simple, and possibly more. regards, -- RNDr. Jaroslav Hajek computing expert Aeronautical Research and Test Institute (VZLU) Prague, Czech Republic url: www.highegg.matfyz.cz |
|
|
Re: compound operatorsAnother changeset uploaded; the symmetric cases A'*A, A*A', A.'*A and
A*A.' are now mapped to xSYRK and ZHERK. With these, the benchmark becomes more impressive: n = 50; m = 505000; a = rand(m,n); tic; c = a'*a; toc; clear with current Octave: Elapsed time is 4.24332 seconds. with the new changeset: Elapsed time is 0.916971 seconds. i.e. a 462% speed-up (this is, of course, caused by the fact that DSYRK not only avoids transposing and operates more cache coherently, just like DGEMM('T','N',...), but also calculates only half of the matrix) what next? My top candidates are expressions diag (v)*A, A*diag (v), diag (v) \ A, A / diag (v). It would be possible to choose the same approach, and implement mul_diag, diag_mul, div_diag, diag_ldiv. But it seems that there are already special objects for diagonal matrices and the special operations DiagMatrix * Matrix are already there, it's just that octave_value is not (I think) specialized to hold a DiagMatrix object. It seems that would be a better approach. -- RNDr. Jaroslav Hajek computing expert Aeronautical Research and Test Institute (VZLU) Prague, Czech Republic url: www.highegg.matfyz.cz |
|
|
Re: compound operatorsJaroslav Hajek wrote:
> Another changeset uploaded; the symmetric cases A'*A, A*A', A.'*A and > A*A.' are now mapped to xSYRK and ZHERK. With these, the benchmark > becomes more impressive: > > n = 50; m = 505000; a = rand(m,n); tic; c = a'*a; toc; clear > > with current Octave: > > Elapsed time is 4.24332 seconds. > > with the new changeset: > > Elapsed time is 0.916971 seconds. > > i.e. a 462% speed-up (this is, of course, caused by the fact that > DSYRK not only avoids transposing and operates more cache coherently, > just like DGEMM('T','N',...), but also calculates only half of the > matrix) > Super! I have been waiting for this one - no more need for my specialized oct-functions to do this. /Fredrik -- Fredrik Lingvall, PhD E-mail: fl@..., fredrik.lingvall@... Web: http://folk.uio.no/fl/ |
|
|
compound operatorsOn 7-May-2008, Jaroslav Hajek wrote:
| I've made first attempt to elaborate on the idea John W. Eaton gave on | this list a while ago: having Octave's parser to recognize expressions | like `expr1' * expr2' as special, to allow more efficient mapping of | operations onto BLAS routines. | | The initial commit is in my public repo: | https://tw-math.de/highegg OK, this looks like an excellent start. In functions like static octave_value::unary_op strip_trans_herm (tree_expression *&exp) { if (exp->is_unary_expression ()) { tree_unary_expression *uexp = dynamic_cast<tree_unary_expression *> (exp); octave_value::unary_op op = uexp->op_type (); if (op == octave_value::op_transpose || op == octave_value::op_hermitian) { exp = uexp->operand (); } else { op = octave_value::unknown_unary_op; } return op; } else return octave_value::unknown_unary_op; } it looks like there will be a memory leak if you replace exp (which was dynamically allocated by the parser) with another part of the parse tree. So maybe this should be static octave_value::unary_op strip_trans_herm (tree_expression *&exp) { octave_value::unary_op retval = octave_value::unknown_unary_op; if (exp->is_unary_expression ()) { tree_unary_expression *uexp = dynamic_cast<tree_unary_expression *> (exp); octave_value::unary_op op = uexp->op_type (); if (op == octave_value::op_transpose || op == octave_value::op_hermitian) { tree_expression *tmp = exp; exp = uexp->operand (); delete tmp; retval = op; } } return retval } ? Also, if you add a new parse tree class, you need to add it to the tree_walker class in pt-walk.h and then create corresponding visitor functions in all the classes that can walk the tree. Currently, those are in the pt-pr-code, pt-bp, and pt-check files. jwe |
|
|
Re: compound operatorsOn Thu, May 8, 2008 at 5:46 PM, John W. Eaton <jwe@...> wrote:
> On 7-May-2008, Jaroslav Hajek wrote: > > | I've made first attempt to elaborate on the idea John W. Eaton gave on > | this list a while ago: having Octave's parser to recognize expressions > | like `expr1' * expr2' as special, to allow more efficient mapping of > | operations onto BLAS routines. > | > | The initial commit is in my public repo: > | https://tw-math.de/highegg > > OK, this looks like an excellent start. > > In functions like > > static octave_value::unary_op > strip_trans_herm (tree_expression *&exp) > { > if (exp->is_unary_expression ()) > { > tree_unary_expression *uexp = > dynamic_cast<tree_unary_expression *> (exp); > octave_value::unary_op op = uexp->op_type (); > if (op == octave_value::op_transpose > || op == octave_value::op_hermitian) > { > exp = uexp->operand (); > } > else > { > op = octave_value::unknown_unary_op; > } > return op; > } > else > return octave_value::unknown_unary_op; > } > > it looks like there will be a memory leak if you replace exp (which > was dynamically allocated by the parser) with another part of the > parse tree. So maybe this should be > > static octave_value::unary_op > strip_trans_herm (tree_expression *&exp) > { > octave_value::unary_op retval = octave_value::unknown_unary_op; > > if (exp->is_unary_expression ()) > { > tree_unary_expression *uexp = > dynamic_cast<tree_unary_expression *> (exp); > > octave_value::unary_op op = uexp->op_type (); > > if (op == octave_value::op_transpose > || op == octave_value::op_hermitian) > { > tree_expression *tmp = exp; > exp = uexp->operand (); > delete tmp; > > retval = op; > } > } > > return retval > } > > ? > I hope not. The tree_compound_binary_expression class has a slightly different philosophy: it does not "own" the stripped operands, just stores pointers to them (note that it has no destructor). The orginal operands is still owned by the parent tree_binary_expression object. I thought this would be the best design - the compound operator needs not be visible to printing code (we want the code to print normally like `A'*B') or breakpoints, so tree-print-code and tree-breakpoint need not care. The original expression is retained, because it may still be useful. For example, we may later want to implement a simplification that trace (A*B) evaluates as sum((A.*B)(:)) (this is probably not of much use, but demonstrates the matter). With my approach, trace (A'*B) will be properly simplified, because the A'*B expression can still be identified as a multiplication. This is also the reason why compound_binary_op is a separate enum. In octave_value and octave_value_typeinfo, I have used overloading of the existing xxx_binary_op functions if possible, because that allows to reuse some of the registration macros (and some of the overloaded functions are simply duplicated - perhaps templates may later be employed instead, although it didn't seem worth the trouble). > Also, if you add a new parse tree class, you need to add it to the > tree_walker class in pt-walk.h and then create corresponding visitor > functions in all the classes that can walk the tree. Currently, those > are in the pt-pr-code, pt-bp, and pt-check files. > > jwe > Explained above - I didn't consider it necessary, as no other code needs to know about the simplified expression other than the "virtual constructor" - maybe_compound_binary_expression in pt-cbinop.h. Do you see any potential problems with this approach? regards, -- RNDr. Jaroslav Hajek computing expert Aeronautical Research and Test Institute (VZLU) Prague, Czech Republic url: www.highegg.matfyz.cz |
|
|
Re: compound operatorsOn 8-May-2008, Jaroslav Hajek wrote:
| I hope not. The tree_compound_binary_expression class has a slightly | different philosophy: | it does not "own" the stripped operands, just stores pointers to them | (note that it has no destructor). The orginal operands is still owned | by the parent tree_binary_expression object. OK, I see now. | I thought this would be the best design - the compound operator needs | not be visible to printing code (we want the code to print normally | like `A'*B') or breakpoints, so tree-print-code and tree-breakpoint | need not care. The original expression is retained, because it may | still be useful. For example, we may later want to implement a | simplification that | trace (A*B) evaluates as sum((A.*B)(:)) (this is probably not of much | use, but demonstrates the matter). With my approach, trace (A'*B) will | be properly simplified, because the A'*B expression can still be | identified as a multiplication. Will this always work as we want, or could there be problems if we perform additional transformations on the parse tree? jwe |
|
|
Re: compound operatorsOn Tue, May 13, 2008 at 5:59 AM, John W. Eaton <jwe@...> wrote:
> On 8-May-2008, Jaroslav Hajek wrote: > > > | I hope not. The tree_compound_binary_expression class has a slightly > | different philosophy: > | it does not "own" the stripped operands, just stores pointers to them > | (note that it has no destructor). The orginal operands is still owned > | by the parent tree_binary_expression object. > > OK, I see now. > > > | I thought this would be the best design - the compound operator needs > | not be visible to printing code (we want the code to print normally > | like `A'*B') or breakpoints, so tree-print-code and tree-breakpoint > | need not care. The original expression is retained, because it may > | still be useful. For example, we may later want to implement a > | simplification that > | trace (A*B) evaluates as sum((A.*B)(:)) (this is probably not of much > | use, but demonstrates the matter). With my approach, trace (A'*B) will > | be properly simplified, because the A'*B expression can still be > | identified as a multiplication. > > Will this always work as we want, or could there be problems if we > perform additional transformations on the parse tree? > > jwe > Currently, the answer is yes, because once a node is constructed, its operands (subtrees) seem to be immutable. That's why I favored "evaluation shortcut" to modifying the true expression tree. If we want some transformations in the future, it should be OK if they're invoked via virtual function(s) - we simply override to retry the simplification afterwards, or just invalidate it. -- RNDr. Jaroslav Hajek computing expert Aeronautical Research and Test Institute (VZLU) Prague, Czech Republic url: www.highegg.matfyz.cz |
| Free Forum Powered by Nabble | Forum Help |