GiNaC as the symbolic manipulation engine in Sage

View: New views
16 Messages — Rating Filter:   Alert me  

GiNaC as the symbolic manipulation engine in Sage

by Burcin Erocal :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hello,

This e-mail is going to two lists (sage-devel and ginac-list), so there
will be some redundant information. I hope you still take the time to
read it to the end, where you can see some cool benchmarks. :)


William often states that Sage is now capable of fully supporting his
research. Unfortunately, for my research in symbolic summation &
integration Sage lacks some of the very basic facilities.

If Sage is to become a "viable alternative to Maple and Mathematica",
it needs to be able to support research in symbolic computation, and
development of new algorithms. At a bare minimum this means providing
support for fast symbolic manipulation and arithmetic as well as some
form of pattern matching.

Most people reading this e-mail are probably aware of the shortcomings
of the symbolics code in Sage at the moment, but here are some timings
to make the situation clear.

MMA:

In[1]:= a = (x+y)^2;
In[2]:= Timing[Sum[a^i,{i,1,10000}];]

Out[2]= {0.028001, Null}

Maple:

> a:=(x+y)^2:
> st:=time(): t:=add(a^i,i=1..10000): time()-st;
0.024


Sage via the maxima interface didn't finish this task in < 5 minutes.

Timings for some of the alternatives that were considered as a backend
for symbolics:

Sympy (version 0.6.0 in Sage-3.0.6) didn't complete the above operation
in < 5 minutes.

sympycore (SVN revision 1047):

sage: from sympycore import Symbol
sage: x, y = Symbol("x"), Symbol("y")
sage: a = (x+y)^2
sage: %time t = sum( [a^i for i in xrange(10000)] )
CPU times: user 3.45 s, sys: 0.78 s, total: 4.23 s
Wall time: 4.23 s


Note that even with its highly optimized structures, sympycore is far
from reaching Maple/MMA performance from pure Python.


At the ACA'08 conference, I've met some physicists using GiNaC
(http://www.ginac.de/), a C++ library that provides symbolic
manipulation and pattern matching facilities. Its lastest release was
on Apr 04 2008, it also has extensive documentation, and active mailing
lists. I wrote a basic Cython wrapper to benchmark its performance.
Here are the numbers:

sage: var("x y z",ns=1)
(x, y, z)
sage: a = (x+y)^2
sage: %time t = sum( [a^i for i in xrange(10000)] )
CPU times: user 1.83 s, sys: 0.00 s, total: 1.83 s
Wall time: 1.83 s


And some more:

sage: b = (y+z)^2
sage: %time t = sum( [a^i*b^j for i in xrange(1,200,5) for j in xrange
(1, 300,  3)] ) CPU times: user 0.35 s, sys: 0.00 s, total: 0.35 s
Wall time: 0.36 s


Maple:

> a := (x+y)^2:
> b := (y+z)^2:
> st := time(): S := 0: for i from 1 to \                                      
> 200 by 5 do for j from 1 to 300 by 3 d\                                      
> o S := S+a^i*b^j end do end do: time() - st;
                                     3.464

MMA:

In[1]:= a=(x+y)^2;
In[2]:= b=(y+z)^2;
In[3]:= S = 0;  
In[4]:= Timing[ Do[ Do[ S=S+a^i*b^j, {j, 1, 300, 3}], {i, 1, 200, 5}]; ]
Out[4]= {11.0527, Null}

Thinking that these timings might be off because I don't know the
proper way to do this in Maple or MMA, here is a different python
timing:

sage: s = a.parent(0)
sage: st = cputime()
sage: for i in xrange(1,200,5):
....:     for j in xrange(1,300,3):
....:         s += a^i*b^j
....:        
sage: et = cputime()
sage: et-st
0.42002600000000001


Like many other subsystems in Sage, the symbolic arithmetic /
manipulation backend should consist of a c/c++ library with a wrapper
which allows the user to implement more advanced algorithms such as
substitutions/tree traversals in Sage/Cython. One can of course
implement  the arithmetic in pure C from Cython, but this goes very
much against the Sage philosophy of not reinventing the wheel.

I suggest GiNaC should be the library we use in Sage for this backend.
Its performance is very impressive and it can get us to a competitive
position in the short term. Once we have a stable symbolics interface,
we can work on more advanced algorithms such as limits / integration
and improving the backend at the same time.

For those who want to try my wrapper, you need to install these two
spkgs (in order):

http://sage.math.washington.edu/home/burcin/spkg/cln-1.2.2.spkg
http://sage.math.washington.edu/home/burcin/spkg/ginac-1.4.3.spkg

And apply this patch:

http://www.risc.jku.at/people/berocal/sage/ginac-wrapper.patch

 
While GiNaC seems to be the best option for Sage, it has some problems
(apart from the awkward capitalization of the name which is a pain to
type).

- It takes ages to build
        The packages above took ~25 minutes to build on my desktop
machine (15 for cln, 9 for ginac)

- cln seems to support on only gcc, which might be a problem for the
windows port (which will be using ms compilers)

- cln duplicates functionality which is provided by different libraries
already distributed with Sage (mpfr, mpir, flint) at a considerable
speed penalty.

- Creating symbolic functions at runtime is not supported.
        I can probably work around this in the wrapper. I've done
something similar in my SCrypt package which provides symbolic
computation capabilities over characteristic 2 for experimenting in
crypto. It relies on PolyBoRi for arithmetic, which is just a
polynomial library, i.e. knows nothing about symbolics. It would still
be better to have support for runtime creation of symbolic functions in
GiNaC.

- IIRC, GiNaC documentation suggests the printing order for the
variables is stable between different sessions, regardless of creation
order. However, running the doctests in the experimental wrapper
reveals different results. We may need to write a custom pretty printer.

- The gethash() function of GiNaC objects doesn't return stable
results, which is probably the cause of the problem above. We need a
stablehash() function.


I think all these problems can be resolved with some effort, and even
the speed of basic arithmetic in GiNaC can be improved with modifying
it's data structures. Admittedly some of these tasks are much easier
than others, but supporting an already stable and mature library like
GiNaC is a much better approach than trying to write our own.

Thoughts? Comments?


Cheers,

Burcin
 
_______________________________________________
GiNaC-list mailing list
GiNaC-list@...
https://www.cebix.net/mailman/listinfo/ginac-list

Re: GiNaC as the symbolic manipulation engine in Sage

by Jens Vollinga :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi from GiNaC side,

just some short comments:

Burcin Erocal schrieb:
> - It takes ages to build
> The packages above took ~25 minutes to build on my desktop
> machine (15 for cln, 9 for ginac)

Did you configure with the --disable-static option? Otherwise your build
time is just twice the usual time (looks like it).

> - cln seems to support on only gcc, which might be a problem for the
> windows port (which will be using ms compilers)

<blame others>
That is a real problem, but it can only be solved by the cln people.
</blame others>

> - cln duplicates functionality which is provided by different libraries
> already distributed with Sage (mpfr, mpir, flint) at a considerable
> speed penalty.

It would be quite useful for ginac if the numeric class that wraps the
cln library would allow for other libraries as well, even pure C double
precision ones. The problem in realizing this idea was always that other
libraries were missing certain features (some functions, modular
arithmetic,...) and to make up for these would incur a lot of extra
work. It didn't seem like it was worth the hassle (yet).

> - IIRC, GiNaC documentation suggests the printing order for the
> variables is stable between different sessions, regardless of creation
> order. However, running the doctests in the experimental wrapper
> reveals different results. We may need to write a custom pretty printer.

True.

> - The gethash() function of GiNaC objects doesn't return stable
> results, which is probably the cause of the problem above. We need a
> stablehash() function.

Yes, the two points above are related and I am to a certain point
responsible for that mess. To explain: The original tinfo system in
ginac was changed a few years ago to avoid fixed magic numbers which are
a problem if you have a lot of user supplied ginac classes (made more
urgent by the plan to make functions into full ginac classes). The
current system assigns these dynamically during compilation. But it is
not done in a smart way, so the tinfo numbers usually differ between
different compilations. Now, these tinfo numbers also go into the hash
calculation and the term ordering ...

Personally, I would like to have this issue solved and have ginac to
produce a fixed term ordering (as long as the declaration order of the
symbols done by the user is the same). But at the moment I don't have
the time to do this. Any volunteers? :-)

Regards,
Jens
_______________________________________________
GiNaC-list mailing list
GiNaC-list@...
https://www.cebix.net/mailman/listinfo/ginac-list

Parent Message unknown Re: [sage-devel] Re: GiNaC as the symbolic manipulation engine in Sage

by Burcin Erocal :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


> On Mon, Aug 11, 2008 at 1:30 AM, Burcin Erocal <burcin@...> wrote:
<snip>

> > While GiNaC seems to be the best option for Sage, it has some problems
> > (apart from the awkward capitalization of the name which is a pain to
> > type).
> >
> > - It takes ages to build
> >        The packages above took ~25 minutes to build on my desktop
> > machine (15 for cln, 9 for ginac)
> >
> > - cln seems to support on only gcc, which might be a problem for the
> > windows port (which will be using ms compilers)
> >
> > - cln duplicates functionality which is provided by different libraries
> > already distributed with Sage (mpfr, mpir, flint) at a considerable
> > speed penalty.

On Mon, 11 Aug 2008 11:39:42 +0200
Jens Vollinga <jensv@...> wrote:

> It would be quite useful for ginac if the numeric class that wraps the
> cln library would allow for other libraries as well, even pure C double
> precision ones. The problem in realizing this idea was always that other
> libraries were missing certain features (some functions, modular
> arithmetic,...) and to make up for these would incur a lot of extra
> work. It didn't seem like it was worth the hassle (yet).

I also think that using other libraries instead of cln in the
(subclasses of) numeric class is the way to go. With William's remarks
below, this becomes essential for Sage. :)

On Mon, 11 Aug 2008 02:37:40 -0700
"William Stein" <wstein@...> wrote:

>
> The single big question that needs to be answered that you don't
> answer in your post is the following:
>
>   Is it likely in your opinion that with a month of hard work by you (say),
>   would it be possible to create something based on Ginac that would
>   actually satisfy the following:
>
>     1. Does not depend on cln.
>     2. Builds in less than 8 minutes.
>
> I mean this as a 100% technical questions (not social, etc.).
> I just had a look at the source code directory for ginac, and it is about
> 20,000 lines of pretty well documented C++ code total.   I have
> the strong impression that the use of cln is nearly completely
> encapsulated in the file numeric.cpp, which is about 2000 lines.
> I'm guessing a Sage version of Ginac could simply replace numeric.cpp
> by our own implementation.


Of course this is possible. I am willing to put in most of the work to
make this happen. However, support from the GiNaC developers for such
an effort is very important. They know their code base best, with their
suggestions this could be accomplished much faster.

>
> cln will not be included standard in Sage ever.
> I could definitely see a version of ginac that doesn't depend on cln
> as possible for Sage...
>
>  -- William>


Cheers,

Burcin
_______________________________________________
GiNaC-list mailing list
GiNaC-list@...
https://www.cebix.net/mailman/listinfo/ginac-list

Parent Message unknown Re: [sage-devel] Re: GiNaC as the symbolic manipulation engine in Sage

by Burcin Erocal :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Mon, 11 Aug 2008 02:37:40 -0700
"William Stein" <wstein@...> wrote:

>
> On Mon, Aug 11, 2008 at 1:30 AM, Burcin Erocal <burcin@...> wrote:
<snip>

> >
> > At the ACA'08 conference, I've met some physicists using GiNaC
> > (http://www.ginac.de/), a C++ library that provides symbolic
> > manipulation and pattern matching facilities. Its lastest release was
> > on Apr 04 2008, it also has extensive documentation, and active mailing
> > lists. I wrote a basic Cython wrapper to benchmark its performance.
> > Here are the numbers:
> >
> > sage: var("x y z",ns=1)
> > (x, y, z)
> > sage: a = (x+y)^2
> > sage: %time t = sum( [a^i for i in xrange(10000)] )
> > CPU times: user 1.83 s, sys: 0.00 s, total: 1.83 s
> > Wall time: 1.83 s
>
> This benchmark might be lame. Even the current "insanely slow"
> maxima-based symbolics in Sage seem to do better than what you
> just wrote above:
>
> sage: var('x,y')
> (x, y)
> sage: a = (x+y)^2
> sage: time t = sum(a^i for i in xrange(10000))
> CPU times: user 0.51 s, sys: 0.03 s, total: 0.54 s
> Wall time: 0.55 s
>
> and
>
> sage: var('y,z'); b = (y+z)^2
> sage: time t = sum([a^i*b^j for i in xrange(1,200,5) for j in
> xrange(1, 300,  3)] )
> CPU times: user 0.30 s, sys: 0.01 s, total: 0.31 s
> Wall time: 0.32 s
>
> That's because the benchmark is maybe testing nothing
> but memory allocation.

Your benchmark doesn't call maxima at all. When you quit Sage after
these lines, there is no

Exiting spawned Maxima process.

notice printed.


> I have seen a lot of *real* benchmarks involving symbolic calculus
> when sad users and students come to me and ask why their code
> is 1000 times slower than Mathematica.  This happened a *few* times
> today at Sage Days 9 today.  It also comes up in sage-support all
> the time.  I really need to come up with a large list of specific examples
> like this, since I'm getting way too confused by benchmarks.

I couldn't find any standard benchmarks, and had to come up with the
one above. I think it is reflects some aspect of performance, as the
power operator is very similar to a symbolic function application, and
the system has to do a basic simplification like

((x+y)^2)^i = (x+y)^(2*i)

for each step.


Cheers,

Burcin
_______________________________________________
GiNaC-list mailing list
GiNaC-list@...
https://www.cebix.net/mailman/listinfo/ginac-list

Re: GiNaC as the symbolic manipulation engine in Sage

by Burcin Erocal :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Mon, 11 Aug 2008 11:39:42 +0200
Jens Vollinga <jensv@...> wrote:

> Hi from GiNaC side,
>
> just some short comments:
>
> Burcin Erocal schrieb:
> > - It takes ages to build
> > The packages above took ~25 minutes to build on my desktop
> > machine (15 for cln, 9 for ginac)
>
> Did you configure with the --disable-static option? Otherwise your build
> time is just twice the usual time (looks like it).

I didn't do anything to try to make these times better. I was more
concerned with benchmarking speed of basic manipulations, and thought
that these problems can be solved easily later.

Your suggestion reduces the compilation time of ginac to 5 minutes.
Looks like we're already below William's 8 minute limit. Now can we fix
the cln problem? :)


Thanks.

Burcin


> > - cln seems to support on only gcc, which might be a problem for the
> > windows port (which will be using ms compilers)
>
> <blame others>
> That is a real problem, but it can only be solved by the cln people.
> </blame others>
>
> > - cln duplicates functionality which is provided by different libraries
> > already distributed with Sage (mpfr, mpir, flint) at a considerable
> > speed penalty.
>
> It would be quite useful for ginac if the numeric class that wraps the
> cln library would allow for other libraries as well, even pure C double
> precision ones. The problem in realizing this idea was always that other
> libraries were missing certain features (some functions, modular
> arithmetic,...) and to make up for these would incur a lot of extra
> work. It didn't seem like it was worth the hassle (yet).
>
> > - IIRC, GiNaC documentation suggests the printing order for the
> > variables is stable between different sessions, regardless of creation
> > order. However, running the doctests in the experimental wrapper
> > reveals different results. We may need to write a custom pretty printer.
>
> True.
>
> > - The gethash() function of GiNaC objects doesn't return stable
> > results, which is probably the cause of the problem above. We need a
> > stablehash() function.
>
> Yes, the two points above are related and I am to a certain point
> responsible for that mess. To explain: The original tinfo system in
> ginac was changed a few years ago to avoid fixed magic numbers which are
> a problem if you have a lot of user supplied ginac classes (made more
> urgent by the plan to make functions into full ginac classes). The
> current system assigns these dynamically during compilation. But it is
> not done in a smart way, so the tinfo numbers usually differ between
> different compilations. Now, these tinfo numbers also go into the hash
> calculation and the term ordering ...
>
> Personally, I would like to have this issue solved and have ginac to
> produce a fixed term ordering (as long as the declaration order of the
> symbols done by the user is the same). But at the moment I don't have
> the time to do this. Any volunteers? :-)
>
> Regards,
> Jens
> _______________________________________________
> GiNaC-list mailing list
> GiNaC-list@...
> https://www.cebix.net/mailman/listinfo/ginac-list
>
_______________________________________________
GiNaC-list mailing list
GiNaC-list@...
https://www.cebix.net/mailman/listinfo/ginac-list

A bad idea: GiNaC as the symbolic manipulation engine in Sage

by Alexei Sheplyakov-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hello!

On Mon, Aug 11, 2008 at 10:30:59AM +0200, Burcin Erocal wrote:

> While GiNaC seems to be the best option for Sage,

GiNaC is a special purpose symbolic manipulation library. The purpose of
GiNaC is to do "simple" manipulations with "large" expressions using a common
programming language (as opposed to ill designed, unstable ones as offered
by most of CASes). So, I don't think GiNaC is a good choice for Sage
(which attempts to be a general purpose _interactive_ CAS).

> I think all these problems can be resolved with some effort

Don't take me wrong, but I think it's better to put this effort elsewhere.
I.e. most of "issues" you've mentioned are design decisions.

> and even the speed of basic arithmetic in GiNaC can be improved with
> modifying it's data structures.

No matter what data representation you choose, there are always some expensive
operations. Sure, one can optimize data structures so that some particular
operation is fast (at the expence of slowing down other operations). So, to
write a good code one need to be aware of the complexity of operations and use
expensive operations consciously.

To be more specific, here is an example. Let's construct the expression
A_N = \sum_{i = 0}{N} (x+y)^i (with N = 10000). A naive way:

$ cat testadd_dumb.cpp

#include <ginac/ginac.h>
#include <cstdlib> // for atoi
using namespace GiNaC;
using namespace std;

ex do_dumb(const ex& base, const unsigned max_degree)
{
        ex e = 0;
        for (unsigned i = 0; i <= max_degree; i++)
                e += pow(base, i);
        return e;
}

int main(int argc, char** argv)
{
        unsigned max_degree = 10000;
        if (argc > 1)
                max_degree = atoi(argv[1]);

        const symbol x("x"), y("y");
        ex base = x+y;
        ex e = do_dumb(base, max_degree);
        return 0;
}

$ g++ -O2 -march=k8 -finline-limit=2000 testadd_dump.cpp -lginac
$ time ./a.out

real    0m4.916s
user    0m4.644s
sys     0m0.272s

It's very slow, because every += operation calls add:eval() method, which
puts the terms of the sum into a canonical order. A better variant is

$ cat testadd.cpp

#include <ginac/ginac.h>
#include <cstdlib> // for atoi()
using namespace GiNaC;
using namespace std;

// same as in above Maxima code: first construct all terms,
// than make a sum of them.
ex do_smart(const ex& base, const unsigned max_degree)
{
        exvector v;
        v.reserve(max_degree + 1);
        for (unsigned i = 0; i <= max_degree; i++)
                v.push_back(pow(base, i));
        return (new add(v))->setflag(status_flags::dynallocated);
        // setflag incantation tells GiNaC to manage memory for us.
}

int main(int argc, char** argv)
{
        unsigned max_degree = 10000;
        if (argc > 1)
                max_degree = atoi(argv[1]);
        const symbol x("x"), y("y");
        ex base = x+y;
        ex e = do_smart(base, 10000);
        return 0;
}

Here the sum is constructed in a one shot, so terms are sorted only once
instead of 10000 times, so the program works *much* faster:

$ g++ -O2 -march=k8 -finline-limit=2000 testadd.cpp -lginac
$ time ./a.out

real    0m0.078s
user    0m0.064s
sys     0m0.016s

The same applies to Maxima. Constructing the sum in a proper way takes
a reasonable time:

$ cat testadd.mac
args : []; for i from 0 thru 10000 do ( args : cons((x+y)^i, args) )$
expr : apply("+", args)$

$ time ( maxima --batch=testadd.mac >/dev/null )

real    0m0.519s
user    0m0.428s
sys     0m0.092s

N.B. The timing is not quite fair, since it includes the startup time.

> Sage via the maxima interface didn't finish this task in < 5 minutes.

I guess that's because you (or Sage) do it in an extremly inefficient manner.
In other words, if you use Maxima, write the code the Maxima way, not
the Mathematica (Maple, whatever) one, and everybody will be happier. BTW,
the same applies to almost every piece of a software.

> - cln duplicates functionality which is provided by different libraries
> already distributed with Sage (mpfr, mpir, flint)

The libraries duplicating CLN functionality [1] typically have several
serious drawbacks:
 a. The syntax is really awkward (to put it very mildly). For example,
    instead of a simple
                z = x+a*y;
                one need to write several lines of unreadable code.
 b. The user is responsible for memory management. This is absolutely
    inacceptable.

None of these libraries gives any considerable performance gain, so why
bother?
 
[1] As a matter of fact, CLN existed *long* before the libraries you've
mentioned were developed.

> at a considerable speed penalty.

Could you please give any benchmark backing up this claim?

> - Creating symbolic functions at runtime is not supported.

That's a design decision.

> - IIRC, GiNaC documentation suggests the printing order for the
> variables is stable between different sessions, regardless of
> creation order.

Could you please point out which document (and where) says that?

The order of expressions (in particular, symbols) is not stable. This is
a price for a fast comparison of expressions.

Anyway, since GiNaC is non-interactive, the order of terms doesn't actually
matter: typically the output of a program is a small expression (quite often
it is a single number), and the big intermediate expressions are not going
to be examined by human (except for debugging).

> - It takes ages to build
> The packages above took ~25 minutes to build on my desktop machine

Just for the record: building GiNaC and CLN takes ~30 minutes on my old
Duron 800 box with 640 Mb of RAM. I wonder what kind of machine do you use
as a desktop. That said, patches reducing the build time are wellcome (if
they don't reduce the performance and don't make the code more complicated).

> - cln seems to support on only gcc,

1. CLN maintainers accept patches to support other platforms.
2. Cleaning up CLN from non-portable code is in my TODO list (although
   the priority is *very* low).
3. gcc is available for (almost) every architecture, so who cares, anyway?

Best regards,
        Alexei

--
All science is either physics or stamp collecting.



_______________________________________________
GiNaC-list mailing list
GiNaC-list@...
https://www.cebix.net/mailman/listinfo/ginac-list

signature.asc (844 bytes) Download Attachment

Re: A bad idea: GiNaC as the symbolic manipulation engine in Sage

by Jens Vollinga :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi,

Alexei Sheplyakov schrieb:
> GiNaC is a special purpose symbolic manipulation library. The purpose of
> GiNaC is to do "simple" manipulations with "large" expressions using a common
> programming language (as opposed to ill designed, unstable ones as offered

well, I don't quite agree but maybe I am misinterpreting the quotation
marks ;-). If you want to handle large expression, then use FORM. And
what a simple manipulation is depends on whom you ask. Maybe for a
mathematician symbolic integration for example is a very simple
manipulation ...

GiNaC's raison d'etre is to help doing physics calculations. And it
might be that even some not so simple symbolic manipulation are needed
for that end.

> by most of CASes). So, I don't think GiNaC is a good choice for Sage
> (which attempts to be a general purpose _interactive_ CAS).

They would use ginac via a python wrapper, so I don't see any problems here.

> The order of expressions (in particular, symbols) is not stable. This is
> a price for a fast comparison of expressions.

Additional nitpicking: that is not quite true. There is a difference
between a stable order and a human-friendly order. I mentioned the
reasons for non-stability in my last email. It has nothing to do with
speed. Speed of comparison has only an effect on the human-friendliness.
In a GUI system with mostly short expressions an additional reordering
for the output is feasible and could be done outside ginac by the python
code.

Regards,
Jens
_______________________________________________
GiNaC-list mailing list
GiNaC-list@...
https://www.cebix.net/mailman/listinfo/ginac-list

Re: A bad idea: GiNaC as the symbolic manipulation engine in Sage

by Burcin Erocal :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Mon, 11 Aug 2008 18:27:54 +0400
Alexei Sheplyakov <varg@...> wrote:

> Hello!
>
> On Mon, Aug 11, 2008 at 10:30:59AM +0200, Burcin Erocal wrote:
>
> > While GiNaC seems to be the best option for Sage,
>
> GiNaC is a special purpose symbolic manipulation library. The purpose of
> GiNaC is to do "simple" manipulations with "large" expressions using a common
> programming language (as opposed to ill designed, unstable ones as offered
> by most of CASes). So, I don't think GiNaC is a good choice for Sage
> (which attempts to be a general purpose _interactive_ CAS).

I suggested using it in Sage for the symbolic manipulation engine only.
I n your first two sentences, you seem to point out that GiNaC is
good for this purpose.

Sage relies on many other libraries to be "general purpose." It uses
libSingular for multivariate polynomials, flint for univariate
polynomials over integers, and so on. Are you denying that GiNaC is the
best "symbol manipulation engine" I can find?

Your statements also seem to suggest that there is something wrong with
interactive use. GiNaC also provides support for this via ginsh,
pyginac, etc.

> > I think all these problems can be resolved with some effort
>
> Don't take me wrong, but I think it's better to put this effort elsewhere.
> I.e. most of "issues" you've mentioned are design decisions.

I am disappointed.

> > and even the speed of basic arithmetic in GiNaC can be improved with
> > modifying it's data structures.
>
> No matter what data representation you choose, there are always some expensive
> operations. Sure, one can optimize data structures so that some particular
> operation is fast (at the expence of slowing down other operations). So, to
> write a good code one need to be aware of the complexity of operations and use
> expensive operations consciously.

OK, so I chose a bad example for a benchmark, and this was pointed out
before.

When I said modifying data structures, I meant that you can keep the
arguments of an operator in a different structure, instead of a list.
This would probably only make sense on very large expressions, and at
this point, I don't know what the penalty for smaller expressions would
be.

As for different ways of storing the arguments of an operator, we can
look at methods for storing and calculating with polynomials. One
method is to use a hash table to keep the monomials ,Magma, a
commercial system that has the fastest polynomial arithmetic, uses this
method. Another is to use GeoBuckets which is explained here:

 http://portal.acm.org/citation.cfm?id=275394

Singular, which happens to be the fastest free software library for
multivariate polynomial arithmetic uses this. You can also use binary
heaps as suggested here:

http://portal.acm.org/citation.cfm?id=1086847

Michael Monagan and Roman Pierce use this method in the new blazingly
fast multivariate polynomial libray they wrote for maple:

http://www.cecm.sfu.ca/~mmonagan/papers/sdmp19.pdf

<snip>

>
> > - cln duplicates functionality which is provided by different libraries
> > already distributed with Sage (mpfr, mpir, flint)
>
> The libraries duplicating CLN functionality [1] typically have several
> serious drawbacks:
>  a. The syntax is really awkward (to put it very mildly). For example,
>     instead of a simple
> z = x+a*y;
> one need to write several lines of unreadable code.
>  b. The user is responsible for memory management. This is absolutely
>     inacceptable.
>
> None of these libraries gives any considerable performance gain, so why
> bother?
>  
> [1] As a matter of fact, CLN existed *long* before the libraries you've
> mentioned were developed.
>
> > at a considerable speed penalty.
>
> Could you please give any benchmark backing up this claim?
>
> > - Creating symbolic functions at runtime is not supported.
>
> That's a design decision.
>
> > - IIRC, GiNaC documentation suggests the printing order for the
> > variables is stable between different sessions, regardless of
> > creation order.
>
> Could you please point out which document (and where) says that?

I might be remembering this wrong. I just put it down as an item from
my notes while writing the wrapper. I don't think writing a pretty
printer which sorts the terms would be hard.

> The order of expressions (in particular, symbols) is not stable. This is
> a price for a fast comparison of expressions.
>
> Anyway, since GiNaC is non-interactive, the order of terms doesn't actually
> matter: typically the output of a program is a small expression (quite often
> it is a single number), and the big intermediate expressions are not going
> to be examined by human (except for debugging).
>
> > - It takes ages to build
> > The packages above took ~25 minutes to build on my desktop machine
>
> Just for the record: building GiNaC and CLN takes ~30 minutes on my old
> Duron 800 box with 640 Mb of RAM. I wonder what kind of machine do you use
> as a desktop. That said, patches reducing the build time are wellcome (if
> they don't reduce the performance and don't make the code more complicated).
<snip>

I got the above timing on this machine:

$ cat /proc/cpuinfo  | grep name
model name      : Intel(R) Pentium(R) D CPU 3.40GHz


Cheers,

Burcin
_______________________________________________
GiNaC-list mailing list
GiNaC-list@...
https://www.cebix.net/mailman/listinfo/ginac-list

Re: GiNaC as the symbolic manipulation engine in Sage

by Alexei Sheplyakov-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi,

On Mon, Aug 11, 2008 at 11:39:42AM +0200, Jens Vollinga wrote:

> It would be quite useful for ginac if the numeric class that wraps the
> cln library would allow for other libraries as well,

What is the point of that? As I've already mentioned, those libraries
provide very unnatural syntax, and leave the memory management as an
"exercise for the reader".

> even pure C double precision ones.

I think wrapping doubles (and integers) is not a good idea. A better option
is to make them immediate using the same tricks as CLN [1]. This idea is
particulary attractive for x86_64 CPUs. Here we have extra 16 bits for type
tags. We can use these to encode the whole GiNaC class hierarcy, so our custom
RTTI would be *really* fast.

[1] http://www.ginac.de/~kreckel/talks/ICMS2006.pdf

> > - The gethash() function of GiNaC objects doesn't return stable
> > results, which is probably the cause of the problem above. We need a
> > stablehash() function.
>
> Yes, the two points above are related and I am to a certain point responsible
> for that mess.

Not really. Ordering of symbols always was unstable, so was the ordering
of expressions. New tinfo mechanism just made that instability more explicit.

> Now, these tinfo numbers also go into the hash calculation and the term
> ordering ...

I think a custom printing function is a better approach. Any attempt to
"stabilize" the hash function (i.e. comparing symbol names instead of their
serials, comparing class names instead of tinfos, etc.) would substantially
slow it down, which means virtually every operation would be much slower.

Best regards,
        Alexei

--
All science is either physics or stamp collecting.



_______________________________________________
GiNaC-list mailing list
GiNaC-list@...
https://www.cebix.net/mailman/listinfo/ginac-list

signature.asc (844 bytes) Download Attachment

Re: [sage-devel] Re: GiNaC as the symbolic manipulation engine in Sage

by Burcin Erocal :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Mon, 11 Aug 2008 12:45:36 +0200
Burcin Erocal <burcin@...> wrote:

>
> On Mon, 11 Aug 2008 11:39:42 +0200
> Jens Vollinga <jensv@...> wrote:
>
> > Hi from GiNaC side,
> >
> > just some short comments:
> >
> > Burcin Erocal schrieb:
> > > - It takes ages to build
> > > The packages above took ~25 minutes to build on my desktop
> > > machine (15 for cln, 9 for ginac)
> >
> > Did you configure with the --disable-static option? Otherwise your build
> > time is just twice the usual time (looks like it).
>
> I didn't do anything to try to make these times better. I was more
> concerned with benchmarking speed of basic manipulations, and thought
> that these problems can be solved easily later.
>
> Your suggestion reduces the compilation time of ginac to 5 minutes.
> Looks like we're already below William's 8 minute limit. Now can we fix
> the cln problem? :)

Using the --disable-static option while configuring cln brings down its
build&install time to 9 minutes.

One thing that remains in favor of cln is that it uses memory pools.
This would be hard to do with a simple rewrite. I don't know how much
this effects performance though.

Burcin
_______________________________________________
GiNaC-list mailing list
GiNaC-list@...
https://www.cebix.net/mailman/listinfo/ginac-list

Term ordering stability, tinfo keys and all that.

by Alexei Sheplyakov-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Mon, Aug 11, 2008 at 04:50:54PM +0200, Jens Vollinga wrote:

> well, I don't quite agree but maybe I am misinterpreting the quotation
> marks ;-). If you want to handle large expression, then use FORM.

No way. Its programming language is extremly brain dead. It has a lot
of stupid "features", i.e. it unconditionally expands every expression.
It lacks basic polynomial operations (gcd, collect, to name few). FORM
is impossible to extend. Last, but not least, FORM is a proprietary software.

> what a simple manipulation is depends on whom you ask.

Sure, hence the quotation marks. An example of a "simple" manipulation
is reading an input expression, or adding several expressions and collecting
similar terms. Apparently not every CAS is able to to this, see e.g.
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=407109

> > The order of expressions (in particular, symbols) is not stable. This is
> > a price for a fast comparison of expressions.

> Additional nitpicking: that is not quite true.

I don't agree. The purpose of custom GiNaC RTTI ("tinfo keys") is exactly fast
ordering of expressions. A long(er) explanation follows.

The purpose of hash function is to quickly guess if two expressions are NOT
the same in order to minimize the number of (expensive) element-to-element
comparison operations. Notice that two expressions of different type are always
different. Hence the need for an efficient ordering operation for _types_.
IOW, we need a fast RTTI. Standard C++ library provides RTTI, but in ancient
versions of GNU C++ library ordering operation (i.e. typeinfo::before() method)
was very slow, because it was implemented as a comparison of (mangled) type
names. Hence GiNaC invented its own (fast) RTTI with tinfo keys.

As a side note, RTTI implementation in the up to date versions of GNU libstdc++
is much better. It uses the technique similar to GiNaC's tinfo keys, although
it's a bit more complicated (since it needs to handle multiple inheritence
and other obscure stuff). As a matter of fact, using it instead of our custom
RTTI doesn't yeld any measurable overhead. So we can stick with standard C++
RTTI. This won't solve term ordering "issue", but writing new GiNaC classes
would be easier.

> I mentioned the reasons for non-stability in my last email.

The only missing part is why GiNaC needs those "tinfo keys" in first place...

Best regards,
        Alexei

--
All science is either physics or stamp collecting.



_______________________________________________
GiNaC-list mailing list
GiNaC-list@...
https://www.cebix.net/mailman/listinfo/ginac-list

signature.asc (844 bytes) Download Attachment

Parent Message unknown Re: [sage-devel] Re: GiNaC as the symbolic manipulation engine in Sage

by Burcin Erocal :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi,

I forwarded your questions to the ginac list. I'll try to answer as
much as I know.

On Mon, 11 Aug 2008 08:24:52 -0700
"William Stein" <wstein@...> wrote:

<snip>
> Burcin,
>
>   1. does Ginac (or the part we plan to use) have *any* known bugs at all?

I can't find a public bug tracker on the web site. They suggest sending
bugs to the mailing list.

>   2. does Ginac have a test suite?

Some of the files under the check/ directory seems to verify
correctness.

>   3. does ginac have a bug tracker?
>   4. what does Valgrind say about the quality of the code when run on
> the test suite?  memory leaks?

I haven't run it through valgrind at all. Maybe mabshoff can lend a
helping hand here?


Burcin
_______________________________________________
GiNaC-list mailing list
GiNaC-list@...
https://www.cebix.net/mailman/listinfo/ginac-list

Re: A bad idea: GiNaC as the symbolic manipulation engine in Sage

by Alexei Sheplyakov-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Mon, Aug 11, 2008 at 05:09:30PM +0200, Burcin Erocal wrote:

> Are you denying that GiNaC is the best "symbol manipulation engine"
> I can find?

That depends on a definition of a "symbol manipulation engine", i.e. what
kind of features do you expect from it (i.e. GiNaC does not implement
polynomial factorization, which is necessary for symbolic integration).

Also to use GiNaC (or Maxima) in an efficient manner you need to forget
your Mathematica (Maple, whatever) experience and learn from scratch.
 
> Your statements also seem to suggest that there is something wrong with
> interactive use.

GiNaC is not designed for this kind of use. Interactive manipulation is kind
of pointless if  the expressions in question consists of 10^6 -- 10^8
of terms, isn't it?

> OK, so I chose a bad example for a benchmark, and this was pointed out
> before.

I think the benchmark is OK. My point is a bit different: the method you've
used to construct the sum is very inefficient, at least for GiNaC and Maxima.
Apparently it's also inefficient for Mathematica (I guess it's also inefficient
for almost any CAS too, but I can't tell for sure):

$ cat mma_bench1.m

A = 0;
For[i = 0, i <= 10000, i = i + 1, A = A + (x+y)^i; ];

$ time math < mma_bench.m >/dev/null

real    0m15.535s
user    0m15.509s
sys     0m0.016s

Hence Mathematica provides a more efficient method, it's called Sum:

$ cat mma_bench2.m
A = Sum[(x+y)^i, {i, 0, 10000}];

$ time math < mma_bench2.m >/dev/null

real    0m0.035s
user    0m0.028s
sys     0m0.004s

The point is: although both methods are correct, the non-obvious one
performs _3 orders of magnitude_ better. Likewise, GiNaC (and Maxima)
also provide efficient methods for doing some operations. You might want
to learn them before doing some changes to data structures.

> When I said modifying data structures, I meant that you can keep the
> arguments of an operator in a different structure, instead of a list.

I understand (BTW GiNaC uses arrays for that, not lists).

> This would probably only make sense on very large expressions, and at
> this point, I don't know what the penalty for smaller expressions would
> be.

I don't agree. I think the large expression should be kept as flat as
possible for (at least) two reasons:

1. Otherwise eval() and other inherently recursive functions/methods
   would run out of stack.
2. Any other structure requires more memory to store the same terms.
 
> As for different ways of storing the arguments of an operator, we can
> look at methods for storing and calculating with polynomials. One
> method is to use a hash table to keep the monomials

The same method was used in ancient versions of GiNaC. That was a bad idea.
It makes code much more complicated and does not really improve performance
(in some situation it's even worse).

N.B. You might want to grep GiNaC source for EXPAIRSEQ_USE_HASHTAB.

To reiterate, a flat N-ary tree (which is what GiNaC::expairseq
essentially is) seems to be the best way to store big expressions.
 
> I got the above timing on this machine:
>
> $ cat /proc/cpuinfo  | grep name
> model name      : Intel(R) Pentium(R) D CPU 3.40GHz

Although the build time depends on optimization options and a number of other
factors, 25 minutes is way too long for such a box (even if you build both
static and shared libraries). Unless you use some insane optimization options,
this looks like a compiler and/or linker bug (see e.g.
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=376066). What are exactly
the compiler and binutils versions (as in g++ --version; ld --version)?

Anyway, there are a number of ways to speed up the build. Jens already
mentioned --disable-static. Another useful option is to run build in
a parallel manner, i.e. invoking make as

make -j N

where N \approx number of CPU cores + 1, and appending '-pipe' option to
CXXFLAGS. Note: this is useful even if your CPU is single core, because
CLN consists of lots of small source files.

Yet another method is to use a fast(er) shell, such as ash or pdksh, i.e.

SHELL=/bin/ash CONFIG_SHELL=/bin/ash /bin/ash configure --disable-static

This works because GNU libtool (which is responsible for building GiNaC
and CLN, as well as a number of other Free and even not so free libraries)
is a shell script. Of course, ash is not suitable as an interactive shell,
but it's very good at parsing/running big scripts (such as GNU libtool).

@developers: libtool version 2.2 is reportedly much faster than 1.5
(which is what we use), so we might want to upgrade to that version
when releasing next version of GiNaC.


Best regards,
        Alexei

--
All science is either physics or stamp collecting.



_______________________________________________
GiNaC-list mailing list
GiNaC-list@...
https://www.cebix.net/mailman/listinfo/ginac-list

signature.asc (844 bytes) Download Attachment