Intellasys question for Jeff Fox


Intellasys question for Jeff Fox

by nakedtruth


Jeff, at the Intellasys message board someone asked about transcendental functions
for the SEAForth chips.  I recall you had an implementation of CORDIC for the F21.
Wouldn't that be pretty straightforward to port?

Regards,

John M. Drake


     

---------------------------------------------------------------------
To unsubscribe, e-mail: colorforth-unsubscribe@...
For additional commands, e-mail: colorforth-help@...
Main web page - http://www.colorforth.com


Re: Intellasys question for Jeff Fox

by Jeff Fox-2

Dr. Montvelishsky wrote a CORDIC function in machineforth
for our ANS P21forth in 1992 which also ran on F21 with
a small mod.  I recall that I was impressed with the fact
that p21 running Michael's code was 50x faster than Intel's
387 coprocessor on these transcendental functions at the
time. Michael wrote a CORDIC for SEAforth so long ago that
it probably needs to be updated now.  Such things will be
included when more libraries are published.

Best Wishes


> Jeff, at the Intellasys message board someone asked about transcendental
> functions
> for the SEAForth chips.  I recall you had an implementation of CORDIC for
> the F21.
> Wouldn't that be pretty straightforward to port?
>
> Regards,
>
> John M. Drake


Re: Intellasys question for Jeff Fox

by Albert van der Horst

On Fri, May 23, 2008 at 05:21:14PM -0700, Jeff Fox wrote:
> Dr. Montvelishsky wrote a CORDIC function in machineforth
> for our ANS P21forth in 1992 which also ran on F21 with
> a small mod.  I recall that I was impressed with the fact
> that p21 running Michael's code was 50x faster than Intel's
> 387 coprocessor on these transcendental functions at the
> time. Michael wrote a CORDIC for SEAforth so long ago that
> it probably needs to be updated now.  Such things will be
> included when more libraries are published.

That doesn't tell the whole story. More than 50% of the time, Intel's
result was the nearest representable float, and it was never more than
the machine precision (epsilon) off.

So to complete the picture:
        How precise was the cordic?

(Today's Intels can do a cosine in a couple of cycles of 284 ps.
It is not fair to compare the F21 to that, but indeed the world has
moved on.)

> Best Wishes

Groetjes Albert
--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- like all pyramid schemes -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst



Re: Intellasys question for Jeff Fox

by Jeff Fox-2

> On Fri, May 23, 2008 at 05:21:14PM -0700, Jeff Fox wrote:
>> Dr. Montvelishsky wrote a CORDIC function in machineforth
>> for our ANS P21forth in 1992 which also ran on F21 with
>> a small mod.  I recall that I was impressed with the fact
>> that p21 running Michael's code was 50x faster than Intel's
>> 387 coprocessor on these transcendental functions at the
>> time. Michael wrote a CORDIC for SEAforth so long ago that
>> it probably needs to be updated now.  Such things will be
>> included when more libraries are published.
>
> That doesn't tell the whole story. More than 50% of the time, Intel's
> result was the nearest representable float, and it was never more than
> the machine precision (epsilon) off.

You are quite correct that one sentence is not the whole
story. As always there are more details and the most
important is not raw performance but performance divided
by cost or performance divided by power consumption as
that was the intended target.

In this case both machines provide machine precision.
One can say they are equal in that sense and in this
case we compared on the same calculation. If you need
higher precision results it is easier to get them with
the wider bus, but cordic can calculate to arbitrary
precision.  But it becomes less efficient if you have to
do multiprecision math.
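
For anyone who hasn't met CORDIC, the point about precision can be made concrete with a minimal sketch in Python. This is not Michael's machineforth code, and the function name cordic_sincos and the default word size are assumptions chosen only for illustration. The loop uses nothing but shifts, adds and a small arctangent table, and the number of iterations (and table entries) grows with the number of bits wanted, which is why going past the native word size gets more expensive.

```python
import math

def cordic_sincos(angle, bits=16):
    """Return (sin, cos) of `angle` in radians, for |angle| <= pi/2.

    Rotation-mode CORDIC in fixed point with `bits` fractional bits.
    Illustrative sketch only; the name and defaults are hypothetical.
    """
    one = 1 << bits
    # Table of atan(2**-i), scaled to fixed point: one entry per iteration.
    atans = [round(math.atan(2.0 ** -i) * one) for i in range(bits)]
    # Each micro-rotation stretches the vector by sqrt(1 + 2**(-2i)).
    gain = 1.0
    for i in range(bits):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x = round(one / gain)    # pre-scaled so the final vector has unit length
    y = 0
    z = round(angle * one)   # residual angle still to be rotated through
    for i in range(bits):
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - atans[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + atans[i]
    return y / one, x / one

if __name__ == "__main__":
    s, c = cordic_sincos(math.pi / 6, bits=16)
    print(s, c, math.sin(math.pi / 6), math.cos(math.pi / 6))
```

Calling it with bits=20 instead of bits=16 gives a closer result at the cost of four more iterations and four more table entries.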

Of course F21 was designed for parallelism and a fair
comparison on power or cost or transistor count means
we should be comparing 100 F21 to a 386/387 combination.
So the performance ratio rises to about 10000x on
the lower precision calculation when you level the
playing field and certainly that is the most important
part of the whole story.

It is a little like comparing C18 to Pentium.  Since
Pentium doesn't cost a couple of cents or draw only mW
of power at full throttle a one to one comparison
makes little sense.  Pentium can't do what c18 does which
is match solutions wanting only a few cents or a few
mW of power.  A single processor costing a couple
of cents isn't likely to outperform processors
costing thousands of times more.  For a direct
chip to chip comparison one should probably pick a
chip the size and cost of c18 and note that you
might see a 30,000/1 performance difference. That
kind of comparison makes more sense since no one
is likely to swap out a Pentium for a really small
and cheap chip. If they can they have really been
wasting a lot of money and power.

For large scale performance comparisons where one is
going to spend pentium scale budgets for cost or
power we should be comparing a Pentium to 100 100x
cluster chips connected together, you know, stuff
in the millions of mips performance range.  That's
what the scalable in Scalable Embedded Arrays is
about.

> So to complete the picture:
>         How precise was the cordic?

Machine precision in each case.  On the lower
precision calculation, which is what had been asked about,
the important ratio is not the 50/1 for a single P21;
I think a comparison on a more level playing field is
appropriate.  A p21, f21 or c18 was not meant to be
a big Intel chip designed for C code; they are designed
for realtime low-power efficient code.

If we are comparing differences we could also note
that with FP there are precision problems and errors
due to rounding.  Chuck often talks about how he
prefers CAD calculations that get more accurate results
than the popular floating point calculation methods.

Of course those tiny chips can already beat a Pentium very
badly not only on performance/cost or performance/power
but in raw performance when comparing on realtime
response which Pentium was not designed to do and which
the small chips were.  Pentium's deep pipelines and
multi-layered cache become a nightmare when trying to
meet realtime performance requirements that are quite
easy to meet with a processor costing a few cents.

> (Today's Intels can do a cosine in a couple of cycles of 284 ps.
> It is not fair to compare the F21 to that, but indeed the world has
> moved on.)

Agreed.  It isn't fair to compare an <$1 early 90s embedded chip
needing only a couple of milliwatts to anything being made
today because the world has moved on and it would be as unfair
to compare today's Intel chips to antique Forth chips as it
would be to compare today's Forth chips to antique Intel chips.
Anyone doing that would just be trying to imply that they had
moved on and that Intel was still stuck in the 80s (making 8051).

I have no idea why anyone would suggest a comparison of an
early 90s chip to 'todays Intel's' unless they are just
trying their best to be insulting and rude.

There is nothing wrong with the folks who prefer to work
with antique computers or antique language dialects.  But
most people have moved on from 80s chips and 70s dialects
except for a few notable nostalgists.

I was reporting that Intel's chips of that day had been
compared to Forth chips from the same time doing the same
calculation.  There are many other comparisons that could
be made to get the whole story.

For a fair comparison today you need to compare today's
chips to today's chips which is what we do today.  To
level the playing field we would need to scale quite
a bit to get to the level of machines people talk about
in c.l.f.  At the same level of cost or power consumption
as Pentium PC we need to talk about something like a
7,000,000 Forth MIPS PC.  The world has moved on since
FIG-Forth and transputers.

But even MIPS numbers alone are different things.  On Pentium
MIPS are associated with large scale large memory number
crunching.  They don't translate well into realtime
performance because of the cache and pipeline issues.
And since the Forth chips were not designed for that it
makes little sense to say one should compare on that
metric.  Though when we expand to Pentium budgets and
scale up to thousands of processors it will be interesting
to see where we sit with things like millions of mips
raw processing power.  We know we are very well matched
to cad calculations for instance.

When not scaled up to Pentium level systems the Forth chips
were designed for embedded realtime systems where power
and cost are critical.  Since Pentium is not designed to
do that it doesn't make much sense to compare on the
metric for Forth chips either.  Pentium loses so badly
there that it just isn't fair at all.

I found an interesting paper from Parallax benchmarking
an interface to one of their chips on Intel PC.  This was
an SPI interface bit-bang and since we have one of those
in a C18 ROM I thought it would make an interesting
comparison.  The raw performance ratio (not counting
the 1000/1 cost and power ratios) was still 100/1 and
not in favor of the Pentium.  The kind of mips on
Forth chips translate into things like faster control
loops on realtime applications because of the lack
of pipeline and cache problems.

But that's not the 'whole story' either. Including
the 'whole story' about Pentium will always take
10,000 pages of explanation about almost anything.
That's why almost no one claims to understand how
to optimize code for Pentium without a lot of
experimentation and code profiling.

In this case it was not just the problems with
pipelines and caches and parallel ports that limited
the PC performance but also the software that gets
loaded on a Pentium PC, which makes it even more difficult
to get decent realtime numbers even on simple things
that simple and cheap processors do quite well.

I look forward to release of Chuck's colorforth
software targeting his Forth chip designs.  Colorforth
was made for okad and in my opinion almost all of the
value of colorforth is that it is a few percent of the
code at the bottom of the cad software that makes all
the progress possible.  Offering the SEAforth target
compiler in colorforth will give people a better sense
of the nature of colorforth code.

I also have to admit that there were many things I
liked about working with a large team of chip
designers who used to work at Intel on chips like
Pentium.  It was a lot of fun to talk to them and
learn how differently they think about things than
Forth programmers.

As often observed the Forth code in Forth compilers
isn't really very characteristic of Forth code but
is unfortunately all that a lot of Forth enthusiasts
ever get exposed to.  In my opinion the reason for
all this stuff is solving problems that no one has
solved before. In the case of Forth's intent I think
it has mostly been about pushing the performance/price
or performance/power envelope of applications, and
occasionally about programmer performance.

Best Wishes





Re: Intellasys question for Jeff Fox

by Gwenhwyfaer

On 24/05/2008, Jeff Fox <fox@...> wrote:
> In this case both machines provide machine precision.

You repeat this later, but I can't help thinking it's an evasion. Most
people, when they want to know precision, are thinking of a number of
binary digits. (And a word about accuracy wouldn't go amiss either.)
Since the F21 had a 21-bit word, would it be fair to assume that the
precision was 20-bit? What about accuracy?

> Of course F21 was designed for parallelism and a fair
> comparison on power or cost or transistor count means
> we should be comparing 100 F21 to a 386/387 combination.

What was the cheapest that single F21 chips were ever available?

> It is a little like comparing C18 to Pentium.  Since
> Pentium don't cost a couple of cents

But where can I buy a C18 for a couple of cents? It's an unrealistic
figure; even if they were packaged singly, and the C18 did cost a
couple of cents to fabricate, the chip packaging alone would
completely dominate the cost. (The same must now be true for PIC16s,
given how long the architecture has been kicking around - but even the
thoroughly obsolete PIC16F54, the cheapest processor Microchip do when
packaged in an 18-pin SOIC, is still 37c qty. 10k... or 48c qty. 4; 3
or less you might as well get via the sample program. I'd be amazed if
less than 98% of that was the cost of the carrier.) For a fairer
comparison, how much is a SEAforth-24A processor in various quantity
levels? And what proportion of that cost is packaging?

And comparisons with the Pentium are of course silly - but comparisons
with, say, the Cortex M3 (Luminary's implementation starts from $1 in
quantity) might be more realistic. And sure, the C-M3 doesn't come
close to Pentium performance - 62 Dhrystone MIPS (for what that's
worth) at 50MHz, I believe - and its raw instruction throughput isn't
as high as the C18's. But its programming model is a lot easier to
work with; and of course, Thumb-2 instructions *do* more than C18
instructions. Not as much as ARM instructions, admittedly; that's the
difference between an instruction set designed to be an efficient (in
both space and time) compilation target for C (Thumb), and an
instruction set designed to be a joy to program in assembler (ARM).
Still, it's a fair comparison - and historically ARM cores have been
pretty tiny (74k transistors for the ARM7TDMI, 112k for the ARM9TDMI).
And certainly in budgetary terms the C-M3 is a much better comparison
for the C18; it'd be interesting to see real comparative benchmarks.

In fact (and forgive the wandering off topic here), here's a
suggestion for an interesting benchmark - the number of voices of
MIDI-driven OPL2-style FM synthesis (at a 48k sample rate) that each
chip can perform, complete with a subjective audio quality comparison.
It's a nice realtime app; the specifications are fixed, well known,
and quite implementation-independent; it doesn't need multiplication
or large amounts of memory, but it can take advantage of it if it's
there; the clock required for sample output has the potential to test
interrupt latency; you end up with a nice little figure at the end of
it; it scales down to the lowest PICs (which may manage to get 1 voice
out, but not much more) and up to the scary fast GPUs nVidia are
producing these days (millions of voices! eek!); it can be implemented
easily in assembler, C or Forth; and it would provide anyone
interested in synthesis with a ready-made demo app.
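
To give a feel for what one "voice" in such a benchmark costs, here is a rough floating-point sketch in Python of a single two-operator FM voice. It is not an OPL2 emulation, and the names fm_voice and voices_in_real_time and all parameter values are made up for illustration. The benchmark figure would be how many of these a chip can render in the time it takes to play them.

```python
import math

def fm_voice(carrier_hz, ratio, index, n_samples, sample_rate=48000):
    """Render one two-operator FM voice as a list of floats in [-1, 1].

    The modulator runs at carrier_hz * ratio and shifts the carrier's
    phase by up to `index` radians. Hypothetical helper, for illustration.
    """
    carrier_step = 2 * math.pi * carrier_hz / sample_rate
    modulator_step = carrier_step * ratio
    samples = []
    for n in range(n_samples):
        modulator = math.sin(modulator_step * n)
        samples.append(math.sin(carrier_step * n + index * modulator))
    return samples

def voices_in_real_time(render_seconds, audio_seconds):
    """How many voices fit into real time, given one voice's render time."""
    return int(audio_seconds / render_seconds)

if __name__ == "__main__":
    import time
    start = time.perf_counter()
    fm_voice(440.0, 2.0, 1.5, 48000)          # one second of audio
    elapsed = time.perf_counter() - start
    print(voices_in_real_time(elapsed, 1.0))  # machine-dependent figure
```

A version meant for a small chip would presumably replace math.sin with a table lookup and fixed-point phase accumulators, which is where the differences between processors would start to show.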

> A p21, f21 or c18 was not meant to be
> a big Intel chip designed for C code, they are designed
> for realtime low-power efficient code.

Believe me, the Intel chips aren't designed for C code either. The
great gcc v x86 battles of the past and present bear witness to that!
In fact, at the time of the 286, Intel were producing the iAPX432,
which was intended to be directly programmable in Ada (of all things!)
and whose design had a heavy influence on the shape of 286 protected
mode - it wasn't until well after the 386's release that C could be
seen to have predominated.

> If we are comparing differences we could also note
> that with FP there are precision problems and errors
> due to rounding.

They are known, though, and not entirely unmanageable; and since the
x87 works with 80-bit floats by default (sufficient to retain 64 bits
of integer precision at all times, which matches the "long multiply"
of most 32-bit processors) they're much less of a factor than
denormalisation (which is generally disastrous for the x87's
performance).

Unfortunately, SSE is the future, and its maximum integer precision (I
believe; someone correct me if I'm mistaken) is 53 bits. Embrace the
regression :/ On the other hand, now that the P4 core is consigned to
history, might the x87 not be rehabilitated in the (PPro-derived) Core
2?
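
The 53-bit figure is easy to check on any machine with IEEE 754 doubles. A small sketch in Python, assuming (as is the case on common platforms) that a Python float is a C double; the helper name exactly_representable is made up for the example:

```python
import sys

def exactly_representable(n):
    """True if the integer n survives a round trip through a float."""
    return int(float(n)) == n

limit = 2 ** 53

print(sys.float_info.mant_dig)           # significand width in bits
print(exactly_representable(limit))      # True
print(exactly_representable(limit + 1))  # False: rounds back to 2**53
```

Every integer up to 2**53 is exact in a double; the first one that is lost is 2**53 + 1.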

(http://docs.sun.com/source/806-3568/ncg_goldberg.html is a useful
reference. Floating point can be a very useful tool; particularly when
- as in the x86 - it's just there anyway, and you couldn't turn it off
if you wanted to.)

> Chuck often talks about how he
> prefers CAD calculations that get more accurate results
> than the popular floating point calculation methods.

Well, when you've carefully scaled all your quantities to be extremely
amenable to simple integer manipulation without precision loss, it's
not surprising that they are more accurate than floating-point
calculation with real-world units (which have a tendency to be
inconvenient and irrational).
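
A tiny Python illustration of the difference, using a deliberately convenient unit (tenths) as an assumption; the function names are invented for the example:

```python
def sum_float(step, count):
    """Add `step` to itself `count` times in floating point."""
    total = 0.0
    for _ in range(count):
        total += step
    return total

def sum_scaled(step_units, count):
    """Same sum, with the quantity held as an integer number of units."""
    total = 0
    for _ in range(count):
        total += step_units
    return total

print(sum_float(0.1, 10) == 1.0)  # False: 0.1 is not exact in binary
print(sum_scaled(1, 10) == 10)    # True: ten tenths, held as integers
```

The integer version is exact only because the unit was chosen so that every quantity is a whole number of it, which is exactly the careful scaling described above.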

> Agreed.  It isn't fair to compare an <$1 early 90s embedded chip
> needing only a couple of milliwatts to anything being made
> today

Nor is it fair, in timeframe terms, to compare a 386/387 combination
to an F21, when the 486 had been the current x86 generation since
1989. The Pentium (P5) was introduced in 1993, too, so in terms of
timeframe, an F21 v P5 comparison isn't completely unreasonable.

Of course, there are whole hosts of other reasons why such a
comparison is completely unreasonable - it's just that release date
isn't one of them.

> There is nothing wrong with the folks who prefer to work
> with antique computers or antique language dialects.  But
> most people have moved on from 80s chips and 70s dialects
> except for a few notable nostalgists.

But the architecture of Chuck's chips has, if anything, become more
constricted since the F21 days. 20 bits of external bus (and 21 of
internal) have been shorn to 18 bits of each, and the ability to talk
to external RAM seems to have disappeared.

> The world has moved on since FIG-Forth and transputers.

You know, it actually hasn't; it's now turning out today that the
transputer was bang on target - just about 25 years too early... and I
think people yearn for the days of simple models like figForth /
Forth-79 or BBC / Applesoft Basic.

> But even MIPS numbers alone are different things.

Millions of Instructions Per Second... not that different, on the face
of it. Of course, the problem is that "Instruction" is a wildly
divergent concept, not remotely comparable between CPUs. :) Hence the
quest for "universal apples" - Dhrystone isn't much cop, but it
appears to be the best available (but see above).

> That's why almost no one claims to understand how
> to optimize code for Pentium without a lot of
> experimentation and code profiling.

Except Agner Fog. I think everyone leaves the heavy lifting to him
these days. ;) In any case, perhaps the single most important piece of
optimisation advice (after "if it's more than O(n log n) check you
haven't screwed up") that everyone needs to know about any modern CPU
is "make sure all your inner loops stay in L1 cache; make sure as much
of your data as possible is in L1 cache before using it".

> In the case of Forth's intent I think
> it has mostly been about pushing the performance/price
> or performance/power envelope of applications,

And possibly the distinction between high and low level.

> and occasionally about programmer performance.

Not necessarily more so than anything else, though; the kind of
programmer who can naturally bend their mind to the C18 will probably
have just as much of a field day in the hidden corners of the ARM
instruction set - or even the x86 one (look at Chuck's approach,
mining all the 1-byte microcoded instructions that nobody ever
generates, because they squish much more neatly into L1 cache...
unless you have a P4, of course).

Unfortunately, nobody seems to want that kind of programmer any more -
which leaves me (for one) out of a job, and increasingly alienated
from the field I trained in. Mneh.

Regards
Gwenhwyfaer  (... all job offers gratefully accepted ;)



Re: Intellasys question for Jeff Fox

by nakedtruth



----- Original Message ----
From: "gwenhwyfaer@..." <gwenhwyfaer@...>
To: colorforth@...
Sent: Saturday, May 24, 2008 1:32:32 PM
Subject: Re: [colorforth] Intellasys question for Jeff Fox

On 24/05/2008, Jeff Fox <fox@...> wrote:
>> In this case both machines provide machine precision.

> You repeat this later, but I can't help thinking it's an evasion. Most
> people, when they want to know precision, are thinking of a number of
> binary digits. (And a word about accuracy wouldn't go amiss either.)
> Since the F21 had a 21-bit word, would it be fair to assume that the
> precision was 20-bit? What about accuracy?

Nonsense.  Machine precision for a 386 is obviously up to 32 bits.
Machine precision for an F21 would be 20 bits.  No "evasion".
The key issue for the sake of comparison is whether the precision
requirements would be such as to cause you to need to do
double word operations on the F21.  If all you need is say 16
bits then that could be done on the F21, the 386 or even the
C18.  For the record a recent (2 years ago) 16 bit implementation
of CORDIC was posted on comp.lang.forth.

http://www.complang.tuwien.ac.at/forth/programs/cordic.fs

So if anyone REALLY cares about benchmarks (I don't) they
can run this on their favorite Intel chip and look at the comparisons
for the c18 whenever that library is released.

>> Of course F21 was designed for parallelism and a fair
>> comparison on power or cost or transistor count means
>> we should be comparing 100 F21 to a 386/387 combination.

> What was the cheapest that single F21 chips were ever available?

You're missing the point.  Jeff isn't talking about OTS cost.
He's talking about fab costs as in how much it would cost
to fab X number of 386s to how much it would cost to fab
X number of F21s.  Since nobody makes 386s anymore
it would technically cost more to fab them today than it
would to buy quad 4 Pentiums but that's based on
economy of scale rather than cost in silicon.

>> It is a little like comparing C18 to Pentium.  Since
>> Pentium don't cost a couple of cents

> But where can I buy a C18 for a couple of cents? It's an unrealistic
> figure; even if they were packaged singly, and the C18 did cost a
> couple of cents to fabricate, the chip packaging alone would
> completely dominate the cost.

That's part of the reason why multiple C18 cores are put on
the same chip.

Regards,

John M. Drake


     



Re: Intellasys question for Jeff Fox

by Jeff Fox-2

gwenhwyfaer wrote:
> What was the cheapest that single F21 chips were ever available?

$0.00

> But where can I buy a C18 for a couple of cents?

IntellaSys has been and will be giving away chips with
many c18 on them on flash drives with free development
systems.

> It's an unrealistic figure;

You are uninformed. I was understating but can't tell you
by how much.

> In fact (and forgive the wandering off topic here), here's a
> suggestion for an interesting benchmark - the number of voices of
> MIDI-driven OPL2-style FM synthesis (at a 48k sample rate) that each
> chip can perform, complete with a subjective audio quality comparison.
> It's a nice realtime app; the specifications are fixed, well known,
> and quite implementation-independent; it doesn't need multiplication
> or large amounts of memory, but it can take advantage of it if it's
> there; the clock required for sample output has the potential to test
> interrupt latency; you end up with a nice little figure at the end of
> it; it scales down to the lowest PICs (which may manage to get 1 voice
> out, but not much more) and up to the scary fast GPUs nVidia are
> producing these days (millions of voices! eek!); it can be implemented
> easily in assembler, C or Forth; and it would provide anyone
> interested in synthesis with a ready-made demo app.

Wasn't that the 96k sample MIDI-driven FM synthesis and waveguide
synthesis and more that has been demonstrated before, where
voices were compared to Pentium and custom chips and where people
could do subjective audio quality comparisons sufficient to
figure where chips that didn't beat Pentium would
fit in a comparison?

>> A p21, f21 or c18 was not meant to be
>> a big Intel chip designed for C code, they are designed
>> for realtime low-power efficient code.
>
> Believe me, the Intel chips aren't designed for C code either.

I believed Intel press releases when they started saying they
were optimizing performance for popular C software starting around
the time of 486 and beyond.  And I believed the chip designers at
IntellaSys who had previously worked at Intel designing Pentium
on that subject.

> The
> great gcc v x86 battles of the past and present bear witness to that!

Those only bear witness to the fact that Intel also needed to
be backwards, backwards compatible with x86 legacy along with Cish.
Cish only makes gccers happier.

> Nor is it fair, in timeframe terms, to compare a 386/387 combination
> to an F21, when the 486 had been the current x86 generation since
> 1989. The Pentium (P5) was introduced in 1993, too, so in terms of
> timeframe, an F21 v P5 comparison isn't completely unreasonable.

Computers that were side by side in time were compared. Chuck and
I had 386 with p21 and 486/Pentium in i21 and f21 days but f21
clock was two to four times faster than my Pentium's clock.
I still have that Pentium laptop and it still has F21 software
on it.  I think it still boots up and runs.

> and the ability to talk
> to external RAM seems to have disappeared.

Most of the SEAforth pins are labeled external memory
interface.  Early prototypes had external ram server
software in rom but unless you use the specific rams
supported by those specific drivers those roms were
wasted. With the external ram server in internal ram
one may configure for a wide range of external memory
interfaces.

> Unfortunately, nobody seems to want that kind of programmer any more -
> which leaves me (for one) out of a job, and increasingly alienated
> from the field I trained in. Mneh.

I think "nobody" goes too far. But as a trend it is
seen as decreasing. But that's the nature of most
anything one trains for, in time those skills will
be seen as old-fashioned and adaptation is required.

I know parallelism is on the way in not out.  Forth
chips have been around for a long time with only modest
quantities, in the millions of units.  Perhaps a large boost
in performance/cost or performance/power or the
combination with parallelism will see Forth chips
more on their way in than out.  Everything is a
gamble.

There has been a lot of positive interest in
IntellaSys technology.  There is also a lot of
apprehension in general to the new parallelism trend
as there has always been to anything new.  But
the folks who work with it think it is fun.

Best Wishes





Re: Intellasys question for Jeff Fox

by Gwenhwyfaer

On 26/05/2008, John Drake <jmdrake_98@...> wrote:
>> You repeat this later, but I can't help thinking it's an evasion. Most
>> people, when they want to know precision, are thinking of a number of
>> binary digits. (And a word about accuracy wouldn't go amiss either.)
>> Since the F21 had a 21-bit word, would it be fair to assume that the
>> precision was 20-bit? What about accuracy?
>
> Nonsense.

That's just plain rude.

> Machine precision for a 386 is obviously up to 32 bits.

Especially since this is just plain wrong. Jeff cited a 386/387 combination, which implies the use of the 387's transcendental instructions; the 387's mantissa precision is 64 bits, but because it's floating point, its range is a lot larger. And maybe most of that precision is unnecessary - I won't argue that if you don't need any more than 20-bit precision, the x87 is a waste of silicon - but you get it for free, and if you need it it's already there.

> Machine precision for an F21 would be 20 bits.  No "evasion".

And no apples-to-apples comparison either. That's where the evasiveness comes in; 20-bit CORDIC on an F21 might well be 50 times the speed of a 387, but what about one that matches the 387's precision? What about the 487, which was actually contemporaneous? What about an integer-only implementation for the 486?

The only way the figure is meaningful is as a comparison of two different styles of programming - the style of carefully rethinking everything from the ground up and only implementing what you need in precisely the way you need it, compared with the style of starting with as rich an environment as possible and hoping whoever implemented it didn't screw up. As comparisons of technology, they're almost meaningless without extensive commentary - a point I made in my previous email.

(It is possible that I'm actually in what has come to be termed "violent agreement" with Jeff, or even yourself, on this point. But the fact is that stating the 50:1 ratio as a comparison in itself, without making this clear, has been fuelling the likes of John Passaniti for the last decade; it's just too depressing to go to c.l.f and see Jeff and John still engaged in the same flamewar I found them fighting 15 years ago. It is transparently clear that the F21 and the 386, or the C18 and the Core 2, will not do anywhere close to the same things at all; it is also clear that if your application will fit inside a C18, a Core 2 will be a waste of money - but that if your application doesn't need the speed of a C18, a PIC may be a better bet; it may well be the case that if your application can use the C18's raw speed - but let's not forget that 1000 MIPS isn't anything special these days, and in any case that's a theoretical peak for a C18, not a much more useful sustained rate figure - a Core 2 would be quite inadequate for the task. Perhaps the only appropriate comparison for a C18, for those of us without fab access, would be 1/24 of whichever Xilinx or Altera FPGA is closest to the SEAforth-24A in unit cost.)

> For the record a recent (2 years ago) 16 bit implementation
> of CORDIC was posted on comp.lang.forth.
>
> http://www.complang.tuwien.ac.at/forth/programs/cordic.fs

That's great, but a 16-word lookup table is a significant chunk of a 64-word memory. Moreover, the great advantage of CORDIC is that it doesn't need multiplication - but if you have a single-cycle multiply and a decent amount of memory, you can use a lookup table and linear interpolation and reduce the cycle count dramatically.
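
To make the table-plus-interpolation alternative concrete, here is a rough sketch in Python. It is only an illustration under stated assumptions: the Q2.14 fixed-point format, the 16-bit phase input, the 64-interval table size and all the names are hypothetical choices made for this example, and nothing here comes from cordic.fs or from any SEAforth code.

```python
import math

FRAC_BITS = 14                       # assumed Q2.14 fixed-point output
ONE = 1 << FRAC_BITS
PHASE_BITS = 16                      # assumed: phase 0..65535 spans 0..pi/2
TABLE_BITS = 6                       # assumed: 64 intervals over a quarter wave
STEP_BITS = PHASE_BITS - TABLE_BITS  # low bits used as the interpolation fraction

# 65 entries, so that entry index + 1 always exists for interpolation.
# Built with floats here for convenience; a real target would store constants.
SIN_TABLE = [round(math.sin(math.pi / 2 * i / (1 << TABLE_BITS)) * ONE)
             for i in range((1 << TABLE_BITS) + 1)]

def sin_interp(phase):
    """Approximate sin((phase / 65536) * pi/2) in Q2.14.

    Costs two table reads, one multiply, one shift and two adds,
    instead of one shift-and-add step per bit as in CORDIC.
    """
    index = phase >> STEP_BITS               # which table interval
    frac = phase & ((1 << STEP_BITS) - 1)    # position inside the interval
    a, b = SIN_TABLE[index], SIN_TABLE[index + 1]
    return a + (((b - a) * frac) >> STEP_BITS)
```

Note that the table in this sketch is 65 words on its own, so it only makes sense on a machine with the "decent amount of memory" mentioned above; it would not fit alongside code in a 64-word memory.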

> So if anyone REALLY cares about benchmarks (I don't)

That's nice for you, but people deciding which chip to select for their new application need some way of determining whether or not that application can even be executed by that chip. Only hobbyists can afford to start with the processor and build outwards; not even Chuck has done that - his processors are built to fit their intended applications.

>> What was the cheapest that single F21 chips were ever available?
>
> You're missing the point.  Jeff isn't talking about OTS cost.

Then Jeff isn't talking about anything useful to anyone without their own fabrication facilities. (Do you have your own fab?) If I were looking for a chip to use in my design, I'd be looking at unit costs; a fab cost of "a couple of cents" is pointless to discuss when the unit cost is $40 (or even $4).

> He's talking about fab costs as in how much it would cost
> to fab X number of 386s to how much it would cost to fab
> X number of F21s.

The cost of fabbing any given wafer at any given resolution, as I understand it, is pretty much constant - and quite independent of what's on it. The smaller a chip, the more you can pack onto a wafer, and the more will survive the wafer's inevitable flaws. (Of course, you also need to multiply the cost of a wafer by the number of wafers you blow completely with non-functional designs... but that's not a fab cost per se.)
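
A back-of-envelope version of that argument, as a Python sketch. The Poisson yield model is a common textbook simplification and is swapped in here purely for illustration; the wafer cost, wafer diameter, die areas and defect density used in the usage lines are invented numbers, not figures for the F21, the C18 or any real process.

```python
import math

def cost_per_good_die(wafer_cost, wafer_diameter_mm, die_area_mm2, defects_per_mm2):
    """Estimate the cost of one working die from a fixed wafer cost.

    Assumptions (all hypothetical): dies tile the wafer perfectly (edge
    losses and scribe lanes are ignored) and defects follow a Poisson
    model, so the fraction of good dies is exp(-defect_density * die_area).
    """
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    gross_dies = int(wafer_area // die_area_mm2)
    yield_fraction = math.exp(-defects_per_mm2 * die_area_mm2)
    return wafer_cost / (gross_dies * yield_fraction)

# Same wafer, same process, two die sizes (made-up values).
small = cost_per_good_die(1000.0, 200.0, 1.0, 0.01)
large = cost_per_good_die(1000.0, 200.0, 100.0, 0.01)
print(f"small die: {small:.4f}  large die: {large:.4f}")
```

Under this model the large die costs more than 100 times as much as the small one even though it is only 100 times the area, because the yield term penalises it a second time.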

>> But where can I buy a C18 for a couple of cents? It's an unrealistic
>> figure; even if they were packaged singly, and the C18 did cost a
>> couple of cents to fabricate, the chip packaging alone would
>> completely dominate the cost.
>
> That's part of the reason why multiple C18 cores are put on
> the same chip.

...which as I mentioned previously, will *still* be dominated by packaging costs - that and the cost of custom fabbing, of course, and the need to return profit from comparatively low volume. But for anyone who's reliant on Intellasys to sell them C18s in whatever form, talking about anything other than the cost of a production chip in quantity is worthless; the only people for whom fab costs are useful are people with their own fabrication facilities - and even then, they will still need to insert those chips in carriers.

Regards
Gwenhwyfaer

Re: Intellasys question for Jeff Fox

by nakedtruth



----- Original Message ----
From: "gwenhwyfaer@..." <gwenhwyfaer@...>
To: colorforth@...
Sent: Monday, May 26, 2008 12:25:24 PM
Subject: Re: [colorforth] Intellasys question for Jeff Fox

On 26/05/2008, John Drake <jmdrake_98@...> wrote:

>> Nonsense.

> That's just plain rude.

You accuse someone else of being evasive and you have the
nerve to call me rude?  More nonsense.  And you haven't seen
me be rude.  I called what you said nonsense and in my opinion
(still) it was.  I see no reason to take that personally.  I sometimes
say things that are nonsense too.

>> Machine precision for a 386 is obviously up to 32 bits.

> Especially since this is just plain wrong. Jeff cited a 386/387 combination, which implies the use of the 387's transcendental instructions;

The CORDIC algorithm doesn't use transcendental instructions.  Now true, Jeff mentioned the 387,
but he said nothing about transcendentals, so you just made an assumption.  I can see how you
made that assumption based off Jeff's post, but what he says on his website looks like he's
comparing CORDIC implementations.
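
For anyone who hasn't looked at CORDIC before, here is a minimal integer-only sketch in Python of the rotation-mode algorithm. It is not Dr. Montvelishsky's code and not the cordic.fs code; the Q2.14 format, the 16 iterations and all the names are assumptions picked for this illustration. The inner loop uses only shifts, adds and a small arctangent table.

```python
import math

FRAC_BITS = 14                  # assumed Q2.14 fixed point
ONE = 1 << FRAC_BITS
N_ITER = 16                     # assumed: one table word per iteration

# atan(2^-i) in Q2.14 radians. Floats are used only to build the
# constants; the loop below is integer-only.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ONE) for i in range(N_ITER)]

# Each micro-rotation stretches the vector by sqrt(1 + 2^-2i); starting
# x at the inverse of the total gain removes the need for a final multiply.
K = round(ONE * math.prod(1.0 / math.sqrt(1.0 + 4.0 ** -i) for i in range(N_ITER)))

def cordic_sin_cos(angle):
    """Return (sin, cos) in Q2.14 for an angle in Q2.14 radians, |angle| <= pi/2."""
    x, y, z = K, 0, angle
    for i in range(N_ITER):
        if z >= 0:              # rotate counter-clockwise by atan(2^-i)
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:                   # rotate clockwise by atan(2^-i)
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    return y, x

s, c = cordic_sin_cos(round(math.pi / 6 * ONE))
print(s / ONE, c / ONE)         # close to sin and cos of 30 degrees
```

The result is good to within a few least-significant bits of the chosen format, which is the sense in which precision and iteration count trade off against each other.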

From Jeff's website:
==============================================================================================
Dr. Montvelishsky is currently doing work for Ultra Technology.  He has
submitted several routines for the MuP21 and F21 microprocessors under
development at Computer Cowboys by Charles Moore.  Dr. Montvelishsky's
CORDIC function executes 50 times faster on MuP21 than on a 387.  Here at
Ultra Technology Dr. Montvelishsky will be consulting on the design of
parallel Forth extensions for the F21 Parallel Forth engine, robotics,
scientific programming, and on using Forth to teach computer science.
==============================================================================================
>
>> Machine precision for an F21 would be 20 bits.  No "evasion".

> And no apples-to-apples comparison either. That's where the evasiveness comes in; 20-bit CORDIC on an F21 might well be 50 times the speed of a 387,
> but what about one that matches the 387's precision? What about the 487, which was actually contemporaneous? What about an integer-only
> implementation for the 486?

What about it?  I posted a link to an integer Forth implementation of CORDIC for gforth.  Feel free to implement it and run whatever timing
you like.  Feel free to implement CORDIC on a SEAForth simulator and check the timings since you're so interested in this.

> The only way the figure is meaningful is as a comparison of two different styles of programming - the style of carefully rethinking everything
> from the ground up and only implementing what you need in precisely the way you need it, compared with the style of starting with as rich
> an environment as possible and hoping whoever implemented it didn't screw up. As comparisons of technology, they're almost meaningless
> without extensive commentary - a point I made in my previous email.

Yes.  Wonderful extensive commentary.  Good for you.  All of this over two chips that at this point aren't in production.  You're
looking back; I'm looking forward.  Great.  Wonderful.  It takes all kinds.

> (It is possible that I'm actually in what has come to be termed "violent agreement" with Jeff, or even yourself, on this point. But the fact is that stating the
> 50:1 ratio as a comparison in itself, without making this clear, has been fuelling the likes of John Passaniti for the last decade; it's just too depressing to
> go to c.l.f and see Jeff and John still engaged in the same flamewar I found them fighting 15 years ago. It is transparently clear that the F21 and the 386,
> or the C18 and the Core 2, will not do anywhere close to the same things at all; it is also clear that if your application will fit inside a C18, a Core 2 will be
> a waste of money - but that if your application doesn't need the speed of a C18, a PIC may be a better bet; it may well be the case that if your application
> can use the C18's raw speed - but let's not forget that 1000 MIPS isn't anything special these days, and in any case that's a theoretical peak for a C18,
> not a much more useful sustained rate figure - a Core 2 would be quite inadequate for the task. Perhaps the only appropriate comparison for a C18, for
> those of us without fab access, would be 1/24 of whichever Xilinx or Altera FPGA is closest to the SEAforth-24A in unit cost.)

The only thing I've learned from John Passaniti's commentary over the years is to ignore John Passaniti's commentary.  He and others at comp.lang.forth
are often spoiling for a fight.  It doesn't matter what you're trying to say.  Case in point was when I pointed out that you can do "Hello World" in
ColorForth as easily as you can in any other Forth.  Yes it requires first extending ColorForth but then you have several reusable useful words.
That's what Forth is supposed to be about.  But some of the "gurus" have nothing better in life than to try to tear down anything that isn't
ANS Forth or some variant.  I've learned to forget about them and if you want peace of mind then my suggestion is to do the same.
No matter what interesting result you have someone will try to find some way to minimize it.  That's just life.


>> For the record a recent (2 years ago) 16 bit implementation
>> of CORDIC was posted on comp.lang.forth.
>>
>> http://www.complang.tuwien.ac.at/forth/programs/cordic.fs

> That's great, but a 16-word lookup table is a significant chunk of a 64-word memory. Moreover, the great advantage of CORDIC is that it doesn't need
> multiplication - but if you have a single-cycle multiply and a decent amount of memory, you can use a lookup table and linear interpolation and reduce the
> cycle count dramatically.

I never suggested that you use this ANS Forth implementation for the SEAForth and that's not the reason I posted it.  But this will give you the
"apples to apples" comparison that you want so badly (at least the Pentium apple).  All that's left is to implement your own SEAForth
variant or wait for the library release.  As for any optimizations you might want to do to the code, remember it's already been optimized
by one of the c.l.f. "gurus". *wink*  So if you're really intent on impressing the c.l.f. crowd......

>> So if anyone REALLY cares about benchmarks (I don't)

> That's nice for you, but people deciding which chip to select for their new application need some way of determining whether or not that application can
> even be executed by that chip. Only hobbyists can afford to start with the processor and build outwards; not even Chuck has done that - his processors
> are built to fit their intended applications.

I will apologize in advance for offending you, but the above is nonsense.  I never said I didn't care if an application can "even be executed".  In
fact if you were paying attention to the original post you'd note that I started this thread specifically to find out what steps could be taken
to help someone else who had asked about getting transcendentals for the SEAForth.  In other words I was trying to help someone
"determine whether or not that application can even be executed by that chip."  And Jeff has answered the question in the affirmative.
Now we've gotten sidetracked on whether or not you can compare an F21 to a 386/387 and frankly I don't find that interesting but
I'm glad you do.

>>> What was the cheapest that single F21 chips were ever available?
>>
>> You're missing the point.  Jeff isn't talking about OTS cost.

> Then Jeff isn't talking about anything useful to anyone without their own fabrication facilities. (Do you have your own fab?) If I were looking for a chip to
> use in my design, I'd be looking at unit costs; a fab cost of "a couple of cents" is pointless to discuss when the unit cost is $40 (or even $4).

Are you considering using an F21 or a 386/387 in your design?  If you're seriously considering 386/387 chips I would suggest going to
your local computer scrapyard (although today I think all you'd find are Pentiums of some kind or another.)  If you are considering using
an F21 then those chips are for the most part no longer available.  And no I don't own a fab.  I don't know if Jeff owns one but he certainly
didn't when the F21s were manufactured.  If you want to know how to manufacture chips without owning a fab it's called using OPF.
(Other people's fabs.)

I forget who manufactured the F21 but I recall looking up the costs.  The more chips you manufactured the cheaper per unit it was.
Now I realize I'm belaboring the obvious but you're dragging this out for some reason.  If you wanted genuine 386/387 chips for
some reason it would cost you more per unit to fab them than it would the F21 equivalent.  (I don't think you can buy 386s
directly from Intel.  There is a company that makes all sorts of old chips even to the old "Z80" but they don't list their prices.)
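
The volume effect is just a fixed cost being spread over more units. A tiny Python sketch, with entirely made-up figures (these are not MOSIS prices and the function name is hypothetical):

```python
def unit_cost(fixed_run_cost, marginal_cost_per_chip, quantity):
    """Per-chip cost when a one-time run cost is spread over `quantity` chips."""
    return fixed_run_cost / quantity + marginal_cost_per_chip

# Hypothetical: 50,000 to set up the run, 2.00 per chip after that.
for quantity in (1, 100, 10_000):
    print(quantity, round(unit_cost(50_000.0, 2.0, quantity), 2))
```

The per-chip figure approaches the marginal cost as the quantity grows, which is why the price of a single chip says little about the price at volume.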

Now if you are serious about comparing "apples to apples" you could ask how many FPGA gates it would take to make
an F21 equivalent versus a 386/387 equivalent.  That's not a question I care about but someone else might find it interesting.

Regards,

John M. Drake



     



Re: Intellasys question for Jeff Fox

by Gwenhwyfaer

John, I'm afraid that at this point I have no idea what you're talking
about or who you're arguing with - because as far as I can see you're
not actually responding to what I wrote at all. That being the case,
there is also nothing I could possibly say which would actually have
any effect on the image of me with which you are so intent on doing
battle - because that image has too little connection with reality.
Maybe that's because I have not expressed myself well; if that's the
case, I apologise. Or maybe it isn't; maybe the fault does not lie
entirely with me.

Whichever the case might be, I don't think there is any point in the
continuation of this correspondence. We're not communicating anyway;
why continue to try?



Re: Intellasys question for Jeff Fox

by nakedtruth

Gwenhwyfaer, Forth should be simple.  So let's simplify the
argument.

For argument's sake let's say you and I are on a design team
for a new widget.  (I know you'd probably rather drink a
sawdust shake, but humor me.)  Let's say that we have
advanced funding to make 10,000 of these widgets.
That's important because a fab run through MOSIS (which Jeff
used to fab the F21) has a lower per unit cost as the
number of chips being made goes up.  Let's say that
we've already decided that we really only need 16 bits
of precision.  20 bits is overkill.  32 bits is WAY overkill.
A 64 bit floating point mantissa is EXTREME overkill.
For whatever reason we need transcendental functions.
Maybe we want simple 3D graphics, or maybe we're
building a GPS widget with onscreen mapping.

Also for the sake of argument let's say it's the mid
1990's and we can still buy 386/387's cheaply off
the shelf say for $20.00.  (I don't recall what the
price of 386/387's fell to before they were finally
phased out.)  When we contact Ultratechnology
and/or iTV and price their chips we are told that
if we buy 10,000 chips we can get them at a
cost of $10.00 per chip.  We also want to make
sure that the chip we use can do the transcendental
functions AT LEAST as fast as a 386/387.  
Please answer the following 2 questions.

Question 1:  Why should I care what the price of
ONE F21 is when we want to purchase TEN
THOUSAND?  Buying ONE F21 doesn't take
advantage of the economy of scale.

Question 2:  Why should I care that the 386/387
is doing the calculations at a higher precision
than the F21 when we've already decided that
we don't need that much precision?

There you go.  Two very simple questions.
Forget John Passaniti or anybody else.

Regards,

John M. Drake

----- Original Message ----
From: "gwenhwyfaer@..." <gwenhwyfaer@...>
To: colorforth@...
Sent: Monday, May 26, 2008 10:23:45 PM
Subject: Re: [colorforth] Intellasys question for Jeff Fox

John, I'm afraid that at this point I have no idea what you're talking
about or who you're arguing with - because as far as I can see you're
not actually responding to what I wrote at all. That being the case,
there is also nothing I could possibly say which would actually have
any effect on the image of me with which you are so intent on doing
battle - because that image has too little connection with reality.
Maybe that's because I have not expressed myself well; if that's the
case, I apologise. Or maybe it isn't; maybe the fault does not lie
entirely with me.

Whichever the case might be, I don't think there is any point in the
continuation of this correspondence. We're not communicating anyway;
why continue to try?



Re: Intellasys question for Jeff Fox

by Nick Maroudas-2


Re this thread (24 May by gwenhwyfaer & reply 28 May by JF)

I would just like to support gwenhwyfaer's interest in the c18 parallel
processor for music synth, and look forward to getting my hands on the
music demo stick that Jeff mentions.  The offer of a free sample vs
purchase as an evaluation board seems to me less relevant than the
opportunity to increase the number of voices on my CF synth program.
At present I get about 40 voices on a 550 MHz Pentium 4.  I hope to
get at least 400 voices on a 3 GHz Dual with a faster DAC; cost of new
equipment (PC + DAC + PCI card) around $700, plus wiring and
installation time.  It would certainly have been an attractive
alternative to simply plug into my old 550 MHz box one of those "chips
with many c18 on them on flash drives with free development systems"
that Jeff mentions - even if it were not given away free but sold as
an eval board like the DAC & PCI above ($200 each).  Intellasys now
show a picture of this eval device on their website, so I hope it will
soon be available to the humble hobbyist.  Then I could see how it
compares with the best that I can get out of Intel using CF.

This is not a "Question for Jeff Fox" - just a posting of my own
potential interest in using parallel CF.
     
Caritas,

Nick

****************************

A better world is not only possible but on the way.  On a quiet day
you can hear her breathing.  - Arundhati Roy

****



Re: Intellasys question for Jeff Fox

by Nick Maroudas-2