Concatenative Hardware Redux (Another 16 Instruction Set)

View: New views
11 Messages — Rating Filter:   Alert me  

Concatenative Hardware Redux (Another 16 Instruction Set)

by Christopher Diggins :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi all,

Here is my most recent 16 instruction set for concatenative hardware.
Any comments would be greatly appreciated.

/*
 * Released into the public domain by Christopher Diggins
 * Contact me if you have special licensing requirements
 * http://www.cdiggins.com
 */

enum
{
        /* Push a literal value from the instruction stream */
        LIT = 0, /* -> a */

        /* Add top two values on the stack */
        ADD = 1, /* a b -> (a + b) */

        /* Shift second value on the stack left or right */
        SHIFT = 2, /* a b -> b < 32 ? (a >> b) : (a << (b - 32)) */

        /* Perform exclusive or of top two values */
        XOR = 3, /* a b -> (a ^ b) */

        /* Perfrom bitwise and of top two values */
        AND = 4, /* a b -> (a & b) */

        /* Dynamic function/list construction */
        /* Note: lists and functions are identical from a programmer's perspective,
                a list is the result of a function that consumes nothing. Think of it
                as a new stack yielded by executing the function, except that it is
                lazy (evaluated at last minute) */
        PAPPLY = 5,  /* a [X] -> [a X] */
        COMPOSE = 6,  /* [X] [Y] -> [X Y] */

        /* Branching and calling */
        APPLYIF = 7,  /* bool [X] -> X or NOOP */
       
        /* Reference creation and manipulation, used for playing with heap */
        MAKEREF = 8,  /* a -> &a */
        GETREF = 9,  /* &a -> a */
        SETREF = 10, /* &a b -> &(*a = b) */

        /* Stack manipulation operations */
        FLIP = 11, /* a X b n:int -> b X a ; sizeof(X) == n */
        GETN = 12, /* a X n:int -> (a X, a) ; sizeof(X) == n */
        SETN = 13, /* (a X n:int, b) -> b X ; sizeof(X) == n */

        /* Applies a function to a list */
        APPPLYTO = 14, /* [->X] [X->Y] => [Y] */

                /* Make a suggestion ( I was thinking of COUNT, but I
am not so sure ) */
                UNUSED = 15
}
opcodes;

/*
Some examples of derived opcodes:

        SWAP = 0 FLIPN
        DUP = 0 GETN
        POPD = 0 SETN
        POP = SWAP POPD
        QUOTE = [] CONS
        APPLY = TRUE SWAP APPLYIF
        CONS = SWAP QUOTE PAPPLY
        UNCONS = APPLY SWAP
        DIP = SWAP QUOTE COMPOSE APPLY
*/

Christopher Diggins
http://www.cat-language.com

Re: Concatenative Hardware Redux (Another 16 Instruction Set)

by LittleDan :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Don't you think basic arithmetical operators would be useful, if this
might eventually be actual and not theoretical hardware? If you only
have and, shift and xor, maybe it's *possible* to implement arithmetic
(I personally don't know how) but it wouldn't be as efficient. There's
no real point in restricting yourself to 16 operations, especially
when some of these operations will encode a literal and some of them
won't.

Dan

On Jan 10, 2008 2:12 PM, Christopher Diggins <cdiggins@...> wrote:

>
>
>
>
>
>
> Hi all,
>
>  Here is my most recent 16 instruction set for concatenative hardware.
>  Any comments would be greatly appreciated.
>
>  /*
>  * Released into the public domain by Christopher Diggins
>  * Contact me if you have special licensing requirements
>  * http://www.cdiggins.com
>  */
>
>  enum
>  {
>  /* Push a literal value from the instruction stream */
>  LIT = 0, /* -> a */
>
>  /* Add top two values on the stack */
>  ADD = 1, /* a b -> (a + b) */
>
>  /* Shift second value on the stack left or right */
>  SHIFT = 2, /* a b -> b < 32 ? (a >> b) : (a << (b - 32)) */
>
>  /* Perform exclusive or of top two values */
>  XOR = 3, /* a b -> (a ^ b) */
>
>  /* Perfrom bitwise and of top two values */
>  AND = 4, /* a b -> (a & b) */
>
>  /* Dynamic function/list construction */
>  /* Note: lists and functions are identical from a programmer's perspective,
>  a list is the result of a function that consumes nothing. Think of it
>  as a new stack yielded by executing the function, except that it is
>  lazy (evaluated at last minute) */
>  PAPPLY = 5, /* a [X] -> [a X] */
>  COMPOSE = 6, /* [X] [Y] -> [X Y] */
>
>  /* Branching and calling */
>  APPLYIF = 7, /* bool [X] -> X or NOOP */
>
>  /* Reference creation and manipulation, used for playing with heap */
>  MAKEREF = 8, /* a -> &a */
>  GETREF = 9, /* &a -> a */
>  SETREF = 10, /* &a b -> &(*a = b) */
>
>  /* Stack manipulation operations */
>  FLIP = 11, /* a X b n:int -> b X a ; sizeof(X) == n */
>  GETN = 12, /* a X n:int -> (a X, a) ; sizeof(X) == n */
>  SETN = 13, /* (a X n:int, b) -> b X ; sizeof(X) == n */
>
>  /* Applies a function to a list */
>  APPPLYTO = 14, /* [->X] [X->Y] => [Y] */
>
>  /* Make a suggestion ( I was thinking of COUNT, but I
>  am not so sure ) */
>  UNUSED = 15
>  }
>  opcodes;
>
>  /*
>  Some examples of derived opcodes:
>
>  SWAP = 0 FLIPN
>  DUP = 0 GETN
>  POPD = 0 SETN
>  POP = SWAP POPD
>  QUOTE = [] CONS
>  APPLY = TRUE SWAP APPLYIF
>  CONS = SWAP QUOTE PAPPLY
>  UNCONS = APPLY SWAP
>  DIP = SWAP QUOTE COMPOSE APPLY
>  */
>
>  Christopher Diggins
>  http://www.cat-language.com
>  

Re: Concatenative Hardware Redux (Another 16 Instruction Set)

by William Tanksley, Jr :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

You missed the ADD instruction.

Seriously, though, I do suspect that Chuck Moore has it right on this
one... Use instruction slots in a larger word size, and allow some of
the slots to encode alternate instructions. In the SeaForth, you have
an 18-bit word size with 3 5-bit instructions plus a trailing 3-bit
instruction... The trailing instruction is very useful for the uLOOP
instruction, which can only go there and which performs a 'decrement
and loop if not zero' without a destination (which means it'll repeat
the preceding words in the instruction).You might want a trailing
_and_ a preceding slot; the preceding slot could be for literals and
targetted jumps (so that the instruction and the literal fit in the
same instruction slot).

And so on.

-Wm

On Jan 11, 2008 12:45 AM, Daniel Ehrenberg <microdan@...> wrote:

> Don't you think basic arithmetical operators would be useful, if this
> might eventually be actual and not theoretical hardware? If you only
> have and, shift and xor, maybe it's *possible* to implement arithmetic
> (I personally don't know how) but it wouldn't be as efficient. There's
> no real point in restricting yourself to 16 operations, especially
> when some of these operations will encode a literal and some of them
> won't.
>
> Dan
>
>
> On Jan 10, 2008 2:12 PM, Christopher Diggins <cdiggins@...> wrote:
> >
> >
> >
> >
> >
> >
> > Hi all,
> >
> >  Here is my most recent 16 instruction set for concatenative hardware.
> >  Any comments would be greatly appreciated.
> >
> >  /*
> >  * Released into the public domain by Christopher Diggins
> >  * Contact me if you have special licensing requirements
> >  * http://www.cdiggins.com
> >  */
> >
> >  enum
> >  {
> >  /* Push a literal value from the instruction stream */
> >  LIT = 0, /* -> a */
> >
> >  /* Add top two values on the stack */
> >  ADD = 1, /* a b -> (a + b) */
> >
> >  /* Shift second value on the stack left or right */
> >  SHIFT = 2, /* a b -> b < 32 ? (a >> b) : (a << (b - 32)) */
> >
> >  /* Perform exclusive or of top two values */
> >  XOR = 3, /* a b -> (a ^ b) */
> >
> >  /* Perfrom bitwise and of top two values */
> >  AND = 4, /* a b -> (a & b) */
> >
> >  /* Dynamic function/list construction */
> >  /* Note: lists and functions are identical from a programmer's perspective,
> >  a list is the result of a function that consumes nothing. Think of it
> >  as a new stack yielded by executing the function, except that it is
> >  lazy (evaluated at last minute) */
> >  PAPPLY = 5, /* a [X] -> [a X] */
> >  COMPOSE = 6, /* [X] [Y] -> [X Y] */
> >
> >  /* Branching and calling */
> >  APPLYIF = 7, /* bool [X] -> X or NOOP */
> >
> >  /* Reference creation and manipulation, used for playing with heap */
> >  MAKEREF = 8, /* a -> &a */
> >  GETREF = 9, /* &a -> a */
> >  SETREF = 10, /* &a b -> &(*a = b) */
> >
> >  /* Stack manipulation operations */
> >  FLIP = 11, /* a X b n:int -> b X a ; sizeof(X) == n */
> >  GETN = 12, /* a X n:int -> (a X, a) ; sizeof(X) == n */
> >  SETN = 13, /* (a X n:int, b) -> b X ; sizeof(X) == n */
> >
> >  /* Applies a function to a list */
> >  APPPLYTO = 14, /* [->X] [X->Y] => [Y] */
> >
> >  /* Make a suggestion ( I was thinking of COUNT, but I
> >  am not so sure ) */
> >  UNUSED = 15
> >  }
> >  opcodes;
> >
> >  /*
> >  Some examples of derived opcodes:
> >
> >  SWAP = 0 FLIPN
> >  DUP = 0 GETN
> >  POPD = 0 SETN
> >  POP = SWAP POPD
> >  QUOTE = [] CONS
> >  APPLY = TRUE SWAP APPLYIF
> >  CONS = SWAP QUOTE PAPPLY
> >  UNCONS = APPLY SWAP
> >  DIP = SWAP QUOTE COMPOSE APPLY
> >  */
> >
> >  Christopher Diggins
> >  http://www.cat-language.com
> >
>
>
>
> Yahoo! Groups Links
>
>
>
>

Re: Concatenative Hardware Redux (Another 16 Instruction Set)

by Christopher Diggins :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Jan 11, 2008 9:57 AM, William Tanksley, Jr <wtanksleyjr@...> wrote:
>
> Seriously, though, I do suspect that Chuck Moore has it right on this
> one...

Why?

> Use instruction slots in a larger word size, and allow some of
> the slots to encode alternate instructions. In the SeaForth, you have
> an 18-bit word size with 3 5-bit instructions plus a trailing 3-bit
> instruction... The trailing instruction is very useful for the uLOOP
> instruction, which can only go there and which performs a 'decrement
> and loop if not zero' without a destination (which means it'll repeat
> the preceding words in the instruction).

Sure but that is just one instruction. Overall, having 16
instructions, means a much smaller chip with very very small power
overhead. I'm not an expert in the domain of course, but I would like
to know more about the trade-offs.

> You might want a trailing
> _and_ a preceding slot; the preceding slot could be for literals and
> targetted jumps (so that the instruction and the literal fit in the
> same instruction slot).

I don't understand what this means, and why.

> And so on.

Sorry that I don't fully understand.

- Christopher

Re: Concatenative Hardware Redux (Another 16 Instruction Set)

by Christopher Diggins :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Jan 11, 2008 12:45 AM, Daniel Ehrenberg <microdan@...> wrote:
>
> Don't you think basic arithmetical operators would be useful, if this
> might eventually be actual and not theoretical hardware?  If you only
> have and, shift and xor, maybe it's *possible* to implement arithmetic
> (I personally don't know how) but it wouldn't be as efficient.

You overlooked add. With that we can perform all arithmetical
operations without too much problems. Of course, this is not
appropriate for a PC CPU, but rather a very small embedded device.

> There's
> no real point in restricting yourself to 16 operations, especially
> when some of these operations will encode a literal and some of them
> won't.

I am looking at here the possibilities of implementing minimal
hardware that is cheap, small, and has very little energy consumption.

However, to be completely honest, the more I experiment with the
instruction set, the more I think that I would prefer to work with a
255-instruction set (ala JVML).

- Christopher

Re: Concatenative Hardware Redux (Another 16 Instruction Set)

by William Tanksley, Jr :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Christopher Diggins <cdiggins@...> wrote:
> William Tanksley, Jr <wtanksleyjr@...> wrote:
> > Seriously, though, I do suspect that Chuck Moore has it right on this
> > one...

> Why?

> > Use instruction slots in a larger word size, and allow some of
> > the slots to encode alternate instructions.

This is what Chuck does. The reason I think he's got it right is that
16 instructions just isn't enough for a reasonable programming
platform, especially if you include specialized instructions (such as
your list manipulators). 32 instructions is PLENTY -- you'll wind up
with spares. And the specialized slots allow you to include
instructions whose only purpose is making decoding easy.

Right now you've got 4-bit instructions; I assume a 32-bit word size.
That means you load 8 instructions with every fetch. That's a _lot_ of
instructions that can run without any need for memory access... If you
change to 5-bit instructions you can fit 6 instructions, plus 2 bits
for something else. 6 instructions is still an awful lot, and now you
can include more useful instructions which will make your operations
much more powerful. Use those two leftover bits to allow 4 different
ways of interpreting the loaded instruction word: 00=literal (just
push it on the stack and load the next); 01=literal jump;
10=microloop; 11=literal call. (Or whatever encoding and instructions
you want.)

Microloop is amazingly powerful, and changes the way your instruction
set works. With it, you should consider adding instructions that use
iteration... For example, Chuck didn't have room on his chip for a
multiplier, but he did include a 'multiply step' instruction, which
with the uLOOP capability makes multiplication very simple. I don't
know what your chip might need... Perhaps a MergeSortStep instruction?

> > In the SeaForth, you have
> > an 18-bit word size with 3 5-bit instructions plus a trailing 3-bit
> > instruction... The trailing instruction is very useful for the uLOOP
> > instruction, which can only go there and which performs a 'decrement
> > and loop if not zero' without a destination (which means it'll repeat
> > the preceding words in the instruction).

> Sure but that is just one instruction. Overall, having 16
> instructions, means a much smaller chip with very very small power
> overhead. I'm not an expert in the domain of course, but I would like
> to know more about the trade-offs.

4 bits versus 5 bits is just what it looks like -- a 20% space
savings; that adds up FAST. The problem is that if the 16 instructions
you choose exclude commonly used ones, your programmers will not only
be annoyed, their programs will wind up taking MORE space, since they
have to use 2 or 3 opcodes to implement what could have been done with
1 5-bit opcode.

You're in an experimental design space, attempting to produce a
high-level microarchitecture (i.e. with instructions for a high-level
task, list manipulation). I think you'd be better served by
underconstraining your problem space: loosen up the constraint on the
"smallest possible instruction set" in order to give yourself elbow
room for "highest level instruction set". Don't give up on "smallest
possible"; just produce a working draft that is a bit big, and then
see if you can make it smaller later.

> - Christopher

\-Wm

Re: Concatenative Hardware Redux (Another 16 Instruction Set)

by William Tanksley, Jr :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Christopher Diggins <cdiggins@...> wrote:
> Daniel Ehrenberg <microdan@...> wrote:
> > There's
> > no real point in restricting yourself to 16 operations, especially
> > when some of these operations will encode a literal and some of them
> > won't.

> I am looking at here the possibilities of implementing minimal
> hardware that is cheap, small, and has very little energy consumption.

A fascinating goal, and one which to my knowledge has not been
attempted together with a high-level instruction set.

> However, to be completely honest, the more I experiment with the
> instruction set, the more I think that I would prefer to work with a
> 255-instruction set (ala JVML).

Completely incompatible with the goal of minimal hardware -- and do
you actually have anything close to 256 opcodes? Can you imagine
building all the hardware to support each one and muxing/demuxing them
together? Wouldn't retreating to 5-bit instructions make more sense
initially?

> - Christopher

-Wm

Re: Concatenative Hardware Redux (Another 16 Instruction Set)

by Christopher Diggins :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Jan 11, 2008 11:35 AM, William Tanksley, Jr <wtanksleyjr@...> wrote:
> A fascinating goal, and one which to my knowledge has not been
> attempted together with a high-level instruction set.

I think the distinction between "high-level" and "low-level" is
artificial, and rooted to firmly on the naivete of our predecessors.
If we ignore it, we can come up with something new and interesting.

> > However, to be completely honest, the more I experiment with the
> > instruction set, the more I think that I would prefer to work with a
> > 255-instruction set (ala JVML).
>
> Completely incompatible with the goal of minimal hardware

Not at all, it depends on what you do with these instructions. I
should clarify that I really mean "1-byte" operators. The core
instructions would be few, and many of the bits would be flags.
Consider:

dup
pop
swap
...

and then with a bit flag to represent "dip"

dupd == [dup] dip
popd == [dup] dip
swap == [swap] dip
...

and another bit flag to represent "dup"

ddup = dup dup
dpop = dup pop
dswap = dup swap
...

and another bit flag to represent "swap"

swdup = swap dup
swpop = swap pop
...

Hopefully this illustrates what I have in mind.

> -- and do
> you actually have anything close to 256 opcodes?

It's quite easy to come up with a whole bunch of opcodes.

> Can you imagine
> building all the hardware to support each one and muxing/demuxing them
> together?

Well yes. I don't think it neccessarily has to be complex.

> Wouldn't retreating to 5-bit instructions make more sense
> initially?

The simplicity of the design has less to do with the number of
instructions, than it does with the organization and kind of
instructions. 4-bit instructions is an interesting challenge, because
it is so darn small. 5-bit instructions has already been done and I
believe would be hard to port to other architectures efficiently.
8-bit instructions is easy to port and emulate. I consider emulation
very important, because a practical virtual machine would be run on
many different kinds of chips and software.

- Christopher

Re: Concatenative Hardware Redux (Another 16 Instruction Set)

by Christopher Diggins :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

> The problem is that if the 16 instructions
> you choose exclude commonly used ones, your programmers will not only
> be annoyed,

I'm not really designing the thing to be implemented by hand, I would
expect the programmer to use a compiler.

> their programs will wind up taking MORE space, since they
> have to use 2 or 3 opcodes to implement what could have been done with
> 1 5-bit opcode.

That line of reasoning isn't accurate. Just because you have fewer
opcodes doesn't mean you will neccessarily take more space. It really
has to do with how well balanced the instruction set is for the
problem domain. One isn't a clear win over the other. Of course, what
I propose is probably naive enough (because I haven't performed any
practical experimentation) so as not to be very space efficient at
this point.

- Christopher

Parent Message unknown Re: Concatenative Hardware Redux (Another 16 Instruction Set)

by Adam Marquis-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi there,

First, I'm glad to read that kind of post, very suited for FP newbies like me.

    I wonder if the Concatenative group has knowledge of the Forth
effort on the instruction set design front, as published on
ultratechnology.com and discussed in comp.lang.forth . Most notably
the AHA / F21 instruction set and the c18 one. The C18 spec was
adapted for the new SEAForth processors from Intellasys, Mr Moore's
actual(?) employer.

    These imperative-programming guidelines (minimum number of SWAPs
is zero, etc) maybe are not as well suited to one-stack functionnal
programming, I guess you'll be the judges, as I'm barely starting to
grasps basic FP concepts: still wrapping my head around Lisp,
really....

    Nevertheless, I think these FORTH opcodes sets embody a useful
wisdom, if one can see through the veil, to weight the pros and cons
of one's own Most Wonderful Instruction Set, especially if you never
came across them: they're even simpler IMHO (defined in hardware more
clearly? sorry for the poor poo phrasing) than what Christopher
proposed previously, albeit ill-suited for the FP task at hand but
their special instructions sure could be eye-openers.

http://www.ultratechnology.com/f21cpu.html#cpu
http://newsgroups.derkeiler.com/Archive/Comp/comp.lang.forth/2006-06/msg00526.html

Re: Concatenative Hardware Redux (Another 16 Instruction Set)

by William Tanksley, Jr :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Christopher Diggins <cdiggins@...> wrote:
> William Tanksley, Jr <wtanksleyjr@...> wrote:
> > A fascinating goal, and one which to my knowledge has not been
> > attempted together with a high-level instruction set.

> I think the distinction between "high-level" and "low-level" is
> artificial, and rooted to firmly on the naivete of our predecessors.
> If we ignore it, we can come up with something new and interesting.

Well then, "special purpose" versus "general purpose". Either way, IMO
interesting.

> > > However, to be completely honest, the more I experiment with the
> > > instruction set, the more I think that I would prefer to work with a
> > > 255-instruction set (ala JVML).

> > Completely incompatible with the goal of minimal hardware

> Not at all, it depends on what you do with these instructions. I
> should clarify that I really mean "1-byte" operators. The core
> instructions would be few, and many of the bits would be flags.

Ah, this is how I handled the 8-bit instructions I was required to use
during my processor design class (UCSD). I encoded only a few fields
in the instruction word, and the rest I ran as raw input directly to
the needed components of the processor. Saved a few transistors, at
the cost of vastly larger code size (but I was competing against
students designing accumulator machines, so they had the same code
size problems).

> Consider:
> and then with a bit flag to represent "dip"
> and another bit flag to represent "dup"
> and another bit flag to represent "swap"
> Hopefully this illustrates what I have in mind.

I think it does. I'm sure you haven't settled on any meanings for bits
yet, and those are just the first examples that pop to mind. (Note
that deep stack effects are a _very_ bad idea, so being able to easily
add dups will make your instructions VERY slow. In fact, these are
_very_ bad choices, because it's not clear how those appendage
instructions will be ordered... If I turn on both the dip AND swap
bit, which one happens first?) But I'm sure you'll put the effort
needed into designing this when and if you decide to.

> > -- and do
> > you actually have anything close to 256 opcodes?
> It's quite easy to come up with a whole bunch of opcodes.

No doubt. Useful and coherent ones? That's another matter.

> > Wouldn't retreating to 5-bit instructions make more sense
> > initially?

> The simplicity of the design has less to do with the number of
> instructions, than it does with the organization and kind of
> instructions. 4-bit instructions is an interesting challenge, because
> it is so darn small. 5-bit instructions has already been done and I

Again, I don't recommend taking on 2 "interesting challenges" in the
same problem. Small instruction set is cool; high-level instructions
is cool. Both together? Too many constraints. Solve one, THEN solve
the other.

> believe would be hard to port to other architectures efficiently.

I don't understand the "hard to port" objection at all. Clarify? How
can a microarchitecture need to be ported to another architecture?

> 8-bit instructions is easy to port and emulate. I consider emulation
> very important, because a practical virtual machine would be run on
> many different kinds of chips and software.

Well, that's certainly an important design constraint. Software can
handle muxing a LOT better than hardware can (of course, software has
a MUCH harder time of handling predecoded instructions). 8-bit VM
instructions make a lot more sense than 8-bit hardware instructions.

> - Christopher

-Wm
LightInTheBox - Buy quality products at wholesale price