fundamental type system problems + possible solutions

View: New views
8 Messages — Rating Filter:   Alert me  

fundamental type system problems + possible solutions

by John Nowak :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

It seems there's a fundamental problem with typing concatenative  
languages: Most type systems are not compositional which destroys the  
simple algebra of concatenative programs. Take these equivalencies for  
example:

    [$A] i       ==  $A       -- application
    [$A] [$B] o  ==  [$A $B]  -- composition
    $a [$B] pa   ==  [$a $B]  -- partial application
    [$A] m       ==  [$A] $A  -- self-application

Unfortunately, all of these rules (except perhaps for partial  
application) fall apart once types are involved. This is of practical  
concern as it makes cut-and-paste refactoring impossible and will  
cause problems with many macro transformations.

What is needed is a type system that is compositional. One system that  
meets this requirement is System E developed by Carlier and Wells [1].  
While this seems like an appealing solution at first, the types  
produced by System E are so complex that they're not really human-
decipherable. Even if a solution to making error reports  
understandable could be found via some form of type error slicing [2],  
there would still be no way to make types useful as documentation.

As I see it, there are three main benefits of static type systems:  
Automatic and correct documentation, increased safety, and increased  
efficiency. Of these three, I consider documentation to be the most  
important. After all, there are language implementations without  
static type systems that perform quite well such as SBCL and LuaJIT.  
As for safety, dynamic languages are used to build highly-reliable  
systems (see Erlang); ultimately, the design of a program and ability  
of the implementors contributes much more to software reliability than  
the presence of a type system (compare Haskell/Darcs to C/Perl/Git).

The need for documentation, however, you can't escape. This is  
especially so in stack-based languages where stack effect comments are  
necessary to make sense out of things. While no type system will  
replace the need for such comments entirely, as it is often necessary  
to state not just the type of a value but its purpose, a type system  
can at least ensure the correct number of arguments will always be  
passed and that type constraints will be respected.

I would rather have no type system than one which makes simple  
transformations, and hence macros and much refactoring, impossible.  
Static type systems seem doomed to being either non-compositional or  
not suited to producing human-readable documentation.  Therefore, as I  
see it, there are two possibilities going forward.

One is to develop a soft type system that hopefully will be able to  
report most errors without rejecting any program. If a language is  
designed with a soft type system in mind, the quality of error  
reporting can be quite good. For example, dynamic checks can be  
inserted to ensure lists are used homogeneously and type declarations  
can be required for data types to avoid the need for equi-recursive  
types.

The other approach would be some way of inferring the equivalences  
mentioned at the beginning of this email. I guess this would take the  
form of some kind of abstract interpretation. The goal here would not  
be to catch all type errors (although some would be catchable), but  
rather to provide a detailed description of what a function does  
beyond what an HM-based type system would be capable of yielding. It  
seems the primary issue here would be giving equivalencies for  
functions involving recursive data types. For example, if we define  
some 'each' function that simply places an element from a list on the  
stack and then calls some function repeatedly until the list is empty,  
how would we express this?

Anyway, these are just my current thoughts. As for Fifth, I've decided  
to delay inclusion of the type system until I can find a solution that  
won't cause problems with macros or inhibit cut-and-paste refactoring.  
(I'll credit Slava Pestov for reminding me of how critical this is.)  
Any thoughts would be appreciated, especially regarding the second  
approach described above.

- John

[1] http://www.church-project.org/modular/compositional/system-E/
[2] http://www.macs.hw.ac.uk/ultra/compositional-analysis/type-error-slicing/slicing.cgi


Re: fundamental type system problems + possible solutions

by William Tanksley, Jr :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

John Nowak <john@...> wrote:
> It seems there's a fundamental problem with typing concatenative
>  languages: Most type systems are not compositional which destroys the
>  simple algebra of concatenative programs.

The literature includes some uses of the word "compositional" for type
systems, but is there a specific meaning you have in mind here? I
think you're talking about a type system which completely preserves
the associative property of concatenative languages (the associative
property guarantees both arbitrary factoring and arbitrary order of
evaluation).

That's a goal I share... Although it's also a goal I want to be able
to work around. There are times when a type system should be able to
force a specific order of evaluation, primarily when the system
deduces the presence of side effects.

Let me add one more thing before I go on: this problem is specific to
type inference, not to type checking in general. If you always
annotate inputs, outputs, and executions, there's no problem.

I need to read up on this; thank you for the references.

>  As I see it, there are three main benefits of static type systems:
>  Automatic and correct documentation, increased safety, and increased
>  efficiency.

I suppose the availability of statically known information to the
programmer counts as "documentation", but sometimes it's more than
that. Consider Ada's type attributes, as in TYPENAME'MAX or
TYPENAME'SIZE. In theory, we could do much better.

I would consider a static type system to be a huge "win" if it helped
programmers. I'm not against static type systems for their own sake,
but I'm strongly for them when they help.

The effect system you proposed would have some interesting ways of helping.

Compare Spark Ada -- which, by the way, might be a useful role model
for a very powerful type system that doesn't actually hurt code that
it can't check (it's entirely optional, contained entirely within
comments of perfectly standard Ada code).
http://en.wikipedia.org/wiki/SPARK_programming_language is perhaps a
good place to start. (I know there are more formal systems out there,
but this one's especially interesting because it's purely optional.)

>  Of these three, I consider documentation to be the most
>  important. After all, there are language implementations without
>  static type systems that perform quite well such as SBCL and LuaJIT.

A good point; JITs tend to have a lot of useful information available
to them. On the other hand, it's easy to design a language that is
very hard to JIT.

>  ultimately, the design of a program and ability
>  of the implementors contributes much more to software reliability than
>  the presence of a type system (compare Haskell/Darcs to C/Perl/Git).

Ability, dedication, and above all else _testing_. Darcs isn't buggy
because its maintainer is unskilled; it's buggy (to the extent that it
IS) because he can't consider all the possible cases, and there aren't
enough people hammering on it (both in coders and in users) to find
them reliably. (I know Darcs is getting a LOT better, by the way; it
was my favorite, although I use Mercurial now.)

Support for testing is important; it would be nice to have support for
it baked into the compiler, and to have support not only for specific
test cases, but also for universal claims (as in Design By Contract,
Haskell's QuickCheck, and formal static annotation-augmented analysis
like SPARK).

>  I would rather have no type system than one which makes simple
>  transformations, and hence macros and much refactoring, impossible.
>  Static type systems seem doomed to being either non-compositional or
>  not suited to producing human-readable documentation.  Therefore, as I
>  see it, there are two possibilities going forward.

Requiring limited annotation could be useful, and has some nice
effects in terms of safety as well -- it's easy to write a function
for which the compiler infers a correct type that is not similar to
the type you'd desired (thus deferring the error message until later,
when it's probably less useful).

>  The other approach would be some way of inferring the equivalences
>  mentioned at the beginning of this email. I guess this would take the
>  form of some kind of abstract interpretation. The goal here would not
>  be to catch all type errors (although some would be catchable), but
>  rather to provide a detailed description of what a function does
>  beyond what an HM-based type system would be capable of yielding. It
>  seems the primary issue here would be giving equivalencies for
>  functions involving recursive data types. For example, if we define
>  some 'each' function that simply places an element from a list on the
>  stack and then calls some function repeatedly until the list is empty,
>  how would we express this?

The uncomputable sky is the limit -- we need a type theory in order to
decide anything here.

>  - John

-Wm

Re: fundamental type system problems + possible solutions

by John Nowak :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


On Apr 29, 2008, at 1:16 PM, William Tanksley, Jr wrote:

> John Nowak <john@...> wrote:
>> It seems there's a fundamental problem with typing concatenative
>> languages: Most type systems are not compositional which destroys the
>> simple algebra of concatenative programs.
>
> The literature includes some uses of the word "compositional" for type
> systems, but is there a specific meaning you have in mind here? I
> think you're talking about a type system which completely preserves
> the associative property of concatenative languages (the associative
> property guarantees both arbitrary factoring and arbitrary order of
> evaluation).

Correct. The expression '[a b] [c] compose' should always give the  
same type as '[a] [b c] compose'. Additionally, '[foo] i' should give  
the same type as 'foo'.

> That's a goal I share... Although it's also a goal I want to be able
> to work around. There are times when a type system should be able to
> force a specific order of evaluation, primarily when the system
> deduces the presence of side effects.

Fifth is a strict language in which evaluation is always left to  
right, so I don't think this would be an issue.

> Let me add one more thing before I go on: this problem is specific to
> type inference, not to type checking in general. If you always
> annotate inputs, outputs, and executions, there's no problem.

This is true. The problem comes in when dealing with macros. For  
example, a macro that translates lambda expressions to stack-based  
code needs to splice values into (possibly nested) quotations.  
Unfortunately, this translation might result in very confusing type  
errors, and there would be no way to give types to the generated  
quotations since the code is the result of a macro.

Ignoring macros for a moment, I have no issue with requiring  
annotations. I'm particularly partial to Daan Leijen's HML system (http://research.microsoft.com/users/daan/pubs.html 
). It gives more precise types than System F and offers full inference  
for impredicative instantiation. Type annotations are only required  
when a function parameter has a polymorphic type.

Maybe the solution is to just forget about macros, build in lambda  
expressions (which will give better error reporting), build in pattern  
matching, and just get on with it. Bah.

> Compare Spark Ada -- which, by the way, might be a useful role model
> for a very powerful type system that doesn't actually hurt code that
> it can't check (it's entirely optional, contained entirely within
> comments of perfectly standard Ada code).

Aye, Spark is great. It's unfortunate that it isn't free.

> Ability, dedication, and above all else _testing_. Darcs isn't buggy
> because its maintainer is unskilled; it's buggy (to the extent that it
> IS) because he can't consider all the possible cases, and there aren't
> enough people hammering on it (both in coders and in users) to find
> them reliably. (I know Darcs is getting a LOT better, by the way; it
> was my favorite, although I use Mercurial now.)

I love Darcs myself, and I continue to use it despite the issues. To  
be clear, in no way meant to say the people behind Darcs are  
unskilled. Rather, my point is that there are problems in the big-
picture design of the program itself. Despite Haskell's rather fancy  
types, and the team's use of GADTs to enforce correctness, I still  
wouldn't trust it for anything involving more than a few people.  
Luckily, it does seem to be getting better. Darcs 2 is quite zippy,  
relatively speaking.

To be honest, I'm just experiencing some cognitive dissonance. My  
brain is trying frantically to rationalize avoiding the type system  
for awhile.

> Support for testing is important; it would be nice to have support for
> it baked into the compiler, and to have support not only for specific
> test cases, but also for universal claims (as in Design By Contract,
> Haskell's QuickCheck, and formal static annotation-augmented analysis
> like SPARK).

Fifth will offer contracts. This was a bit of a trick to get to  
working efficiently given that I wanted to be able to ensure removing  
contracts would have no effect on the program. In particular, it  
proved necessary for the programmer to state how many items on the  
stack a contract will need to access so that a portion of the stack  
can be (shallow) copied and restored before each assertion. The type  
system would ensure no more than the number of elements specified are  
accessible to the assertion. This would all be much easier in Joy  
where the stack is persistent, but this is out of the question for  
efficiency reasons.

Thanks for the input.

- John

Re: fundamental type system problems + possible solutions

by William Tanksley, Jr :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

John Nowak <john@...> wrote:
>  William Tanksley, Jr wrote:
>  > John Nowak <john@...> wrote:
>  > That's a goal I share... Although it's also a goal I want to be able
>  > to work around. There are times when a type system should be able to
>  > force a specific order of evaluation, primarily when the system
>  > deduces the presence of side effects.

>  Fifth is a strict language in which evaluation is always left to
>  right, so I don't think this would be an issue.

Are you saying that it's impossible to write an optimizing code
generator for Fifth? :-)

Seriously, though, I don't see why you should worry about 'effects
inference' if you can't reorder anything. The only reason I know of to
disallow effects is if evaluation order is flexible.

>  > Let me add one more thing before I go on: this problem is specific to
>  > type inference, not to type checking in general. If you always
>  > annotate inputs, outputs, and executions, there's no problem.

>  This is true. The problem comes in when dealing with macros. For
>  example, a macro that translates lambda expressions to stack-based
>  code needs to splice values into (possibly nested) quotations.
>  Unfortunately, this translation might result in very confusing type
>  errors, and there would be no way to give types to the generated
>  quotations since the code is the result of a macro.

Perhaps some more thought could be put into integrating the macro
system and the type system. It seems to me that there are a few ways
it could be done.

Every parameter of the macro has a type; if inferring the type is too
hard (and it probably is) you can require type signatures as part of
the macro definition. The result of the macro is always a stretch of
code, which also always has a type, although I expect it to not be
possible to manually sign that type in general (although there's a
proper subset of macros for which the type is manually signable).

Obviously, a typechecked macro system would exclude some obvious
macros... But can we get things done with a subset of all possible
macros? I think so...

>  > Compare Spark Ada -- which, by the way, might be a useful role model
>  > for a very powerful type system that doesn't actually hurt code that
>  > it can't check (it's entirely optional, contained entirely within
>  > comments of perfectly standard Ada code).
>  Aye, Spark is great. It's unfortunate that it isn't free.

Yes, but there's plenty of research on similar concepts.

>  unskilled. Rather, my point is that there are problems in the big-
>  picture design of the program itself. Despite Haskell's rather fancy
>  types, and the team's use of GADTs to enforce correctness, I still
>  wouldn't trust it for anything involving more than a few people.
>  Luckily, it does seem to be getting better. Darcs 2 is quite zippy,
>  relatively speaking.

Yup. The crucial lesson is that typechecking can't specify
correctness, ever. Correctness is uncomputable, and neither types
alone nor tests alone nor contracts alone are sufficient to specify
it.

>  To be honest, I'm just experiencing some cognitive dissonance. My
>  brain is trying frantically to rationalize avoiding the type system
>  for awhile.

Well, if you CAN, go for it. I enjoy having the freedom to put off a
painful part of a project sometimes.

>  - John

-Wm

Re: fundamental type system problems + possible solutions

by John Nowak :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


On Apr 30, 2008, at 10:52 AM, William Tanksley, Jr wrote:

>> Fifth is a strict language in which evaluation is always left to
>> right, so I don't think this would be an issue.
>
> Are you saying that it's impossible to write an optimizing code
> generator for Fifth? :-)
> Seriously, though, I don't see why you should worry about 'effects
> inference' if you can't reorder anything. The only reason I know of to
> disallow effects is if evaluation order is flexible.

Ah, I misunderstood. I thought you were thinking that Fifth was lazy  
and somehow needed the effect system to avoid stomping all over itself  
when IO was involved. No, of course the effect system is useful for  
optimization purposes. It's also useful for things like ensuring the  
function passed to a parallel map function has no effects.

> Perhaps some more thought could be put into integrating the macro
> system and the type system. It seems to me that there are a few ways
> it could be done.
>
> Every parameter of the macro has a type; if inferring the type is too
> hard (and it probably is) you can require type signatures as part of
> the macro definition.

The problem is not typing the arguments to the macro; it's dealing  
with the situation where the expansion of the macro is not inferable  
and there's no way to provide annotations to quotations within the  
expansion because the expansion is obviously not in the source code.  
There are also issues with macros becoming much more tedious due to  
the need to shuffle data in and out of some AST sum type in the macro  
procedure itself. Ideally, you'd want implicit coersion to some 'any'  
type as necessary to avoid the noise.

> Obviously, a typechecked macro system would exclude some obvious
> macros... But can we get things done with a subset of all possible
> macros? I think so...

Maybe. It's not just that though; if the facility becomes too much of  
a pain to use for useful things, it might as well not be there at all.

Concatenative languages have a huge benefit in terms of macros because  
there are no hygiene issues to worry about. Ignoring types, macros are  
beyond trivial. Ideally, it would be nice to keep this property.

A soft type system makes this easy. Not being able to prove that the  
macro evaluation itself will not have a type error is of little  
concern to me provided that the expansion can be checked in the vast  
majority of cases. A dynamic failure during a compile time expansion  
isn't a big deal.

Right now, I'm leaning towards a soft intersection type system that  
will be robust with respect to transformations and infer most any  
reasonable program. Even if the type system never gets in your way,  
there are benefits to soft types. For example, being able to run a  
program to discover what's wrong with it when the type errors are  
undecipherable is a very nice feature, especially for programmers new  
to the language. I'll probably write up something about my approach  
soon once I've filled in the details.

- John

Re: fundamental type system problems + possible solutions

by William Tanksley, Jr :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

John Nowak <john@...> wrote:
>  William Tanksley, Jr wrote:
>  > Perhaps some more thought could be put into integrating the macro
>  > system and the type system. It seems to me that there are a few ways
>  > it could be done.

>  > Every parameter of the macro has a type; if inferring the type is too
>  > hard (and it probably is) you can require type signatures as part of
>  > the macro definition.

>  The problem is not typing the arguments to the macro; it's dealing
>  with the situation where the expansion of the macro is not inferable
>  and there's no way to provide annotations to quotations within the
>  expansion because the expansion is obviously not in the source code.

I can read this okay up until the phrase "obviously not in the source
code." Then I get completely lost. I'm going to reduce the argument to
a logical form as I understand it...

"It's impossible to X when condition Y holds."
   X = "provide annotations to quotations within the expansion"
   Y = "the expansion is obviously not in the source code"

I don't see how this could be _possibly_ true. Why should it ever be
impossible to annotate a quotation generated by a macro? Why is
"obviousness" part of the criteria for impossibility? How can code
generated by a macro not be part of the source code (unless you mean
not part of the original source code pre-generation, but again, I
don't see how it makes a difference whether the code was typed out by
hand or generated by a macro).

>  There are also issues with macros becoming much more tedious due to
>  the need to shuffle data in and out of some AST sum type in the macro
>  procedure itself.

I think I see your point here... I don't know what AST means (abstract
syntax tree can't be right, can it?), but I can imagine a macro being
able to dynamically generate code AND also dynamically generate the
code's type signature. Yes, that would be both powerful and very
annoying.

>  > Obviously, a typechecked macro system would exclude some obvious
>  > macros... But can we get things done with a subset of all possible
>  > macros? I think so...

>  Maybe. It's not just that though; if the facility becomes too much of
>  a pain to use for useful things, it might as well not be there at all.

That's what I mean -- break "macros" down into different classes and
subclasses, and only implement the classes that can currently be
reasonably accomplished, possibly using distinct syntax (or better,
distinct defining words) for each type.

The big problem with doing things this way is that someday you'll find
a sensible way to accomplish something new, and you'll have to add on
a new subclass of macros to express that. Your language will want to
change, but legacy will hold it back.

>  Concatenative languages have a huge benefit in terms of macros because
>  there are no hygiene issues to worry about. Ignoring types, macros are
>  beyond trivial. Ideally, it would be nice to keep this property.

Agreed. This is a hard problem with types in general.

>  A soft type system makes this easy.

Check out "hybrid typing",
http://www.cs.ucsc.edu/~cormac/papers/popl06-hybrid.pdf; I don't know
if it's any good, but it claims to be :-).

> Not being able to prove that the
>  macro evaluation itself will not have a type error is of little
>  concern to me provided that the expansion can be checked in the vast
>  majority of cases. A dynamic failure during a compile time expansion
>  isn't a big deal.

But a dynamic failure "during a compile time expansion" is actually a
static failure during the definition of the word using the macro, and
you've got all the same problems you wanted to solve as far as
printing out a readable error message that the person trying to use
the macro will be able to figure out.

>  Right now, I'm leaning towards a soft intersection type system that
>  will be robust with respect to transformations and infer most any
>  reasonable program. Even if the type system never gets in your way,
>  there are benefits to soft types. For example, being able to run a
>  program to discover what's wrong with it when the type errors are
>  undecipherable is a very nice feature, especially for programmers new
>  to the language. I'll probably write up something about my approach
>  soon once I've filled in the details.

I'm interested and curious. I think I get your point... Perhaps such a
type system could interact with the contract and test case
specifications to see what possibilities are ruled out or explicitly
required (respectively); and a type error could imply that you need to
correct one of your specifications or add a new specification, and
(even better) the error could hint what that specification would look
like.

>  - John

-Wm

Re: fundamental type system problems + possible solutions

by John Nowak :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


On May 1, 2008, at 11:30 AM, William Tanksley, Jr wrote:

>> The problem is not typing the arguments to the macro; it's dealing
>> with the situation where the expansion of the macro is not inferable
>> and there's no way to provide annotations to quotations within the
>> expansion because the expansion is obviously not in the source code.
>
> I can read this okay up until the phrase "obviously not in the source
> code." ... (unless you mean
> not part of the original source code pre-generation, but again, I
> don't see how it makes a difference whether the code was typed out by
> hand or generated by a macro).

That is what I mean. Say you give a macro a quotation. That quotation  
type checks. Now say the macro tears the quotation apart, and one of  
the individual parts of the quotation doesn't check (which is very  
possible). How would you annotate the fragment of the quotation that  
failed to check?

> But a dynamic failure "during a compile time expansion" is actually a
> static failure during the definition of the word using the macro, and
> you've got all the same problems you wanted to solve as far as
> printing out a readable error message that the person trying to use
> the macro will be able to figure out.

Good point. Debugging macros is already little fun to begin with  
though. My point was just that you won't be getting an error at runtime.

In any case, I have another approach for a type system I want to try  
that more heavily depends on the algebraic properties of the language.  
Initial results are encouraging, but I'll need another couple of  
months to implement it as it's rather complex and I'm guessing my way  
through it.

- John

Re: fundamental type system problems + possible solutions

by William Tanksley, Jr :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Before I start: http://citeseer.ist.psu.edu/dornan98typesecure.html
seems to possibly have something to do with this (maybe?). It's about
using dynamic types in a static system (which is nice for a soft type
system), and also about a static typechecked metaprogramming system.

John Nowak <john@...> wrote:
>  That is what I mean. Say you give a macro a quotation. That quotation
>  type checks. Now say the macro tears the quotation apart, and one of
>  the individual parts of the quotation doesn't check (which is very
>  possible). How would you annotate the fragment of the quotation that
>  failed to check?

You're right -- I'm so used to reasoning about my experimental 01
language I'd forgotten that in the real world quotations can be torn
apart. I don't have a good answer to that. I was thinking of providing
tools that would cut quotations on boundaries of known types, but that
won't work if you're cutting the quotations on the results of string
searches (which is what a lambda-substitution macro has to do).

It would make sense to make quotations impossible to cut, but possible
to disassemble -- when you cut a quotation on a symbol, you get back
the quotation before the symbol, the quotation after the symbol, and a
program which when concatenated after those two correctly concatenates
them together. Simplest case:

[yadda x yadda] cut(x) ==
[yadda] [yadda] [concat]

All of the words that cut, concat, or insert quotations would, of
course, recompute and alter the quotations type signature.

I haven't figured out a heuristic to guess where a type error message
should direct the programmer's attention, though. Perhaps that's
something that should go in the annotations that metaprogrammers could
use... "If check fail in here, blame my macro code", versus "if checks
fail in here, trace it back to the user's annotations".

No, I don't see that one working. Oh well.

>  In any case, I have another approach for a type system I want to try
>  that more heavily depends on the algebraic properties of the language.
>  Initial results are encouraging, but I'll need another couple of
>  months to implement it as it's rather complex and I'm guessing my way
>  through it.

Whew! Good luck.

>  - John

-Wm
LightInTheBox - Buy quality products at wholesale price