MLton import headers

by Wesley W. Terpstra

Currently MLton outputs C import headers (ironically with
-export-header). I think it might also be useful to output an ML
import file.

One of the problems with using a library is that you need to know how
you are linking against it. Often programmers can get away without
knowing, but this breaks down on some combinations of operating
system, architecture, and definitions. One of the tasks of the C import
header is to make the details transparent to the user.

When using a MLton generated library from another MLton program, there
is the same problem. _import "foo" public: ...; is appropriate for a
static link, while _import "foo" external: ...; is appropriate for a
dynamic link. I was thinking of an output file something like:

signature M1 =
  sig
    val m1_open : int * string vector -> unit;
    val m1_close : unit -> unit;
  end

structure STATIC_LINK_M1 :> M1 =
  struct
    val m1_open = _import "m1_open" public;
    val m1_close = _import "m1_close" public;
  end

structure DYNAMIC_LINK_M1 :> M1 =
  struct
    val m1_open = _import "m1_open" external;
    val m1_close = _import "m1_close" external;
  end

I have intentionally left out PART_OF_M1 because you're better off not
using the FFI in this case.

Thoughts?

_______________________________________________
MLton mailing list
MLton@...
http://mlton.org/mailman/listinfo/mlton

Re: MLton import headers

by Vesa Karvonen

On Thu, Oct 2, 2008 at 1:20 PM, Wesley W. Terpstra <wesley@...> wrote:
[...]
> One of the problems with using a library is that you need to know how
> you are linking against it. Often programmers can get away without
> knowing, but this breaks down on some combinations of operating
> system, architecture, and definitions.

In C and C++, and particularly on Windows, one often uses macros in
library headers and source files to insert appropriate compiler
specific declarations to mark symbols that are either exported from
the currently compiled DLL or imported from an external DLL. These
macros are often defined in a configuration header of some sort and
one can then easily switch between various linkage options
(static/dynamic library, import/export symbols).
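
A minimal sketch of that macro pattern in C. The names FOO_API, FOO_BUILDING_DLL, FOO_USING_DLL and foo_add are hypothetical and not taken from any particular library; a real project would set the configuration macros from its build system.

```c
#include <assert.h>

/* Hypothetical configuration macros, normally set by the build system. */
#if defined(_WIN32) && defined(FOO_BUILDING_DLL)
#  define FOO_API __declspec(dllexport)   /* compiling the DLL itself */
#elif defined(_WIN32) && defined(FOO_USING_DLL)
#  define FOO_API __declspec(dllimport)   /* client of the DLL */
#else
#  define FOO_API                         /* static link or non-Windows target */
#endif

/* Every public declaration carries the macro, so linkage changes in one place. */
FOO_API int foo_add(int a, int b);

/* In a real library this definition lives in the library's own sources. */
int foo_add(int a, int b) { return a + b; }

/* Client code reads the same whichever branch was selected above. */
int foo_demo(void) {
    int sum = foo_add(2, 3);
    assert(sum == 5);
    return sum;
}
```

With neither configuration macro defined, FOO_API expands to nothing, which is the plain static-link case.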

> When using a MLton generated library from another MLton program, there
> is the same problem. _import "foo" public: ...; is appropriate for a
> static link, while _import "foo" external: ...; is appropriate for a
> dynamic link. I was thinking of an output file something like:
>
> signature M1 =
>  sig
>    val m1_open : int * string vector -> unit;
>    val m1_close : unit -> unit;
>  end
>
> structure STATIC_LINK_M1 :> M1 =
>  struct
>    val m1_open = _import "m1_open" public;
>    val m1_close = _import "m1_close" public;
>  end
>
> structure DYNAMIC_LINK_M1 :> M1 =
>  struct
>    val m1_open = _import "m1_open" external;
>    val m1_close = _import "m1_close" external;
>  end
>
> I have intentionally left out PART_OF_M1 because you're better off not
> using the FFI in this case.
>
> Thoughts?

One thing that comes to mind from this is that it would be nice to be
able to switch between static and dynamic linking easily without using
a complicated mechanism at the source level. In fact, I think that
ideally code that uses a library should really be exactly the same
regardless of whether the library is linked statically or
(unoptionally) dynamically. I'm not saying that it isn't possible with
the above format (you need to bind one of the modules, STATIC_LINK_M1
or DYNAMIC_LINK_M1, to a  module that you use in the client code), but
rather that you might want to consider this from that perspective.

One alternative that comes to mind would be to specify as a
command-line option whether one wishes to link statically or
dynamically and MLton would then generate only one set of imports and
the module name would be the same (e.g. just M1) in either case. This
import file generation step could be called as a part of the build
script (for the library) and the resulting module would then be used
to call the library.  Alternatively, the module could be selected in a
MLB file based on a path variable and the library would have
pregenerated import files for both cases (static/dynamic).  In either
case, the SML client code would be exactly the same regardless of
whether the library is linked statically or dynamically.

-Vesa Karvonen

Re: MLton import headers

by Wesley W. Terpstra

On Fri, Oct 3, 2008 at 10:01 AM, Vesa Karvonen wrote:
> In C and C++, and particularly on Windows, one often uses macros in
> library headers and source files to insert appropriate compiler
> specific declarations to mark symbols ...

Yes, this is what MLton/svn does in the generated C import header.
However, in some cases a programmer still needs to specify how he
links a library. The MLton import headers do this with a macro:

#define STATIC_LINK_FOOBAR
#include <foobar.h>

> One thing that comes to mind from this is that it would be nice to be
> able to switch between static and dynamic linking easily without using
> a complicated mechanism at the source level. In fact, I think that
> ideally code that uses a library should really be exactly the same
> regardless of whether the library is linked statically or
> (unoptionally) dynamically. I'm not saying that it isn't possible with
> the above format (you need to bind one of the modules, STATIC_LINK_M1
> or DYNAMIC_LINK_M1, to a  module that you use in the client code), but
> rather that you might want to consider this from that perspective.

I agree this would be nice, but unfortunately it isn't possible. The assembly/C
generated by MLton to use a library must know whether those symbols
are imported from a dynamically or statically linked library. MLton
does not know this. When we pass '-lfoo' to the linker, it picks some
library out of paths we know nothing about.

> One alternative that comes to mind would be to specify as a
> command-line option whether one wishes to link statically or
> dynamically and MLton would then generate only one set of imports and
> the module name would be the same (e.g. just M1) in either case.

I am not sure where these options are to be passed, but it sounds like
you mean at library creation. What I think you've overlooked is that
the same library can be linked three different ways.

Consider a PIC archive library foo.

During compilation of C that will be included in the final PIC library
you have access to private symbols and shouldn't import symbols that
will be in the final foo product. MLton import headers call this
PART_OF_LIBNAME.

A shared library built against foo will statically link foo in. This
means it no longer has access to private symbols, but shouldn't be
importing the symbols either as they will be in the same final DSO.
MLton import headers call this STATIC_LINK_LIBNAME.

A user of the resulting shared library wants to call functions that
were in foo. He has no access to private symbols and needs to import
the public ones. MLton import headers call this DYNAMIC_LINK_LIBNAME.

Each of these three uses is distinct and if you try to conflate any of
them I can present you with a platform where your solution breaks.

While it's certainly true that for many libraries there is a sane
default, sometimes there isn't. In MLton/svn terminology, we have four
output formats. The generated header access modes are described below:

executable -- only ever makes sense to use PART_OF_LIBNAME linkage
archive -- currently defaults to STATIC_LINK_LIBNAME, but if C is
included into the resulting library, PART_OF_LIBNAME should be defined
to override this default.
library -- currently defaults to DYNAMIC_LINK_LIBNAME, but override to
PART_OF_LIBNAME makes sense.
libarchive -- the case I detailed above, the import header has no
default and you have to specify the linkage.
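
A minimal sketch of how a header could select among these modes. It is not the header MLton generates: the FOOBAR names, the enum and foobar_selected_linkage are hypothetical, modeled only on the macro names used in this thread.

```c
#include <assert.h>

/* The client states how it links before including the header. */
#define STATIC_LINK_FOOBAR

/* ---- sketch of a generated foobar.h (hypothetical) ---- */
enum foobar_linkage { FOOBAR_PART_OF, FOOBAR_STATIC, FOOBAR_DYNAMIC };

#if !defined(PART_OF_FOOBAR) && !defined(STATIC_LINK_FOOBAR) && \
    !defined(DYNAMIC_LINK_FOOBAR)
/* A header for an "archive" could define STATIC_LINK_FOOBAR here and one for
   a "library" DYNAMIC_LINK_FOOBAR; the "libarchive" case refuses to guess. */
#  error "define PART_OF_FOOBAR, STATIC_LINK_FOOBAR or DYNAMIC_LINK_FOOBAR"
#endif

#if defined(PART_OF_FOOBAR)
#  define FOOBAR_LINKAGE FOOBAR_PART_OF
#elif defined(STATIC_LINK_FOOBAR)
#  define FOOBAR_LINKAGE FOOBAR_STATIC
#else
#  define FOOBAR_LINKAGE FOOBAR_DYNAMIC
#endif
/* ---- end of sketch ---- */

/* Reports which mode the preprocessor selected. */
enum foobar_linkage foobar_selected_linkage(void) {
    enum foobar_linkage mode = FOOBAR_LINKAGE;
    assert(mode == FOOBAR_PART_OF || mode == FOOBAR_STATIC ||
           mode == FOOBAR_DYNAMIC);
    return mode;
}
```

An explicit PART_OF_FOOBAR takes precedence in this sketch, which mirrors the "override the default" behaviour described for archive and library output.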

I imagine we would output ML import files in the same way, eg: default
M1 = STATIC_LINK_M1 for archive, M1 = DYNAMIC_LINK_M1 for library. No
default for libarchive. Happily, there is no PART_OF_M1 case. At any
rate, this is what I am proposing: an automatic M1 that "does the
right thing" when it can, but includes both options in case you need
them.

Another problem on my mind is that it's fairly common (on Linux) for
libraries to be shipped as both dynamic (PIC .so) and static (non-PIC
archive), with only one header file. Sometimes important libraries
also include a PIC archive. This works because, happily, on ELF systems
public/external end up being the same. I'm not sure how to support
this model and at the same time also support platforms like Windows
where public/external are critically different.

Re: MLton import headers

by Matthew Fluet

On Thu, 2 Oct 2008, Wesley W. Terpstra wrote:
> Thoughts?

The whole set of shared-library/visibility issues seems fairly opaque (a
reflection on the ABI itself, not its realization in MLton).  From some
of the scenarios that Wesley has described, it often seems to be the
case that one has to be very clear on how a program/library links to other
libraries, how a library is generated depending on its future use, etc.
Things seem sufficiently complex that it is unclear whether a complicated
set of implicit defaults is enough to shield a programmer completely (or
even mostly) from the details of which they need to be aware.  If that is
the case, then it is often simpler to allow/force someone to be explicit
up front, since that may be easier than trying to back out of the implicit
defaults when the need arises.

Also, picking up a theme similar to one that Vesa raised, is it so
difficult to set the visibility of an imported library at configure/build
time of a target?  For example, something like:

---foo_import.src---
structure Foo = struct
   val libopen = _import "foo_open" FOO_SCOPE: int * string vector -> unit;
   val libclose = _import "foo_close" FOO_SCOPE: unit -> unit;
end
------
---Makefile---
FOO_SCOPE=public

foo_import.sml: foo_import.src
	sed 's|FOO_SCOPE|$(FOO_SCOPE)|' < foo_import.src > foo_import.sml
------

where the 'FOO_SCOPE=public' in the Makefile could either have been
determined at configure time (depending on the availability of libfoo.a or
libfoo.so) or left blank to be set at the 'make' invocation.

Finally, importing a MLton library (either static or dynamic) into another
MLton library or executable seems to be a fairly obscure usage.
Independent of the fact that MLton is a whole-program compiler (which
benefits from exposing all of the SML code to the compiler at once), I
don't know of any high-level language implementation (e.g., OCaml, GHC)
that prefers to import language-native libraries as though they were
language-independent system libraries.  [One might argue that CLR/.NET is
an exception, but, really, the 'language-native library' in that instance
happens to be .NET assemblies.] It is almost certainly a win to have all
high-level language code sharing the same instance of the runtime/GC/etc.

Re: MLton import headers

by Vesa Karvonen

On Fri, Oct 3, 2008 at 11:21 PM, Matthew Fluet <fluet@...> wrote:
> Finally, importing a MLton library (either static or dynamic) into another
> MLton library or executable seems to be a fairly obscure usage. [...]

I could certainly imagine some more or less plausible practical
reasons to do so, such as cutting compilation time. Although, given
the limited set of types available for interfacing with such
libraries, many such uses could just as well be written by compiling
the MLton "library" as a regular program that is spawned by the host
program as a separate process and communicated with via some form of
IPC. BTW, according to Armstrong's book this is a preferred way of
using foreign language libraries in Erlang.

> It is almost certainly a win to have all high-level language code sharing
> the same instance of the runtime/GC/etc.

Yes, if you are directly linking to the code, then I would tend to
agree (modulo exceptional practical concerns).  Isolating parts of a
program to different processes, OTOH, may have practical advantages
such as fault isolation (bug in a subsystem cannot crash the entire
program) that can be more valuable than efficiency.
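
A minimal sketch of that process-isolation idea in C, assuming a POSIX system (fork, pipe, waitpid). The names risky_double and call_isolated are hypothetical, and this is not the interface of any MLton library.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical "library" routine that we would rather not link in directly. */
static int risky_double(int x) { return 2 * x; }

/* Runs risky_double in a child process and fetches the result over a pipe.
   Returns 0 and stores the result on success, -1 if anything failed. */
int call_isolated(int x, int *result) {
    int fds[2];
    assert(result != NULL);
    if (pipe(fds) != 0) return -1;

    pid_t pid = fork();
    if (pid < 0) { close(fds[0]); close(fds[1]); return -1; }

    if (pid == 0) {                      /* child: the isolated subsystem */
        close(fds[0]);
        int r = risky_double(x);
        ssize_t n = write(fds[1], &r, sizeof r);
        _exit(n == (ssize_t)sizeof r ? 0 : 1);
    }

    close(fds[1]);                       /* parent: the host program */
    ssize_t n = read(fds[0], result, sizeof *result);
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    if (n != (ssize_t)sizeof *result) return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

If the child crashes, the parent sees a short read or an abnormal exit status and gets -1 back instead of crashing itself.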

-Vesa Karvonen

Re: MLton import headers

by Wesley W. Terpstra

On Fri, Oct 3, 2008 at 10:21 PM, Matthew Fluet <fluet@...> wrote:
> Things seem
> sufficiently complex that it is unclear whether a complicated set of
> implicit defaults is enough to shield a programmer completely (or even
> mostly) from the details of which they need to be aware.  If that is the
> case, then it is often simpler to allow/force someone to be explicit up
> front, since that may be easier than trying to back out of the implicit
> defaults when the need arises.

So you would propose that I always require the user to
#define PART_OF_X / STATIC_LINK_X / DYNAMIC_LINK_X before #include'ing
the header?

> ---foo_import.src---
> structure Foo = struct
>  val libopen = _import "foo_open" FOO_SCOPE: int * string vector -> unit;
>  val libclose = _import "foo_close" FOO_SCOPE: unit -> unit;
> end
> ------
> ---Makefile---
> FOO_SCOPE=public
>
> foo_import.sml: foo_import.src
>        sed 's|FOO_SCOPE|$(FOO_SCOPE)|' < foo_import.src > foo_import.sml

Well, I thought it would be nicer to have
structure FooStatic = struct
 val libopen = _import "foo_open" public: int * string vector -> unit;
 val libclose = _import "foo_close" public: unit -> unit;
end
structure FooDynamic = struct
 val libopen = _import "foo_open" external: int * string vector -> unit;
 val libclose = _import "foo_close" external: unit -> unit;
end
structure Foo = FooStatic (* or whatever a sane default is *)

... my way you can use functors and/or the basis system to do what you
want without having to drop out to make.

> Finally, importing a MLton library (either static or dynamic) into another
> MLton library or executable seems to be a fairly obscure usage.

I was aiming for completeness. Perhaps it's not necessary then to
output an ML import header; if you really need one, you can write it
yourself.

While I agree it's pretty bizarre to have two MLton libraries at once,
I think this could happen if you have pure C programs/libraries that
in turn use ML libraries.

Re: MLton import headers

by Matthew Fluet

On Sat, 4 Oct 2008, Wesley W. Terpstra wrote:

> On Fri, Oct 3, 2008 at 10:21 PM, Matthew Fluet <fluet@...> wrote:
>> Things seem
>> sufficiently complex that it is unclear whether a complicated set of
>> implicit defaults is enough to shield a programmer completely (or even
>> mostly) from the details of which they need to be aware.  If that is the
>> case, then it is often simpler to allow/force someone to be explicit up
>> front, since that may be easier than trying to back out of the implicit
>> defaults when the need arises.
>
> So you would propose that I always require the user to
> #define PART_OF_X / STATIC_LINK_X / DYNAMIC_LINK_X before #include'ing
> the header?

Not necessarily.  What is 'best/common' practice?  Doing a
'grep -r "(visibility(" /usr/include/*' (on an x86-linux) turns up some
headers that use it, but none appear to demand a
#define PART_OF/STATIC_LINK/DYNAMIC_LINK in order to control the
visibility.

>> ---foo_import.src---
>> structure Foo = struct
>>  val libopen = _import "foo_open" FOO_SCOPE: int * string vector -> unit;
>>  val libclose = _import "foo_close" FOO_SCOPE: unit -> unit;
>> end
>> ------
>> ---Makefile---
>> FOO_SCOPE=public
>>
>> foo_import.sml: foo_import.src
>>        sed 's|FOO_SCOPE|$(FOO_SCOPE)|' < foo_import.src > foo_import.sml
>
> Well, I thought it would be nicer to have
> structure FooStatic = struct
> val libopen = _import "foo_open" public: int * string vector -> unit;
> val libclose = _import "foo_close" public: unit -> unit;
> end
> structure FooDynamic = struct
> val libopen = _import "foo_open" external: int * string vector -> unit;
> val libclose = _import "foo_close" external: unit -> unit;
> end
> structure Foo = FooStatic (* or whatever a sane default is *)
>
> ... my way you can use functors and/or the basis system to do what you
> want without having to drop out to make.

I don't see how functors and/or the ml basis system help here.  Neither
allows for control-flow or conditional compilation.  In your example above,
there is no way to change Foo from FooStatic to FooDynamic without
changing the source file.  As Vesa noted, you can use MLB variables to get
a poor-man's form of conditional compilation, but then you have to (in the
Makefile) load an appropriate mlb-path-map.


Re: MLton import headers

by Wesley W. Terpstra

On Sat, Oct 4, 2008 at 4:50 PM, Matthew Fluet <fluet@...> wrote:
>> So you would propose that I always require the user to
>> #define PART_OF_X / STATIC_LINK_X / DYNAMIC_LINK_X before #include'ing
>> the header?
>
> Not necessarily.  What is 'best/common' practice?  Doing a
> 'grep -r "(visibility(" /usr/include/*' (on an x86-linux) turns up some
> headers that use it, but none appear to demand a
> #define PART_OF/STATIC_LINK/DYNAMIC_LINK in order to control the visibility.

On linux you can ignore the difference of STATIC_LINK and
DYNAMIC_LINK. Thus most projects do. Libraries that must also work as
a dll on windows are the only ones which have to care.

All library projects provide some form of 'PART_OF', but it differs on
a library-to-library basis. The most common approach is to provide a
public header and one or more private headers. Examples of this are
gdtoa, gmp, and sqlite3. Some libraries also use a macro. For example,
gmp uses __GMP_WITHIN_GMP.

gmp also distinguishes static/dynamic linkage with __GMP_LIBGMP_DLL.
My approach follows gmp very closely. gmp sets __GMP_LIBGMP_DLL at
library creation time depending on whether it is a static or dynamic
library. It also has this to say:

      AC_MSG_ERROR([cannot build both static and DLL, since gmp.h is
      different for each. Use "--disable-static --enable-shared" to build
      just a DLL.])

I also set STATIC/DYNAMIC_LINK to some default in our export header
based on static/dynamic output. I've just allowed the user to override
it.

Very few libraries can compile to a PIC archive. glibc is one example,
but it clearly has no need to be portable to a non-ELF platform. This
is the only case where I currently leave no default. There is
definitely no 'best practice' for this type of library.

The current MLton approach is: the export header is exactly the same
for all output formats, except that the default
PART_OF/STATIC_LINK/DYNAMIC_LINK differs. The user can be explicit and
override this default.

On an ELF system you can use the export headers for both
static/dynamic, because EXTERNAL/PUBLIC are identical on that target.
Thus you can do as most Linux programs do and supply one header that
works for both static and dynamic libraries.

My personal opinion is that removing the default and requiring the
user to always specify the linkage would be surprising when compared
to how one typically uses C libraries. There are only two cases where
the default is 'wrong'. 1) PIC archives, which are a corner case very
few projects need. In this case the header has no default and forces
you to pay attention. 2) You are compiling a library including ML code
and C code. In this case we require:
  #define PART_OF_XYZ
  #include "xyz.h"
inside your C files instead of:
  #include "xyz.h"
  #include "xyz-private.h"
This doesn't seem particularly onerous or strange to me. You're
cooperating with MLton to build a library in this case, so it's fair
for us to have a (not-so-uncommon) convention you need to be aware of.

> I don't see how functors and/or the ml basis system help here.  Neither allows
> for control-flow or conditional compilation.

I meant to use the mlb-path-map approach. You can write your
library-dependent code as a functor and then bind it to one of the two
structures in a file chosen depending on a path variable.

At any rate, I no longer think the ML export header is necessary. The
only situation in which it seems reasonable for an ML program to use an
ML library via FFI is if there was a pure C library between them, in
which case you probably shouldn't be using the internal ML library
anyway, but rather the C library's wrappers.

What I do think we need is a new annotation, 'defaultImport
public/external'. This way your 'prim.sml' that does all the
_import/_address/_symbol'ing can be easily switched between
static/dynamic import.

Re: MLton import headers

by Matthew Fluet

On Sat, 4 Oct 2008, Wesley W. Terpstra wrote:

> My personal opinion is that removing the default and requiring the
> user to always specify the linkage would be surprising when compared
> to how one typically uses C libraries. There are only two cases where
> the default is 'wrong'. 1) PIC archives, which are a corner case very
> few projects need. In this case the header has no default and forces
> you to pay attention. 2) You are compiling a library including ML code
> and C code. In this case we require:
>  #define PART_OF_XYZ
>  #include "xyz.h"
> inside your C files instead of:
>  #include "xyz.h"
>  #include "xyz-private.h"
> This doesn't seem particularly onerous or strange to me. You're
> cooperating with MLton to build a library in this case, so it's fair
> for us to have a (not-so-uncommon) convention you need to be aware of.

That seems like a reasonable rationale.  If I understand things correctly,
though, one advantage of the
   #include "xyz.h"
   #include "xyz-private.h"
approach is that it textually separates the public (in the sense of the
documented portions of the library API available for use in other
projects) functions from the private (in the sense of the undocumented
portions of the library API) functions.  That is, clients of "xyz.h" can't
even see the prototypes of the private functions.  On the other hand, if
the platform supports the right visibility attributes, then although a
client might be able to see the prototype of a private function in a
(combined) "xyz.h" header, they would fail at link time, because the
function wouldn't be visible in the binary library.  I guess you
achieve this effect in the (combined) "xyz.h" by using
   #define MLLIB_PRIVATE(x)
when not PART_OF_XYZ to hide the private functions.
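
A minimal sketch of that hiding trick. Only the macro names MLLIB_PRIVATE and PART_OF_XYZ come from this thread; the xyz_public and xyz_private functions and the header layout are hypothetical, so this should not be read as the header MLton actually emits.

```c
#include <assert.h>

/* Sources that belong to the library define this before the include. */
#define PART_OF_XYZ

/* ---- sketch of a combined xyz.h (hypothetical) ---- */
#ifdef PART_OF_XYZ
#  define MLLIB_PRIVATE(x) x   /* library sources: keep the declaration */
#else
#  define MLLIB_PRIVATE(x)     /* clients: the declaration disappears */
#endif

int xyz_public(int n);
MLLIB_PRIVATE(int xyz_private(int n);)
/* ---- end of sketch ---- */

/* Library implementation; only these sources may call xyz_private. */
int xyz_private(int n) { return 2 * n; }

int xyz_public(int n) {
    assert(n >= 0);   /* illustrative precondition */
    return xyz_private(n) + 1;
}
```

A client that does not define PART_OF_XYZ never sees a prototype for xyz_private, so the single header still keeps the private API out of view.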

> What I do think we need is a new annotation, 'defaultImport
> public/external'. This way your 'prim.sml' that does all the
> _import/_address/_symbol'ing can be easily switched between
> static/dynamic import.

This shifts the point of change from the 'prim.sml' file to the 'xyz.mlb'
file.  While one could use "-default-ann 'defaultImport public'", this
could have undesirable effects if you import from two different libraries
that demand different linking, since one global default cannot suit both.

Anyways, it doesn't appear that there are clear-cut solutions.  But, I
tend to agree that an auto-generated SML import header is a sufficiently
rare corner case that it doesn't really demand extra compiler support.

-Matthew

Re: MLton import headers

by Wesley W. Terpstra

On Sun, Oct 5, 2008 at 12:02 AM, Matthew Fluet <fluet@...> wrote:

> That seems like a reasonable rationale.  If I understand things correctly,
> though, one advantage of the
>  #include "xyz.h"
>  #include "xyz-private.h"
> approach is that it textually separates the public (in the sense of the
> documented portions of the library API available for use in other projects)
> functions from the private (in the sense of the undocumented portions of the
> library API) functions.  That is, clients of "xyz.h" can't even see the
> prototypes of the private functions.

Correct. On the flip-side, having only one file is convenient.

While I agree that header files can serve a documentation purpose, I
don't think that's true for us. An automatically generated header is
full of extra trash and lacks comments. The header doesn't declare the
symbols to C and the output library doesn't expose them to the linker. If
you want to document your API, do a better job than an automatic
header file.

>> What I do think we need is a new annotation, 'defaultImport
>> public/external'. This way your 'prim.sml' that does all the
>> _import/_address/_symbol'ing can be easily switched between
>> static/dynamic import.
>
> This shifts the point of change from the 'prim.sml' file to the 'xyz.mlb'
> file.

That's exactly what I wanted: one place to change. Some of my projects
have prim.sml files that run over 1000 lines. These aren't
auto-generated, so there's no possible autogenerated ML import header
to have in this case. An annotation could solve this neatly, without
trying to 'sed' the file, which I think is a pretty crude and fragile
approach.

Re: MLton import headers

by Vesa Karvonen

On Sat, Oct 4, 2008 at 12:28 AM, Vesa Karvonen <vesa.a.j.k@...> wrote:

> On Fri, Oct 3, 2008 at 11:21 PM, Matthew Fluet <fluet@...> wrote:
>> Finally, importing a MLton library (either static or dynamic) into another
>> MLton library or executable seems to be a fairly obscure usage. [...]
>
> I could certainly imagine some more or less plausible practical
> reasons to do so, such as cutting compilation time. Although, given
> the limited set of types available for interfacing with such
> libraries, many such uses could just as well be written by compiling
> the MLton "library" as a regular program that is spawned by the host
> program as a separate process and communicated with via some form of
> IPC. BTW, according to Armstrong's book this is a preferred way of
> using foreign language libraries in Erlang.

Continuing on that thought, I just committed an experimental (and
somewhat incomplete with respect to what I want it to do) RPC (Remote
Procedure Call) library to mltonlib
(http://mlton.org/cgi-bin/viewsvn.cgi/mltonlib/trunk/org/mlton/vesak/rpc-lib/unstable/)
that allows one to do that fairly easily.  There is also a simple
example of using the library.  To try the example, first build it, then
start the server, and finally run the client.

-Vesa Karvonen
