Embedded MLton

Embedded MLton

by Wesley W. Terpstra

I was curious to see how feasible it is to run MLton on a very small embedded device. My target system was a 200MHz router with 16MB of RAM and 4MB of flash, running mipsel Linux 2.4 with uClibc and BusyBox.

The executive summary: it works quite well.

After some minor patches (attached), all the regressions passed with the following exceptions:
real and real-int failed to compile because they expected {cos,rint,tan,...}f which the device does not have in libc (it only has the double versions).
fixed-integer and word-all spat warnings on compile...
  /tmp/ccJlIVsa.s:66173: Warning: Macro instruction expanded into multiple instructions
... but worked
finalize.6 and a few others failed to complete until I used @MLton fixed-heap.
Notably, both time profiling and world save/load worked without problem.

I have not included a patch I had made to platform/getText.c changing _start to _DYNAMIC. The particular mipsel toolchain for this device failed to link against _start, but another toolchain I have for a (bigger) mipsel device succeeded. I don't know how portable _DYNAMIC is.

What options can I use to convince MLton to output small executables? Ideally I'd like to get hello-world to below 100k. With static linking of all but libc/m it is currently 260k.

I already tried removing all unused symbols from libgmp.a and libmlton.a. The benefit was not spectacular, only 40k. libgmp.a already includes each function in a separate file so static linking drops all the unneeded methods. Similarly, most of the basis wrapper functions are dropped.

[mips.patch]

Index: runtime/platform/linux.c
===================================================================
--- runtime/platform/linux.c (revision 6672)
+++ runtime/platform/linux.c (working copy)
@@ -10,6 +10,9 @@
 #include "nonwin.c"
 #include "sysconf.c"
 #include "use-mmap.c"
+#ifdef __UCLIBC__
+#include "feround.c"
+#endif
 
 #ifndef EIP
 #define EIP     14
@@ -19,7 +22,6 @@
  *  alpha: ucp->m_context.sc_pc
  *  arm: ucp->m_context.ctx.arm_pc
  *  ia64: ucp->m_context.sc_ip & ~0x3UL
- *  mips: ucp->m_context.sc_pc
  *  s390: ucp->m_context.sregs->regs.psw.addr
  */
 static void catcher (__attribute__ ((unused)) int sig,
@@ -45,6 +47,13 @@
 #else
         GC_handleSigProf ((code_pointer) scp->si_regs.pc);
 #endif
+#elif (defined (__mips__))
+        ucontext_t* ucp = (ucontext_t*)mystery;
+#ifdef __UCLIBC__
+        GC_handleSigProf ((code_pointer) ucp->uc_mcontext.gpregs[CTX_EPC]);
+#else
+        GC_handleSigProf ((code_pointer) ucp->uc_mcontext.pc);
+#endif
 #elif (defined (__i386__))
         ucontext_t* ucp = (ucontext_t*)mystery;
         GC_handleSigProf ((code_pointer) ucp->uc_mcontext.gregs[EIP]);
Index: runtime/platform/feround.c
===================================================================
--- runtime/platform/feround.c (revision 6672)
+++ runtime/platform/feround.c (working copy)
@@ -1,30 +1,26 @@
-#if (defined __i386__)
-
-/* Macros for accessing the hardware control word. */
+#ifndef _FPU_SETCW
+#if defined(__i386__)
+/* Macros for accessing the hardware control word on i386. */
 #define _FPU_GETCW(cw) __asm__ ("fnstcw %0" : "=m" (*&cw))
 #define _FPU_SETCW(cw) __asm__ ("fldcw %0" : : "m" (*&cw))
+#else
+#error Cannot access FPU control word
+#endif
+#endif
 
-#define ROUNDING_CONTROL_MASK 0x0C00
-#define ROUNDING_CONTROL_SHIFT 10
+#define FE_MASK (FE_DOWNWARD|FE_TONEAREST|FE_TOWARDZERO|FE_UPWARD)
 
 int fegetround () {
-        unsigned short controlWord;
-
-        _FPU_GETCW (controlWord);
-        return (controlWord & ROUNDING_CONTROL_MASK) >> ROUNDING_CONTROL_SHIFT;
+        fpu_control_t controlWord;
+        _FPU_GETCW(controlWord);
+        return controlWord & FE_MASK;
 }
 
-void fesetround (int mode) {
-        unsigned short controlWord;
+int fesetround (int mode) {
+        fpu_control_t controlWord;
 
         _FPU_GETCW (controlWord);
-        controlWord &= ~ROUNDING_CONTROL_MASK;
-        controlWord |= mode << ROUNDING_CONTROL_SHIFT;
+        controlWord = (controlWord & ~FE_MASK) | mode;
         _FPU_SETCW (controlWord);
+        return 0;
 }
-
-#else
-
-#error fe{get,set}round not implemented
-
-#endif
Index: runtime/platform/linux.h
===================================================================
--- runtime/platform/linux.h (revision 6672)
+++ runtime/platform/linux.h (working copy)
@@ -1,4 +1,9 @@
+#ifdef __UCLIBC__
+#include "feround.h"
+#else
 #include <fenv.h>
+#endif
+
 #include <inttypes.h>
 #include <stdint.h>
 
Index: runtime/platform/feround.h
===================================================================
--- runtime/platform/feround.h (revision 6672)
+++ runtime/platform/feround.h (working copy)
@@ -1,2 +1,9 @@
+#include <fpu_control.h>
+
+#define FE_DOWNWARD     _FPU_RC_DOWN
+#define FE_TONEAREST    _FPU_RC_NEAREST
+#define FE_TOWARDZERO   _FPU_RC_ZERO
+#define FE_UPWARD       _FPU_RC_UP
+
 int fegetround (void);
-void fesetround (int mode);
+int fesetround (int mode);
Index: runtime/basis/Word/Word.c
===================================================================
--- runtime/basis/Word/Word.c (revision 6672)
+++ runtime/basis/Word/Word.c (working copy)
@@ -24,7 +24,7 @@
  * implements / and %.
  */
 
-#if ! (defined (__amd64__) || defined (__hppa__) || defined (__i386__) || defined(__ia64__)|| defined (__ppc__) || defined (__powerpc__) || defined (__sparc__))
+#if ! (defined (__amd64__) || defined (__hppa__) || defined (__i386__) || defined(__ia64__) || defined(__mips__) || defined (__ppc__) || defined (__powerpc__) || defined (__sparc__))
 #error check that C {/,%} correctly implement {quot,rem} from the basis library
 #endif
 
Index: bin/regression
===================================================================
--- bin/regression (revision 6672)
+++ bin/regression (working copy)
@@ -74,8 +74,8 @@
 if $cross; then
  flags[${#flags[@]}]="-target"
  flags[${#flags[@]}]="$crossTarget"
- flags[${#flags[@]}]="-stop"
- flags[${#flags[@]}]="g"
+# flags[${#flags[@]}]="-stop"
+# flags[${#flags[@]}]="g"
 fi
 cont='callcc.sml callcc2.sml callcc3.sml once.sml'
 flatArray='finalize.sml flat-array.sml flat-array.2.sml'
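
The feround.c part of the patch replaces the i386-specific shift-and-mask logic with a single mask built from the FE_* rounding constants. Below is a minimal sketch of that read-modify-write pattern, not the patch itself: it uses an ordinary variable in place of the hardware control word, and the RC_* values, the variable, and the sketch_* function names are all assumptions made for illustration. The range check in sketch_fesetround is also an addition that the patch does not have.

```c
#include <stdint.h>

/* Hypothetical rounding-field values, for illustration only. On a real
 * target they come from <fpu_control.h> (the _FPU_RC_* macros) and are
 * architecture specific. */
#define RC_NEAREST 0x0u
#define RC_ZERO    0x1u
#define RC_UP      0x2u
#define RC_DOWN    0x3u
#define RC_MASK    (RC_NEAREST | RC_ZERO | RC_UP | RC_DOWN)

/* Stand-in for the FPU control word: the upper bits represent unrelated
 * settings that must survive a change of rounding mode. */
uint32_t fake_control_word = 0xABCD0000u | RC_UP;

/* Return only the rounding-mode bits of the control word. */
int sketch_fegetround(void) {
    return (int)(fake_control_word & RC_MASK);
}

/* Clear the rounding-mode bits, then OR in the new mode. Returns 0 on
 * success and nonzero if `mode` has bits outside the rounding field
 * (this check is an addition, not part of the patch). */
int sketch_fesetround(int mode) {
    if (((uint32_t)mode & ~RC_MASK) != 0) {
        return 1;
    }
    fake_control_word = (fake_control_word & ~RC_MASK) | (uint32_t)mode;
    return 0;
}
```

Because the mask is derived from the mode constants themselves, no separate shift constant is needed, which is presumably why the patch can drop ROUNDING_CONTROL_SHIFT.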


_______________________________________________
MLton mailing list
MLton@...
http://mlton.org/mailman/listinfo/mlton

Re: Embedded MLton

by Wesley W. Terpstra

Urp. The last part of that patch (the bit affecting bin/regression) shouldn't be there.



Re: Embedded MLton

by Adam Goode

Wesley W. Terpstra wrote:

> What options can I use to convince MLton to output small executables?
> Ideally I'd like to get hello-world to below 100k. With static linking
> of all but libc/m it is currently 260k.
>
> I already tried removing all unused symbols from libgmp.a and
> libmlton.a. The benefit was not spectacular, only 40k. libgmp.a already
> includes each function in a separate file so static linking drops all
> the unneeded methods. Similarly, most of the basis wrapper functions are
> dropped.
>
>
Hi,

This seems quite interesting.

There are two things I could think of worth trying:

1. Compile mlton with -ffunction-sections and -fdata-sections, then link
with --gc-sections (gcc option is -Wl,--gc-sections). This may save
space or may even bloat space, but it is worth investigating. Also, try
compiling everything with -Os. It is also worth seeing if mlton's
invocation of gcc can be safely augmented with these options. Finally,
look into the -combine and -fwhole-program gcc options.

2. Make libmlton a dynamic library. You'll get no benefit if you are
installing only 1 program, but it will start to pay off after installing
a few. You can try this without too much hassle for testing, and if it
becomes useful, only later look into the ABI issues of having a
persistent libmlton installed on a system. (For your embedded device,
this probably won't matter.) I'd be interested in having a shared
libmlton on Fedora, since it makes executables smaller and makes various
gdb and debugging things easier (separate debuginfo packages don't work
with static libraries).

Both of these techniques can work with each other, and may both give
benefits. One more thing, if you try the dynamic library route, making a
proper set of function exports and using hidden visibility will probably
lead to even more benefit. http://gcc.gnu.org/wiki/Visibility


Looking forward to your results!


Thanks,

Adam





Re: Embedded MLton

by Wesley W. Terpstra

On Fri, Jul 25, 2008 at 7:54 PM, Adam Goode <adam@...> wrote:
> Wesley W. Terpstra wrote:
>> What options can I use to convince MLton to output small executables? Ideally I'd like to get hello-world to below 100k. With static linking of all but libc/m it is currently 260k.
>>
>> I already tried removing all unused symbols from libgmp.a and libmlton.a. The benefit was not spectacular, only 40k. libgmp.a already includes each function in a separate file so static linking drops all the unneeded methods. Similarly, most of the basis wrapper functions are dropped.
>
> 1. Compile mlton with -ffunction-sections and -fdata-sections, then link with --gc-sections (gcc option is -Wl,--gc-sections). This may save space or may even bloat space, but it is worth investigating. Also, try compiling everything with -Os. It is also worth seeing if mlton's invocation of gcc can be safely augmented with these options. Finally, look into the -combine and -fwhole-program gcc options.

I already messed around with -fdata-sections and -ffunction-sections. However, separate C source files are already compiled into separate object files, each with its own text section. When you link statically against an archive, an object file is only pulled in if it satisfies an unresolved symbol. Thus, if each source file contains a single function, the options you list do not improve the space use of the program. Most of the MLton runtime already has one function per file, with the notable exception of the garbage collector, which has almost no dead code anyway. I did something very similar to what you propose (though a bit more aggressive) and only won 40k, as I mentioned.

You can change -O1 to -Os by modifying the mlton execution shell script. At least for my hello-world test program, the executable had exactly the same size as with -O1.

I was hoping to learn what options I can give to the MLton compiler itself to have it generate more space-efficient executable code.

> 2. Make libmlton a dynamic library. You'll get no benefit if you are installing only 1 program, but it will start to pay off after installing a few. You can try this without too much hassle for testing, and if it becomes useful, only later look into the ABI issues of having a persistent libmlton installed on a system. (For your embedded device, this probably won't matter.) I'd be interested in having a shared libmlton on Fedora, since it makes executables smaller and makes various gdb and debugging things easier (separate debuginfo packages don't work with static libraries).

While I agree with you that this can be a good idea for an embedded device, there are a couple of caveats. First, a dynamic library always includes all objects, so it will be larger than the piece linked into a static binary. However, with enough MLton programs you would still save. Doing this on Fedora, though, is I think a bad idea. There is a very tight coupling between the compiler and the runtime. Executables compiled with version A but using runtime library version B may experience bad behaviour. Your Fedora box is big, just pay the cost. :-)

> Both of these techniques can work with each other, and may both give benefits. One more thing, if you try the dynamic library route, making a proper set of function exports and using hidden visibility will probably lead to even more benefit. http://gcc.gnu.org/wiki/Visibility

I already implemented this in the patch to output C libraries from MLton.

However, that page makes the interesting claim that if a symbol has hidden visibility it doesn't need to be accessed via PIC. If true, and I tell gas that all the MLton_* methods are hidden, perhaps this would avoid needing the amd64 codegen to output PIC-compatible code! (The runtime is compiled with -fPIC, so the relocation to libgmp/c/m should work there.)
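
For readers unfamiliar with the mechanism, here is a minimal sketch of marking one function hidden and one exported using GCC's visibility attribute. The macro and function names are hypothetical and are not MLton runtime symbols. Whether hidden visibility actually removes the need for PIC-compatible output from the amd64 codegen is the open question raised above, and this sketch does not settle it.

```c
/* Hypothetical names for illustration; these are not MLton symbols. */
#if defined(__GNUC__)
#define SKETCH_HIDDEN __attribute__((visibility("hidden")))
#define SKETCH_EXPORT __attribute__((visibility("default")))
#else
/* Fallback so the sketch still compiles with other compilers. */
#define SKETCH_HIDDEN
#define SKETCH_EXPORT
#endif

/* Hidden: callable from any object linked into the same shared library
 * or executable, but not exported from it. */
SKETCH_HIDDEN int sketch_internal_add(int a, int b) {
    return a + b;
}

/* Exported: part of the library's public interface. */
SKETCH_EXPORT int sketch_public_sum3(int a, int b, int c) {
    return sketch_internal_add(sketch_internal_add(a, b), c);
}
```

When this is built into a shared object, sketch_internal_add should not appear in the dynamic symbol table, so references to it can be resolved at link time rather than at load time.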



Re: Embedded MLton

by Matthew Fluet

On Sun, 20 Jul 2008, Wesley W. Terpstra wrote:
> What options can I use to convince MLton to output small executables?
> Ideally I'd like to get hello-world to below 100k. With static linking of
> all but libc/m it is currently 260k.

You can try adjusting the various inlining and code-duplication
thresholds.  But that hurts code size about as often as it helps, because
there is a win in trimming unused code paths after inlining.

Also, in a hello-world program, the majority of the executable will come from the
libraries (libmlton and libgmp), not from the individual program.  For
example, on amd64-linux (with libgmp shared):

[fluet@shadow temp]$ cat z.sml
val () = print "Hello world!\n"
[fluet@shadow temp]$ mlton -keep o z.sml
[fluet@shadow temp]$ size z.0.o z.1.o z
    text    data     bss     dec     hex filename
    2465    6136       2    8603    219b z.0.o
   33411      19       0   33430    8296 z.1.o
  122157    7296   10208  139661   2218d z

So approx 66% comes from libmlton.a.  The garbage collector probably
contributes the most from libmlton.a to the executable.
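
As a rough cross-check of that estimate, the sketch below recomputes the share of the executable that is not accounted for by the two program object files, using only the numbers in the size output above. The struct and function names are made up for illustration. The remainder also contains whatever else the linker pulled in (C start-up objects, for instance), so it cannot isolate libmlton.a exactly.

```c
/* One row of `size` output: text, data and bss in bytes. */
struct size_row { long text, data, bss; };

/* Figures copied from the `size z.0.o z.1.o z` output above. */
const struct size_row obj0 = { 2465, 6136, 2 };        /* z.0.o */
const struct size_row obj1 = { 33411, 19, 0 };         /* z.1.o */
const struct size_row exe  = { 122157, 7296, 10208 };  /* z */

/* The "dec" column: text + data + bss. */
long row_total(struct size_row r) {
    return r.text + r.data + r.bss;
}

/* Fraction of the executable's text not coming from z.0.o or z.1.o. */
double non_program_share_text(void) {
    return 1.0 - (double)(obj0.text + obj1.text) / (double)exe.text;
}

/* The same fraction computed over text + data + bss. */
double non_program_share_total(void) {
    return 1.0 - (double)(row_total(obj0) + row_total(obj1))
                 / (double)row_total(exe);
}
```

Both fractions come out at roughly 0.70, the same range as the figure quoted above.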

> I already tried removing all unused symbols from libgmp.a and libmlton.a.
> The benefit was not spectacular, only 40k. libgmp.a already includes each
> function in a separate file so static linking drops all the unneeded
> methods. Similarly, most of the basis wrapper functions are dropped.

Right.  You are left with the garbage collector.  There are some features
that you might be willing to disable for an embedded system (for example,
the hash-consing collection is rarely used), but you would need to build a
customized libmlton.a for that.

