|
View:
New views
2 Messages
—
Rating Filter:
Alert me
|
|
|
The bool array/vector performance bugThe bool array/vector performance bug (bool array/vector uses 4-bytes per
element) has been discussed on the list a number of times (you can find them by googling for "bool representation", for example). Skimming through at a couple of discussions, I was most convinced by the reasoning in the following posts by Wesley and Stephen: http://mlton.org/pipermail/mlton/2006-June/028936.html http://mlton.org/pipermail/mlton/2006-June/028940.html In summary, - don't allow ML bool in the FFI, but do - allow C Bool ("C_Bool.t") in the FFI. This means that one needs to explicitly convert between ML and C bools. I think that this is the pretty much the only sane alternative, because the resulting code will be portable (the size of C bool is platform dependent) and it allows maximal flexibility on the ML side. The current approach, using 4-byte bools, is just plain wrong, IMO, because a C bool is not specified to be 4-bytes. Using 4-bytes per bool on the ML side is also highly inefficient and (which doesn't make it wrong but) makes MLton look bad on some toy benchmarks (http://shootout.alioth.debian.org/gp4/benchmark.php?test=nsieve&lang=all). So, what is the status of the bool FFI thingy? And what places in the compiler/runtime/basis lib need to be changed to: - eliminate ML bool from the FFI, - add C_Bool to FFI (this might already be available?), and - change the representation of ML bool to use 8-bits (or 1-bit) (in arrays and vectors at least)? Is that all? I would guess that changing the representation is about as difficult as knowing all the places that need to be changed. If more programming is needed, I'd be happy to help to fix this. -Vesa Karvonen _______________________________________________ MLton mailing list MLton@... http://mlton.org/mailman/listinfo/mlton |
|
|
Re: The bool array/vector performance bugOn Sat, 31 May 2008, Vesa Karvonen wrote:
> The bool array/vector performance bug (bool array/vector uses 4-bytes per > element) has been discussed on the list a number of times (you can find > them by googling for "bool representation", for example). It is not clear to me that using 4-bytes per bool (rather than 1-bit or 1-byte) is best characterized as a "performance bug". I suspect that on most 32-bit systems, accessing a 4-byte element is more efficient than accessing a 1-byte element. On the other hand, it is certainly the case that more 1-byte elements will fit in a cache line. > - don't allow ML bool in the FFI, but do > - allow C Bool ("C_Bool.t") in the FFI. The C_Bool structure is in the Basis Library implementation, but not currently exported by c-types.mlb. The C_Bool structure is equal to (the application of the WordToBool functor to) a WordN structure, where N corresponds to the target platform's size of the C-type _Bool. > I think that this is the pretty much the only sane alternative, because > the resulting code will be portable (the size of C bool is platform > dependent) and it allows maximal flexibility on the ML side. Portability is a different concern than performance. > The current approach, using 4-byte bools, is just plain wrong, IMO, > because a C bool is not specified to be 4-bytes. Fair enough, but we don't currently specify that an ML bool is equivalent to a C _Bool. Rather, we specify that an ML bool is equivalent to a C int32_t (see http://mlton.org/ForeignFunctionInterfaceTypes). We don't currently specify that any ML type is equivalent to a C _Bool. We actually go through some hoops in the elaborator (<src>/mlton/elaborate/elaborate-core.fun) to compare values coming to/from C symbols against 0 when they are being used as ML bool. (This could be simplified by excluding ML bool from the FFI.) Though, this doesn't help when the bool is coming from an _import or _export and it doesn't help if the FFI is used with an indirect type. That is, all bets are off if you have: z.c: void f(int32_t *a, int32_t i) { a[i] = 7; } z.sml: val f = _import "f": bool array -> unit; val a = Array.tabulate (10, fn () => true); val () = ... f 3 ... Array.sub (a, 3) ... > Using 4-bytes per bool on the ML side is also highly inefficient and > (which doesn't make it wrong but) makes MLton look bad on some toy > benchmarks > (http://shootout.alioth.debian.org/gp4/benchmark.php?test=nsieve&lang=all). Changing the benchmark to use Word8.word rather than bool did show a speedup (which I attribute to packing more elements into a cache line): [fluet@wolfpup tmp]$ for ((i=0;i<5;i++)); do /usr/bin/time ./nsieve-word8 12 > /dev/null ; done 16.43user 0.22system 0:17.58elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+18172minor)pagefaults 0swaps 16.38user 0.24system 0:17.67elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+18172minor)pagefaults 0swaps 16.42user 0.24system 0:17.59elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+18172minor)pagefaults 0swaps 16.37user 0.23system 0:17.55elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+18170minor)pagefaults 0swaps 16.44user 0.21system 0:17.68elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+18172minor)pagefaults 0swaps [fluet@wolfpup tmp]$ for ((i=0;i<5;i++)); do /usr/bin/time ./nsieve-bool 12 > /dev/null ; done 22.88user 0.61system 0:25.62elapsed 91%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+60672minor)pagefaults 0swaps 22.92user 0.51system 0:25.01elapsed 93%CPU (0avgtext+0avgdata 0maxresident)k 24inputs+0outputs (0major+60672minor)pagefaults 0swaps 22.90user 0.53system 0:24.76elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+60671minor)pagefaults 0swaps 22.89user 0.54system 0:24.75elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+60671minor)pagefaults 0swaps 22.88user 0.53system 0:24.80elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+60672minor)pagefaults 0swaps > So, what is the status of the bool FFI thingy? And what places in the > compiler/runtime/basis lib need to be changed to: > - eliminate ML bool from the FFI, Dropping '("Bool", CType.bool, Tycon.bool)' from the list of nullary FFI types at <src>/mlton/elaborate/elaborate-core.fun:755 will effectively remove ML bool from the FFI. As noted above, you could also clean up the code paths in mkFetch, mkStore, and mkSymbol that arise from the isBool tests. There are no essential uses of ML bool as an FFI type in the Basis Library implementation, but there are some uses in <src>/runtime/gen/basis-ffi.def where we declare C functions that are used as the implementation of comparison primitives that return bool. > - add C_Bool to FFI (this might already be available?), and It is implicitly available, as we have: structure C_Bool = WordToBool (type t = Word8.word val zero: t = 0wx0 val one: t = 0wx1) in <src>/basis-library/config/c/$(TARGET_ARCH)-$(TARGET_OS)/c-types.sml (though Word8 might be Word32 on some platforms), and Word8.word is an FFI type. > - change the representation of ML bool to use 8-bits (or 1-bit) (in arrays > and vectors at least)? The representation is established in <src>/mlton/backend/packed-representation.fun; see lines 1407-1417. The underlying word size is set by WordSize.bool (<src>/mlton/ast/word-size.fun). I would look carefully at all the uses of WordSize.bool in the compiler, and also all the uses of CType.bool. > Is that all? I would guess that changing the representation is about as > difficult as knowing all the places that need to be changed. If more > programming is needed, I'd be happy to help to fix this. One other item is the handling of primitives that return boolean values. Some of these primitives are not implemented by the codegens (e.g., the x86 codegen doesn't implement Word64_{equal,lt}), and so are turned into C functions (in <src>/mlton/codegen/ssa-to-rssa.fun). (This is why they are declared in <src>/runtime/gen/basis-ffi.def.) If the C functions do not return values in the same representation as that of an ML bool, then you will need to insert some coercion, either a word-size coercion or an explicit test against 0. Contrary to some of the comments I made in the threads cited above, getting hold of platform-dependent types in the compiler proper has proven expedient. So, if it were necessary to have the size/representation of C _Bool known by the compiler, then it should be possible to do so; see the implementation of Control.Target.Size.cint and CType.cint and <src>/runtime/gen/gen-sizes.c. Thus, another alternative is to simply establish that ML bool is represented as C _Bool, by making WordSize.bool pull from Control.Target.Size.cbool. Nonetheless, I think that divorcing ML bool from C _Bool is the right decision. _______________________________________________ MLton mailing list MLton@... http://mlton.org/mailman/listinfo/mlton |
| Free Forum Powered by Nabble | Forum Help |