Constant Folding of FP Operations

View: New views
5 Messages — Rating Filter:   Alert me  

Constant Folding of FP Operations

by Vesa Karvonen-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Attached is an experimental patch that improves MLton's constant folding
of floating point operations.  The problem with constant folding FP
operations in SML is that the FP ops are subject to rounding mode
settings:

  http://www.standardml.org/Basis/ieee-float.html#SIG:IEEE_REAL.setRoundingMode:VAL

The workaround used in the patch is to evaluate the operations in all
rounding modes (actually in only TO_NEGINF and TO_POSINF) and check that
the results agree.  This ensures that constant folding is correct in all
rounding modes.

Personally I'm inclined to think that the inclusion of get/setRoundingMode
in the Basis library is a design mistake, because it makes some FP
optimizations very difficult.  An alternative design would be to have
multiple modules, say Real[ToNearest], RealToNegInf, RealToPosInf, and
RealToZero, each corresponding to a specific rounding mode.  FP ops could
then be treated as functional and it would then likely become
straightforward to make the same kind of aggressive constant folding and
simplification of FP ops as is done for integer ops.  Of course, this
would introduce the need for the compiler to insert code to switch the
rounding modes when necessary, but this doesn't seem extremely difficult.

About the patch.  The compiler changes do not compile with SML/NJ, which
does not support 32-bit floating point (Real32), and it should be made so
that the FP simplifications aren't performed (and would be replaced with
dummy functions) when the compiler has been compiled with SML/NJ.  This is
the reason why I consider this an experimental patch and haven't committed
it.  Some of the floating point simplifications should probably also be
disabled when cross compiling.  These should not be very difficult to do,
but it seems that I'm unlikely get around to implementing them.

The "decon" datatype has some constructors that aren't currently used
(although they are produced by the decon function) in the patch.  I
planned those constructors for some optimizations (e.g. replace divide by
a power of 2 with a multiply) that aren't currently possible with MLton's
Prim.apply, because it does not allow the introduction of an operation
with constants in the result of Prim.apply.  This restriction of
Prim.apply could probably be lifted without huge problems.  The limitation
is probably due to the design of the SSA and SSA2 intermediate languages
of MLton that do not allow constants as operands.

The purpose of the changes to real.sml in the basis library implementation
is to expose opportunities for constant folding.  Calls to the "class"
function cannot be optimized away (by the current compiler).

-Vesa Karvonen

[real-cf.patch]

Index: mlton/atoms/real-x.sig
===================================================================
--- mlton/atoms/real-x.sig (revision 6625)
+++ mlton/atoms/real-x.sig (working copy)
@@ -10,6 +10,7 @@
 signature REAL_X_STRUCTS =
    sig
       structure RealSize: REAL_SIZE
+      structure WordX: WORD_X
    end
 
 signature REAL_X =
@@ -19,11 +20,46 @@
       (* reals of all RealSize.t sizes. *)
       type t
 
+      datatype decon =
+         NAN
+       | ZERO of {signBit: bool}
+       | ONE of {signBit: bool}
+       | POW2 of {signBit: bool, exp: int} (* man = 0.5 *)
+       | FIN of {signBit: bool, exp: int, man: t}
+       | INF of {signBit: bool}
+
+      val abs: t -> t
+      val acos: t -> t option
+      val add: t * t -> t option
+      val asin: t -> t option
+      val atan2: t * t -> t option
+      val atan: t -> t option
+      val castFromWord: WordX.t -> t
+      val castToWord: t -> WordX.t
+      val cos: t -> t option
+      val decon: t -> decon
+      val div: t * t -> t option
+      val equal: t * t -> bool
       val equals: t * t -> bool
+      val exp: t -> t option
+      val fromIntInf: IntInf.t * RealSize.t -> t option
       val hash: t -> word
       val layout: t -> Layout.t
+      val le: t * t -> bool
+      val ln: t -> t option
+      val log10: t -> t option
+      val lt: t * t -> bool
       val make: string * RealSize.t -> t option
+      val mul: t * t -> t option
+      val muladd: t * t * t -> t option
+      val mulsub: t * t * t -> t option
+      val neg: t -> t
+      val qequal: t * t -> bool
+      val sin: t -> t option
       val size: t -> RealSize.t
+      val sqrt: t -> t option
+      val sub: t * t -> t option
+      val tan: t -> t option
       val toString: t -> string
       val zero: RealSize.t -> t
    end
Index: mlton/atoms/atoms.fun
===================================================================
--- mlton/atoms/atoms.fun (revision 6625)
+++ mlton/atoms/atoms.fun (working copy)
@@ -24,8 +24,9 @@
       structure Con = Con ()
       structure CType = CType (structure RealSize = RealSize
                                structure WordSize = WordSize)
-      structure RealX = RealX (structure RealSize = RealSize)
       structure WordX = WordX (structure WordSize = WordSize)
+      structure RealX = RealX (structure RealSize = RealSize
+                               structure WordX = WordX)
       structure WordXVector = WordXVector (structure WordSize = WordSize
                                            structure WordX = WordX)
       structure Func =
Index: mlton/atoms/real-x.fun
===================================================================
--- mlton/atoms/real-x.fun (revision 6625)
+++ mlton/atoms/real-x.fun (working copy)
@@ -70,4 +70,188 @@
 
 val hash = String.hash o toString
 
+local
+   fun make (o32, o64) =
+    fn Real32 x => Real32 (o32 x)
+     | Real64 x => Real64 (o64 x)
+in
+   val neg = make (Real32.~, Real64.~)
+   val abs = make (Real32.abs, Real64.abs)
 end
+
+structure P = Pervasive
+structure PR32 = P.Real32
+structure PR64 = P.Real64
+structure PIR = P.IEEEReal
+
+datatype 'r r =
+   R of {zero: 'r, half: 'r, one: 'r, inf: 'r, abs: 'r -> 'r,
+         signBit: 'r -> bool, isNan: 'r -> bool,
+         toManExp: 'r -> {exp: int, man: 'r},
+         compareReal: 'r * 'r -> PIR.real_order,
+         bits: Bits.t,
+         subVec: P.Word8Vector.vector * int -> P.LargeWord.word,
+         update: P.Word8Array.array * int * P.LargeWord.word -> unit,
+         toBytes: 'r -> P.Word8Vector.vector,
+         subArr: P.Word8Array.array * int -> 'r,
+         tag: 'r -> t}
+
+val r32 =
+    R {zero = 0.0, half = 0.5, one = 1.0, inf = PR32.posInf,
+       abs = PR32.abs, signBit = PR32.signBit, isNan = PR32.isNan,
+       toManExp = PR32.toManExp, compareReal = PR32.compareReal,
+       bits = Bits.inWord32,
+       subVec = P.PackWord32Little.subVec,
+       update = P.PackWord32Little.update,
+       toBytes = P.PackReal32Little.toBytes,
+       subArr = P.PackReal32Little.subArr,
+       tag = Real32}
+val r64 =
+    R {zero = 0.0, half = 0.5, one = 1.0, inf = PR64.posInf,
+       abs = PR64.abs, signBit = PR64.signBit, isNan = PR64.isNan,
+       toManExp = PR64.toManExp, compareReal = PR64.compareReal,
+       bits = Bits.inWord64,
+       subVec = P.PackWord64Little.subVec,
+       update = P.PackWord64Little.update,
+       toBytes = P.PackReal64Little.toBytes,
+       subArr = P.PackReal64Little.subArr,
+       tag = Real64}
+
+local
+   fun doit (R {compareReal, signBit, isNan, tag, ...}) (f, arg) =
+       case !Control.target of
+          Control.Cross _ => NONE
+        | Control.Self => let
+             val old = PIR.getRoundingMode ()
+             val () = PIR.setRoundingMode PIR.TO_NEGINF
+             val min = f arg
+             val () = PIR.setRoundingMode PIR.TO_POSINF
+             val max = f arg
+             val () = PIR.setRoundingMode old
+          in
+             if PIR.EQUAL = compareReal (min, max)
+                andalso signBit min = signBit max
+                orelse isNan min andalso isNan max
+             then SOME (tag min)
+             else NONE
+          end
+
+   fun make1 (o32, o64) =
+    fn Real32 x => doit r32 (o32, x)
+     | Real64 x => doit r64 (o64, x)
+
+   fun make2 (o32, o64) =
+    fn (Real32 x, Real32 y) => doit r32 (o32, (x, y))
+     | (Real64 x, Real64 y) => doit r64 (o64, (x, y))
+     | _ => Error.bug "impossible"
+
+   fun make3 (o32, o64) =
+    fn (Real32 x, Real32 y, Real32 z) => doit r32 (o32, (x, y, z))
+     | (Real64 x, Real64 y, Real64 z) => doit r64 (o64, (x, y, z))
+     | _ => Error.bug "impossible"
+in
+   val acos = make1 (PR32.Math.acos, PR64.Math.acos)
+   val asin = make1 (PR32.Math.asin, PR64.Math.asin)
+   val atan = make1 (PR32.Math.atan, PR64.Math.atan)
+   val atan2 = make2 (PR32.Math.atan2, PR64.Math.atan2)
+   val cos = make1 (PR32.Math.cos, PR64.Math.cos)
+   val exp = make1 (PR32.Math.exp, PR64.Math.exp)
+   val ln = make1 (PR32.Math.ln, PR64.Math.ln)
+   val log10 = make1 (PR32.Math.log10, PR64.Math.log10)
+   val sin = make1 (PR32.Math.sin, PR64.Math.sin)
+   val sqrt = make1 (PR32.Math.sqrt, PR64.Math.sqrt)
+   val tan = make1 (PR32.Math.tan, PR64.Math.tan)
+
+   val add = make2 (PR32.+, PR64.+)
+   val op div = make2 (PR32./, PR64./)
+   val mul = make2 (PR32.*, PR64.* )
+   val sub = make2 (PR32.-, PR64.-)
+
+   val muladd = make3 (PR32.*+, PR64.*+)
+   val mulsub = make3 (PR32.*-, PR64.*-)
+
+   fun fromIntInf (i, s) =
+     case s of
+        R32 => doit r32 (Real32.fromIntInf, i)
+      | R64 => doit r64 (Real64.fromIntInf, i)
+end
+
+local
+   fun make (o32, o64) =
+    fn (Real32 r1, Real32 r2) => o32 (r1, r2)
+     | (Real64 r1, Real64 r2) => o64 (r1, r2)
+     | _ => Error.bug "impossible"
+in
+   val equal = make (PR32.==, PR64.==)
+   val le = make (PR32.<=, PR64.<=)
+   val lt = make (PR32.<, PR64.<)
+   val qequal = make (PR32.?=, PR64.?=)
+end
+
+datatype decon =
+   NAN
+ | ZERO of {signBit: bool}
+ | ONE of {signBit: bool}
+ | POW2 of {signBit: bool, exp: Int.t} (* man = 0.5 *)
+ | FIN of {signBit: bool, exp: Int.t, man: t}
+ | INF of {signBit: bool}
+
+local
+   fun doit (R {zero, half, one, inf, abs, signBit, isNan, toManExp,
+                compareReal, tag, ...})
+            value =
+       if isNan value
+          then NAN
+       else let
+             val signBit = signBit value
+             val absValue = abs value
+          in
+             if PIR.EQUAL = compareReal (zero, absValue)
+                then ZERO {signBit = signBit}
+             else if PIR.EQUAL = compareReal (one, absValue)
+                then ONE {signBit = signBit}
+             else if PIR.EQUAL = compareReal (inf, absValue)
+                then INF {signBit = signBit}
+             else let
+                   val {man, exp} = toManExp absValue
+                in
+                   if PIR.EQUAL = compareReal (half, man)
+                      then POW2 {signBit = signBit, exp = exp}
+                   else FIN {signBit = signBit, exp = exp, man = tag man}
+                end
+          end
+in
+   val decon =
+    fn Real32 x => doit r32 x
+     | Real64 x => doit r64 x
+end
+
+local
+   fun doit (R {bits, toBytes, subVec, ...}) x =
+       WordX.fromIntInf
+          (P.LargeWord.toLargeInt (subVec (toBytes x, 0)),
+           WordX.WordSize.fromBits bits)
+in
+   val castToWord =
+    fn Real32 x => doit r32 x
+     | Real64 x => doit r64 x
+end
+
+local
+   fun doit (R {bits, update, subArr, tag, ...}) w = let
+      val a = P.Word8Array.array (Bytes.toInt (Bits.toBytes bits), 0w0)
+   in
+      update (a, 0, P.LargeWord.fromLargeInt (WordX.toIntInf w))
+    ; tag (subArr (a, 0))
+   end
+in
+   fun castFromWord w =
+      if WordX.WordSize.bits (WordX.size w) = Bits.inWord32 then
+         doit r32 w
+      else if WordX.WordSize.bits (WordX.size w) = Bits.inWord64 then
+         doit r64 w
+      else
+         Error.bug "Invalid word size"
+end
+
+end
Index: mlton/atoms/sources.cm
===================================================================
--- mlton/atoms/sources.cm (revision 6625)
+++ mlton/atoms/sources.cm (working copy)
@@ -52,10 +52,10 @@
 (* Windows doesn't like files named con, so use con- instead. *)
 con-.sig
 con-.fun
+word-x.sig
+word-x.fun
 real-x.sig
 real-x.fun
-word-x.sig
-word-x.fun
 word-x-vector.sig
 word-x-vector.fun
 c-type.sig
Index: mlton/atoms/sources.mlb
===================================================================
--- mlton/atoms/sources.mlb (revision 6625)
+++ mlton/atoms/sources.mlb (working copy)
@@ -16,10 +16,10 @@
    (* Windows doesn't like files named con, so use con- instead. *)
    con-.sig
    con-.fun
+   word-x.sig
+   word-x.fun
    real-x.sig
    real-x.fun
-   word-x.sig
-   word-x.fun
    word-x-vector.sig
    word-x-vector.fun
    c-type.sig
Index: mlton/atoms/prim.fun
===================================================================
--- mlton/atoms/prim.fun (revision 6625)
+++ mlton/atoms/prim.fun (working copy)
@@ -21,6 +21,7 @@
 local
    open Const
 in
+   structure RealX = RealX
    structure WordX = WordX
    structure WordXVector = WordXVector
 end
@@ -1277,6 +1278,12 @@
          else ApplyResult.Const (Const.intInf ii)
       val intInfConst = intInf o IntInf.fromInt
       val null = ApplyResult.Const Const.null
+      fun real (r: RealX.t): ('a, 'b) ApplyResult.t =
+         ApplyResult.Const (Const.real r)
+      fun realNeg (s, x): ('a, 'b) ApplyResult.t =
+          ApplyResult.Apply (Real_neg s, [x])
+      fun realOpt (ro: RealX.t option): ('a, 'b) ApplyResult.t =
+         case ro of NONE => ApplyResult.Unknown | SOME r => real r
       fun word (w: WordX.t): ('a, 'b) ApplyResult.t =
          ApplyResult.Const (Const.word w)
       val f = ApplyResult.falsee
@@ -1394,6 +1401,38 @@
                  seqIndexConst (IntInf.fromInt (WordXVector.length v))
            | (Vector_sub, [WordVector v, Word i]) =>
                  word (WordXVector.sub (v, WordX.toInt i))
+           | (Real_neg _, [Real r]) => real (RealX.neg r)
+           | (Real_abs _, [Real r]) => real (RealX.abs r)
+           | (Real_Math_acos _, [Real r]) => realOpt (RealX.acos r)
+           | (Real_Math_asin _, [Real r]) => realOpt (RealX.asin r)
+           | (Real_Math_atan _, [Real r]) => realOpt (RealX.atan r)
+           | (Real_Math_atan2 _, [Real r1, Real r2]) =>
+                realOpt (RealX.atan2 (r1, r2))
+           | (Real_Math_cos _, [Real r]) => realOpt (RealX.cos r)
+           | (Real_Math_exp _, [Real r]) => realOpt (RealX.exp r)
+           | (Real_Math_ln _, [Real r]) => realOpt (RealX.ln r)
+           | (Real_Math_log10 _, [Real r]) => realOpt (RealX.log10 r)
+           | (Real_Math_sin _, [Real r]) => realOpt (RealX.sin r)
+           | (Real_Math_sqrt _, [Real r]) => realOpt (RealX.sqrt r)
+           | (Real_Math_tan _, [Real r]) => realOpt (RealX.tan r)
+           | (Real_add _, [Real r1, Real r2]) => realOpt (RealX.add (r1, r2))
+           | (Real_div _, [Real r1, Real r2]) => realOpt (RealX.div (r1, r2))
+           | (Real_mul _, [Real r1, Real r2]) => realOpt (RealX.mul (r1, r2))
+           | (Real_sub _, [Real r1, Real r2]) => realOpt (RealX.sub (r1, r2))
+           | (Real_muladd _, [Real r1, Real r2, Real r3]) =>
+                realOpt (RealX.muladd (r1, r2, r3))
+           | (Real_mulsub _, [Real r1, Real r2, Real r3]) =>
+                realOpt (RealX.mulsub (r1, r2, r3))
+           | (Real_equal _, [Real r1, Real r2]) => bool (RealX.equal (r1, r2))
+           | (Real_le _, [Real r1, Real r2]) => bool (RealX.le (r1, r2))
+           | (Real_lt _, [Real r1, Real r2]) => bool (RealX.lt (r1, r2))
+           | (Real_qequal _, [Real r1, Real r2]) => bool (RealX.qequal (r1, r2))
+           | (Real_castToWord _, [Real r]) => word (RealX.castToWord r)
+           | (Word_castToReal _, [Word w]) => real (RealX.castFromWord w)
+           | (Word_rndToReal (_, s, {signed}), [Word w]) =>
+                realOpt
+                (RealX.fromIntInf
+                 (if signed then WordX.toIntInfX w else WordX.toIntInf w, s))
            | (Word_add _, [Word w1, Word w2]) => word (WordX.add (w1, w2))
            | (Word_addCheck s, [Word w1, Word w2]) => wcheck (op +, s, w1, w2)
            | (Word_andb _, [Word w1, Word w2]) => word (WordX.andb (w1, w2))
@@ -1490,6 +1529,38 @@
                                     else Unknown
                    | _ => Unknown
                end handle Exn.Overflow => Unknown
+            fun varReal (x, r, inOrder) =
+               let
+                  datatype z = datatype RealX.decon
+                  datatype z = datatype ApplyResult.t
+                  fun negIf (s, signBit) =
+                      if signBit then realNeg (s, x) else Var x
+               in
+                  case RealX.decon r of
+                     ZERO _ => Unknown
+                   | ONE {signBit} =>
+                     (case p of
+                         Real_mul s => negIf (s, signBit)
+                       | Real_div s => if inOrder
+                                          then negIf (s, signBit)
+                                       else Unknown
+                       | _ => Unknown)
+                   | NAN =>
+                     (case p of
+                         Real_Math_atan2 _ => real r
+                       | Real_add _ => real r
+                       | Real_div _ => real r
+                       | Real_mul _ => real r
+                       | Real_sub _ => real r
+                       | Real_equal _ => bool false
+                       | Real_qequal _ => bool true
+                       | Real_le _ => bool false
+                       | Real_lt _ => bool false
+                       | _ => Unknown)
+                   | POW2 _ => Unknown
+                   | INF _ => Unknown
+                   | FIN _ => Unknown
+               end
             fun varWord (x, w, inOrder) =
                let
                   val zero = word o WordX.zero
@@ -1627,6 +1698,8 @@
                                   else bool true
                           else bool false
                   else Unknown
+             | (_, [Var x, Const (Real r)]) => varReal (x, r, true)
+             | (_, [Const (Real r), Var x]) => varReal (x, r, false)
              | (_, [Var x, Const (Word i)]) => varWord (x, i, true)
              | (_, [Const (Word i), Var x]) => varWord (x, i, false)
              | (_, [Const (IntInf i1), Const (IntInf i2), _]) =>
Index: mlton/atoms/const.sig
===================================================================
--- mlton/atoms/const.sig (revision 6625)
+++ mlton/atoms/const.sig (working copy)
@@ -13,7 +13,7 @@
       structure RealX: REAL_X
       structure WordX: WORD_X
       structure WordXVector: WORD_X_VECTOR
-      sharing WordX = WordXVector.WordX
+      sharing WordX = RealX.WordX = WordXVector.WordX
    end
 
 signature CONST =
Index: mlton/codegen/c-codegen/c-codegen.fun
===================================================================
--- mlton/codegen/c-codegen/c-codegen.fun (revision 6625)
+++ mlton/codegen/c-codegen/c-codegen.fun (working copy)
@@ -45,12 +45,19 @@
 
       fun toC (r: t): string =
          let
-            (* The only difference between SML reals and C floats/doubles is that
+            (* The main difference between SML reals and C floats/doubles is that
              * SML uses "~" while C uses "-".
              *)
             val s =
                String.translate (toString r,
                                  fn #"~" => "-" | c => String.fromChar c)
+            (* Also, inf is spelled INFINITY and nan is NAN in C. *)
+            val s =
+               case s of
+                  "-inf" => "-INFINITY"
+                | "inf"  => "INFINITY"
+                | "nan"  => "NAN"
+                | other  => other
          in
             case size r of
                R32 => concat ["(Real32)", s]
Index: basis-library/real/real.sml
===================================================================
--- basis-library/real/real.sml (revision 6625)
+++ basis-library/real/real.sml (working copy)
@@ -116,10 +116,7 @@
              | _ => if signBit x then ~x else x
 
       fun isFinite r =
-         case class r of
-            INF => false
-          | NAN => false
-          | _ => true
+         abs r <= maxFinite
 
       val op == = Prim.==
 
@@ -153,10 +150,10 @@
          else y
 
       fun sign (x: real): int =
-         case class x of
-            NAN => raise Domain
-          | ZERO => 0
-          | _ => if x > zero then 1 else ~1
+         if x > zero then 1
+         else if x < zero then ~1
+         else if x == zero then 0
+         else raise Domain
 
       fun sameSign (x, y) = signBit x = signBit y
 
@@ -266,10 +263,9 @@
       val realMod = #frac o split
 
       fun checkFloat x =
-         case class x of
-            INF => raise Overflow
-          | NAN => raise Div
-          | _ => x
+         if isFinite x then x
+         else if isNan x then raise Div
+         else raise Overflow
 
       fun roundReal (x: real, m: rounding_mode): real =
          IEEEReal.withRoundingMode (m, fn () => R.round x)
@@ -623,61 +619,72 @@
                                minInt': 'a,
                                precision': int}} =
             (fromIntUnsafe,
-             if Int.< (precision, #precision' other)
-                then let
-                        val maxInt' = #maxInt' other
-                        val minInt' = #minInt' other
-                        (* maxInt can't be represented exactly. *)
-                        (* minInt can be represented exactly. *)
-                        val (maxInt,minInt) =
-                           IEEEReal.withRoundingMode
-                           (TO_ZERO, fn () => (fromIntUnsafe maxInt',
-                                               fromIntUnsafe minInt'))
-                     in
-                        fn (m: rounding_mode) => fn x =>
-                        case class x of
-                           INF => raise Overflow
-                         | NAN => raise Domain
-                         | _ => if minInt <= x andalso x <= maxInt
-                                   then toIntUnsafe (roundReal (x, m))
-                                else raise Overflow
-                     end
-             else let
-                     val maxInt' = #maxInt' other
-                     val minInt' = #minInt' other
-                     val maxInt = fromIntUnsafe maxInt'
-                     val minInt = fromIntUnsafe minInt'
-                  in
-                     fn (m: rounding_mode) => fn x =>
-                     case class x of
-                        INF => raise Overflow
-                      | NAN => raise Domain
-                      | _ => if minInt <= x
-                                then if x <= maxInt
-                                        then toIntUnsafe (roundReal (x, m))
-                             else if x < maxInt + one
-                                then (case m of
-                                         TO_NEGINF => maxInt'
-                                       | TO_POSINF => raise Overflow
-                                       | TO_ZERO => maxInt'
-                                       | TO_NEAREST =>
-                                            (* Depends on maxInt being odd. *)
-                                            if x - maxInt >= half
-                                               then raise Overflow
-                                            else maxInt')
-                                else raise Overflow
-                             else if x > minInt - one
-                                then (case m of
-                                         TO_NEGINF => raise Overflow
-                                       | TO_POSINF => minInt'
-                                       | TO_ZERO => minInt'
-                                       | TO_NEAREST =>
-                                            (* Depends on minInt being even. *)
-                                            if x - minInt < ~half
-                                               then raise Overflow
-                                            else minInt')
-                             else raise Overflow
-                  end)
+             if Int.< (precision, #precision' other) then
+                let
+                   val maxInt' = #maxInt' other
+                   val minInt' = #minInt' other
+                   (* maxInt can't be represented exactly. *)
+                   (* minInt can be represented exactly. *)
+                   val (maxInt,minInt) =
+                       IEEEReal.withRoundingMode
+                       (TO_ZERO, fn () => (fromIntUnsafe maxInt',
+                                           fromIntUnsafe minInt'))
+                in
+                   fn (m: rounding_mode) => fn x =>
+                   if minInt <= x then
+                      if x <= maxInt then
+                         toIntUnsafe (roundReal (x, m))
+                      else
+                         raise Overflow
+                   else
+                      if x < minInt then
+                         raise Overflow
+                      else
+                         raise Domain (* NaN *)
+                end
+             else
+                let
+                   val maxInt' = #maxInt' other
+                   val minInt' = #minInt' other
+                   val maxInt = fromIntUnsafe maxInt'
+                   val minInt = fromIntUnsafe minInt'
+                in
+                   fn (m: rounding_mode) => fn x =>
+                   if minInt <= x then
+                      if x <= maxInt then
+                         toIntUnsafe (roundReal (x, m))
+                      else
+                         if x < maxInt + one then
+                            (case m of
+                                TO_NEGINF => maxInt'
+                              | TO_POSINF => raise Overflow
+                              | TO_ZERO => maxInt'
+                              | TO_NEAREST =>
+                                (* Depends on maxInt being odd. *)
+                                if x - maxInt >= half then
+                                   raise Overflow
+                                else
+                                   maxInt')
+                         else
+                            raise Overflow
+                   else
+                      if x < minInt then
+                         if minInt - one < x then
+                            (case m of
+                                TO_NEGINF => raise Overflow
+                              | TO_POSINF => minInt'
+                              | TO_ZERO => minInt'
+                              | TO_NEAREST =>
+                                (* Depends on minInt being even. *)
+                                if x - minInt < ~half then
+                                   raise Overflow
+                                else
+                                   minInt')
+                         else
+                            raise Overflow
+                      else
+                         raise Domain (* NaN *)
+                end)
       in
          val (fromInt8,toInt8) =
             make {fromIntUnsafe = R.fromInt8Unsafe,
Index: lib/mlton/pervasive/pervasive.sml
===================================================================
--- lib/mlton/pervasive/pervasive.sml (revision 6625)
+++ lib/mlton/pervasive/pervasive.sml (working copy)
@@ -31,6 +31,10 @@
       structure Math = Math
       structure Option = Option
       structure OS = OS
+      structure PackReal32Little = PackReal32Little
+      structure PackReal64Little = PackReal64Little
+      structure PackWord32Little = PackWord32Little
+      structure PackWord64Little = PackWord64Little
       structure Position = Position
       structure Posix = Posix
       structure Real = Real
@@ -47,9 +51,11 @@
       structure Vector = Vector
       structure Word = Word
       structure Word32 = Word32
+      structure Word64 = Word64
       structure Word8 = Word8
       structure Word16 = Word16
       structure Word8Array = Word8Array
+      structure Word8Vector = Word8Vector
 
       type unit = General.unit
       type int = Int.int


_______________________________________________
MLton mailing list
MLton@...
http://mlton.org/mailman/listinfo/mlton

Re: Constant Folding of FP Operations

by Matthew Fluet-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Fri, 23 May 2008, Vesa Karvonen wrote:

> Attached is an experimental patch that improves MLton's constant folding
> of floating point operations.  The problem with constant folding FP
> operations in SML is that the FP ops are subject to rounding mode
> settings:
>
>  http://www.standardml.org/Basis/ieee-float.html#SIG:IEEE_REAL.setRoundingMode:VAL
>
> The workaround used in the patch is to evaluate the operations in all
> rounding modes (actually in only TO_NEGINF and TO_POSINF) and check that
> the results agree.  This ensures that constant folding is correct in all
> rounding modes.

Seems like a good optimization, and a sound one.  Did you run the
benchmarks and observe any speedups?

> Personally I'm inclined to think that the inclusion of get/setRoundingMode
> in the Basis library is a design mistake, because it makes some FP
> optimizations very difficult.

FP optimizations seem to be very difficult in general.  It is particularly
difficult because IEEE 754 says nothing about many floating-point
operations; for example, sin/cos/tan are not specified to follow an
implicit or explicit rounding mode.

> An alternative design would be to have
> multiple modules, say Real[ToNearest], RealToNegInf, RealToPosInf, and
> RealToZero, each corresponding to a specific rounding mode.  FP ops could
> then be treated as functional and it would then likely become
> straightforward to make the same kind of aggressive constant folding and
> simplification of FP ops as is done for integer ops.  Of course, this
> would introduce the need for the compiler to insert code to switch the
> rounding modes when necessary, but this doesn't seem extremely difficult.

The C-- specification of floating-point expressions makes the rounding
mode an explicit argument; e.g.:

   fsub  \forall \alpha. bits \alpha * bits \alpha * bits 2 -> bits \alpha
         Floating-point subtract with explicit rounding mode.
   round_down  bits2
         IEEE 754 rounding mode for rounding down.

The C-- specification of floating-point expressions also does not include
any operations not specified by the IEEE 754 standard; that is, C-- does
not provide trancendentals, etc.  Instead, one must call the system's math
library.

> About the patch.  The compiler changes do not compile with SML/NJ, which
> does not support 32-bit floating point (Real32), and it should be made so
> that the FP simplifications aren't performed (and would be replaced with
> dummy functions) when the compiler has been compiled with SML/NJ.

You might consider dynamically checking the value of Real32.precision,
which would allow you to distinguish a "real" Real32 from a "dummy" Real32
that is bound to Real64.

> This is
> the reason why I consider this an experimental patch and haven't committed
> it.  Some of the floating point simplifications should probably also be
> disabled when cross compiling.  These should not be very difficult to do,
> but it seems that I'm unlikely get around to implementing them.

There also seems to be a little-endian assumption for constant-folding
castToWord and castFromWord.  It should be easy enough to use:

val r64 =
     R {...
        subVec = fn arg =>
                 if Control.Target.bigEndian
                 then P.PackWord64Big.subVec arg
                 else P.PackWord64Little.subVec arg,
        ...}

> The "decon" datatype has some constructors that aren't currently used
> (although they are produced by the decon function) in the patch.  I
> planned those constructors for some optimizations (e.g. replace divide by
> a power of 2 with a multiply) that aren't currently possible with MLton's
> Prim.apply, because it does not allow the introduction of an operation
> with constants in the result of Prim.apply.  This restriction of
> Prim.apply could probably be lifted without huge problems.  The limitation
> is probably due to the design of the SSA and SSA2 intermediate languages
> of MLton that do not allow constants as operands.

It shouldn't be difficult to allow Prim.apply to return more complicated
expressions.  There aren't that many (static) uses of Prim.apply in the
compiler; you would simply need to allow uses of Prim.apply to yield a
sequence of IL statements, so that you could bind constants to variables
before using them in the IL primitive.

With the exception of the caveats noted above (SML/NJ builds,
cross-compiling, and target endiannes), I think the patch looks good.

_______________________________________________
MLton mailing list
MLton@...
http://mlton.org/mailman/listinfo/mlton

Re: Constant Folding of FP Operations

by Matthew Fluet-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Sun, 1 Jun 2008, Matthew Fluet wrote:

> On Fri, 23 May 2008, Vesa Karvonen wrote:
>>  Attached is an experimental patch that improves MLton's constant folding
>>  of floating point operations.  The problem with constant folding FP
>>  operations in SML is that the FP ops are subject to rounding mode
>>  settings:
>>
>>   http://www.standardml.org/Basis/ieee-float.html#SIG:IEEE_REAL.setRoundingMode:VAL
>>
>>  The workaround used in the patch is to evaluate the operations in all
>>  rounding modes (actually in only TO_NEGINF and TO_POSINF) and check that
>>  the results agree.  This ensures that constant folding is correct in all
>>  rounding modes.
>
> Seems like a good optimization, and a sound one.  Did you run the benchmarks
> and observe any speedups?

On amd64-linux, I get the following:

MLton0 -- ~/devel/mlton/mlton.svn.trunk/build/bin/mlton -codegen amd64
MLton1 -- ~/devel/mlton/mlton.svn.trunk/build/bin/mlton -codegen c
MLton2 -- ~/devel/mlton/mlton.svn.trunk/build.real-cf/bin/mlton -codegen amd64
MLton3 -- ~/devel/mlton/mlton.svn.trunk/build.real-cf/bin/mlton -codegen c
run time ratio
benchmark         MLton0 MLton1 MLton2 MLton3
barnes-hut          1.00   1.04   0.84   0.88
boyer               1.00   1.12   1.03   1.12
checksum            1.00   5.72   1.01   5.72
count-graphs        1.00   0.92   1.01   0.92
DLXSimulator        1.00   1.08   0.98   1.06
fft                 1.00   1.05   1.00   1.06
fib                 1.00   1.40   1.04   1.40
flat-array          1.00   1.60   0.99   1.61
hamlet              1.00   1.65   1.00   1.68
imp-for             1.00   1.46   0.99   1.47
knuth-bendix        1.00   1.36   1.00   1.36
lexgen              1.00   1.05   1.00   1.04
life                1.00   1.00   1.00   1.00
logic               1.00   1.08   1.00   1.09
mandelbrot          1.00   1.36   0.96   1.35
matrix-multiply     1.00   0.91   1.00   0.91
md5                 1.00   6.94   1.00   6.94
merge               1.00   1.05   1.00   1.05
mlyacc              1.00   1.08   1.00   1.08
model-elimination   1.00   1.25   0.99   1.26
mpuz                1.00   1.93   1.00   1.93
nucleic             1.00   1.00   0.99   0.99
output1             1.00   1.23   1.00   1.24
peek                1.00   0.91   1.05   0.92
psdes-random        1.00   0.83   0.99   0.83
ratio-regions       1.00   1.21   1.00   1.33
ray                 1.00   1.06   0.94   1.05
raytrace            1.00   1.13   0.96   1.05
simple              1.00   1.40   1.06   1.42
smith-normal-form   1.00   1.01   1.01   1.01
tailfib             1.00   1.87   1.06   1.87
tak                 1.00   1.14   1.00   1.13
tensor              1.00   1.57   1.01   1.57
tsp                 1.00   1.01   1.00   1.01
tyan                1.00   1.18   1.02   1.18
vector-concat       1.00   1.08   1.00   1.07
vector-rev          1.00   1.40   1.00   1.47
vliw                1.00   1.35   1.01   1.38
wc-input1           1.00   1.00   1.00   1.00
wc-scanStream       1.00   1.21   0.99   1.21
zebra               1.00   0.79   1.00   0.79
zern                1.00   1.62   1.00   1.64

This seems to only show a non-noise speedup on barnes-hut (and maybe ray).

I'd really like to know a good way of cutting down the noise in the
benchmarks; consider that fib and tailfib (which use no FP operations, and
so yield identical assembly code) show a 1.04 and 1.06 slowdown,
respectively.

The rest of the benchmark data follows:

size
benchmark            MLton0    MLton1    MLton2    MLton3
barnes-hut          165,614   170,237   167,663   172,286
boyer               218,529   220,105   218,529   220,105
checksum             98,257   105,473    98,257   105,473
count-graphs        124,401   127,073   124,401   127,073
DLXSimulator        201,324   210,004   201,324   210,004
fft                 120,687   127,772   120,655   127,708
fib                  98,225    97,321    98,225    97,321
flat-array           97,681    96,913    97,681    96,913
hamlet            1,509,177 1,542,601 1,508,809 1,545,585
imp-for              97,969    97,105    97,969    97,105
knuth-bendix        177,004   186,044   177,004   186,044
lexgen              291,003   318,683   291,003   318,683
life                122,257   118,777   122,257   118,777
logic               182,497   182,665   182,497   182,665
mandelbrot           97,857   100,545    97,841   100,529
matrix-multiply      99,969   102,225    99,969   102,225
md5                 132,252   142,588   132,252   142,588
merge                99,601   106,417    99,601   106,417
mlyacc              663,259   704,187   663,259   704,187
model-elimination   865,986   953,682   866,002   953,666
mpuz                104,241   112,273   104,241   112,273
nucleic             273,760   256,196   273,760   256,196
output1             141,056   148,688   141,056   148,688
peek                137,804   143,212   137,804   143,212
psdes-random        101,169    99,665   101,169    99,665
ratio-regions       125,649   135,905   125,649   135,905
ray                 249,400   257,839   248,728   258,223
raytrace            378,114   397,078   374,530   392,054
simple              347,593   377,012   346,777   376,548
smith-normal-form   276,332   292,884   276,332   292,884
tailfib              97,713    96,897    97,713    96,897
tak                  98,273    97,273    98,273    97,273
tensor              167,507   174,971   167,507   174,971
tsp                 144,827   151,658   144,363   151,354
tyan                217,644   229,268   217,644   229,268
vector-concat        99,617    98,457    99,617    98,457
vector-rev           99,217    98,281    99,217    98,281
vliw                528,426   616,042   526,874   614,490
wc-input1           164,522   169,826   164,522   169,826
wc-scanStream       175,258   184,914   175,258   184,914
zebra               217,196   219,948   217,196   219,948
zern                135,302   140,707   135,318   140,403
compile time
benchmark         MLton0 MLton1 MLton2 MLton3
barnes-hut          9.78  12.15  10.89  12.87
boyer              10.80  23.17  10.32  22.40
checksum            7.70   7.94   7.66   8.25
count-graphs        8.29   9.43   8.93   9.58
DLXSimulator       10.81  15.09  10.92  15.61
fft                 8.19   9.04   8.53   9.21
fib                 7.74   7.89   8.22   8.56
flat-array          7.56   7.58   8.18   8.13
hamlet             44.43 111.25  48.09 112.88
imp-for             7.75   7.79   7.71   7.96
knuth-bendix        9.51  13.01   9.86  13.41
lexgen             12.25  18.86  12.61  19.02
life                8.31   9.84   8.27   9.29
logic              10.09  13.89   9.90  14.41
mandelbrot          7.77   7.80   8.04   8.39
matrix-multiply     8.33   8.52   7.82   8.13
md5                 8.64  10.02   8.74  10.50
merge               7.60   7.82   7.86   8.20
mlyacc             27.15  44.67  26.66  44.90
model-elimination  25.60  57.05  25.57  57.73
mpuz                7.92   8.23   8.05   8.34
nucleic            11.75  23.32  11.74  23.16
output1             8.57  10.34   8.72  10.81
peek                8.45  10.05   8.94  10.31
psdes-random        7.68   8.16   8.04   8.34
ratio-regions       8.97  10.33   9.42  10.98
ray                11.82  17.10  11.64  17.38
raytrace           15.24  26.74  14.96  26.36
simple             13.48  22.12  13.04  21.70
smith-normal-form  11.55  54.59  12.06  52.01
tailfib             7.63   7.94   7.85   8.10
tak                 7.64   8.04   7.88   8.17
tensor             10.38  13.41  10.48  13.56
tsp                 9.25  10.70   9.55  11.00
tyan               10.77  15.96  10.89  16.04
vector-concat       7.44   7.76   7.92   8.20
vector-rev          7.59   7.80   8.10   7.89
vliw               19.42  35.39  19.30  35.81
wc-input1           9.85  12.02  10.13  12.06
wc-scanStream       9.60  12.95   9.90  12.64
zebra              11.04  15.44  11.02  15.17
zern                8.28   9.75   8.85   9.81
run time
benchmark         MLton0 MLton1 MLton2 MLton3
barnes-hut         18.06  18.74  15.15  15.94
boyer              53.59  59.96  55.25  60.11
checksum           18.73 107.23  18.83 107.20
count-graphs       26.00  24.04  26.24  24.05
DLXSimulator       28.01  30.34  27.39  29.75
fft                14.61  15.34  14.54  15.45
fib                37.68  52.67  39.24  52.68
flat-array         29.27  46.93  29.02  47.00
hamlet             50.41  82.93  50.26  84.74
imp-for            26.97  39.41  26.73  39.56
knuth-bendix       24.96  34.07  24.90  34.01
lexgen             22.16  23.22  22.11  23.01
life               26.40  26.39  26.45  26.39
logic              23.22  25.07  23.31  25.41
mandelbrot         21.64  29.39  20.71  29.21
matrix-multiply    35.93  32.61  35.81  32.74
md5                33.22 230.39  33.24 230.60
merge              48.85  51.19  49.00  51.41
mlyacc             26.21  28.28  26.19  28.23
model-elimination  38.80  48.56  38.28  48.76
mpuz               23.48  45.41  23.44  45.35
nucleic            18.37  18.41  18.18  18.25
output1            37.22  45.94  37.34  46.09
peek               21.91  19.99  23.06  20.05
psdes-random       16.05  13.26  15.88  13.28
ratio-regions     124.21 150.12 124.00 164.76
ray                14.96  15.84  14.09  15.66
raytrace           17.23  19.49  16.49  18.15
simple             27.70  38.85  29.36  39.24
smith-normal-form   8.48   8.59   8.59   8.59
tailfib            22.48  41.96  23.79  41.95
tak                32.33  36.78  32.45  36.44
tensor             22.72  35.64  22.98  35.59
tsp                25.33  25.69  25.28  25.59
tyan               27.55  32.38  28.06  32.51
vector-concat      27.96  30.27  27.97  29.98
vector-rev         37.32  52.38  37.26  54.74
vliw               23.96  32.38  24.31  33.13
wc-input1          34.72  34.87  34.85  34.61
wc-scanStream      28.70  34.74  28.31  34.84
zebra              30.52  24.01  30.54  24.12
zern               22.54  36.43  22.59  36.90

_______________________________________________
MLton mailing list
MLton@...
http://mlton.org/mailman/listinfo/mlton

Re: Constant Folding of FP Operations

by Vesa Karvonen-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Sun, Jun 1, 2008 at 10:29 PM, Matthew Fluet <fluet@...> wrote:
> On Fri, 23 May 2008, Vesa Karvonen wrote:
[...]
> Seems like a good optimization, and a sound one.  Did you run the
> benchmarks and observe any speedups?

No, I don't think I ran the benchmarks with this optimization, but I see
that you did.  However, I did try the optimization with a slightly
modified version (note the parentheses in "store + (...)"):

  fun eval_real_poly n = let
     val msg = n div 10
     val x1 = 4.0
     val x2 = x1
     val x3 = x1
     val x4 = x1
     val x5 = x1
     val x6 = x1
     fun eval (0,store) = store
       | eval (n,store) =
         let in
            if n mod msg = 0 then print ("iter: " ^ Int.toString n ^
                                         "\n") else ();
            eval (n-1,
                  store + ((~x1)*x4 + x2*x5 +(~x3)*x5 +(~x2)*x6 + x3*x6 +
                           x4*(x2 + x3 + x5 + x6) + x4*(~x1)+x4*(~x4)))
         end
  in
     eval (n,0.0)
  end

  val _ = print (Real.toString (eval_real_poly 500000000) ^ "\n")

of Sean McLauglin's example:

  http://mlton.org/pipermail/mlton-user/2007-December/001294.html

After changing the SML expression to match the C expression (as shown
above), MLton performed constant folding on the right-hand side expression
and performance was much closer to C.

> On amd64-linux, I get the following:
>
> MLton0 -- ~/devel/mlton/mlton.svn.trunk/build/bin/mlton -codegen amd64
> MLton1 -- ~/devel/mlton/mlton.svn.trunk/build/bin/mlton -codegen c
> MLton2 -- ~/devel/mlton/mlton.svn.trunk/build.real-cf/bin/mlton -codegen
> amd64
> MLton3 -- ~/devel/mlton/mlton.svn.trunk/build.real-cf/bin/mlton -codegen c
> run time ratio
> benchmark         MLton0 MLton1 MLton2 MLton3
> barnes-hut          1.00   1.04   0.84   0.88
[...]
> ray                 1.00   1.06   0.94   1.05
[...]
>
> This seems to only show a non-noise speedup on barnes-hut

That would match my expectation that this optimization applies only
rarely.  I'm actually surprised to see such a large improvement on
barnes-hut.

> (and maybe ray).

Probably.  As we know, gcc already performs constant folding of FP, so I
would not be surprised if the optimization would not necessarily improve
performance with the C backend (as seems to be that case with ray).

> I'd really like to know a good way of cutting down the noise in the
> benchmarks; consider that fib and tailfib (which use no FP operations,
> and so yield identical assembly code) show a 1.04 and 1.06 slowdown,
> respectively.

How are the times measured?  Looking very briefly at benchmark/main.sml
(line 79) it would seem to use an average of times over a 60 second
sampling period.  Using the average probably does not give the best
results.  For a deterministic benchmark it should be best to run the
benchmark multiple times and only take the lowest time.  Any time above
the lowest possible time for a deterministic benchmark should be noise
from the system (switches to other processes trashing caches, etc.).  So,
by taking an average, you are actually letting the noise enter the
measurement.  (This was mentioned in the paper on dominators

  http://www.cs.rice.edu/~keith/EMBED/dom.pdf

based on which I implemented the new dominators algorithm for MLton a
while back.)

> The C-- specification of floating-point expressions makes the rounding mode
> an explicit argument; e.g.:

That sounds reasonable for an intermediate (or target) language.

>> About the patch.  The compiler changes do not compile with SML/NJ,
>> which does not support 32-bit floating point (Real32), [...]
>
> You might consider dynamically checking the value of Real32.precision,
> which would allow you to distinguish a "real" Real32 from a "dummy"
> Real32 that is bound to Real64.

Hmm... That approach would probably make it reasonably easy to make it
compile with SML/NJ.  The approaches I had in mind were somewhat more
complicated.  I'll have to try this.

> There also seems to be a little-endian assumption for constant-folding
> castToWord and castFromWord.  [...]

Not exactly.  The assumption is that the endiannes of reals and words
agree in the sense that if you convert a real/word to little endian bytes
and then convert the little endian bytes to a word/real, you'll get the
desired cast.  The operations are (in pseudo code):

  castFromWord = RealFromLittle o WordToLittle
  castToWord = WordFromLittle o RealToLittle

That should be identical to:

  castFromWord = RealFromBig o WordToBig
  castToWord = WordFromBig o RealToBig

(I think that the Pack structures are also missing from SML/NJ, so they
have to be faked as well.)

> It shouldn't be difficult to allow Prim.apply to return more complicated
> expressions.  There aren't that many (static) uses of Prim.apply in the
> compiler; you would simply need to allow uses of Prim.apply to yield a
> sequence of IL statements, so that you could bind constants to variables
> before using them in the IL primitive.

Right.  I was thinking about returning an expression (tree), but a
sequence of expressions probably fits the ILs better.  However, a
difficulty with the sequence of statements approach would seem to be that
Prim.apply is polymorphic in the type of variables, so something needs to
be done to make it possible to effectively create new variables in
Prim.apply.

-Vesa Karvonen

_______________________________________________
MLton mailing list
MLton@...
http://mlton.org/mailman/listinfo/mlton

Re: Constant Folding of FP Operations

by Matthew Fluet-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Mon, 2 Jun 2008, Vesa Karvonen wrote:

> On Sun, Jun 1, 2008 at 10:29 PM, Matthew Fluet <fluet@...> wrote:
>> On Fri, 23 May 2008, Vesa Karvonen wrote:
> [...]
>> Seems like a good optimization, and a sound one.  Did you run the
>> benchmarks and observe any speedups?
>
> No, I don't think I ran the benchmarks with this optimization, but I see
> that you did.  However, I did try the optimization with a slightly
> modified version (note the parentheses in "store + (...)"):
>
> of Sean McLauglin's example:
>
>  http://mlton.org/pipermail/mlton-user/2007-December/001294.html
>
> After changing the SML expression to match the C expression (as shown
> above), MLton performed constant folding on the right-hand side expression
> and performance was much closer to C.

Good to see that the optimization kicked in, even if it is an artificial
example.

> Probably.  As we know, gcc already performs constant folding of FP, so I
> would not be surprised if the optimization would not necessarily improve
> performance with the C backend (as seems to be that case with ray).

Except that after considering Sean McLaughlin's example, we don't allow
gcc to constant fold any floating-point options (in the C-codegen):
   http://mlton.org/cgi-bin/viewsvn.cgi?view=rev&rev=5799
This is accomplished by making all real constants globals, and accessed
via an indirect load, so gcc can't see the constant value.

So, since the C-codegen didn't show any improvement with FP
constant-folding on the ray benchmark, I suspect that the improvement in
the native codegen was just system noise.

>> I'd really like to know a good way of cutting down the noise in the
>> benchmarks; consider that fib and tailfib (which use no FP operations,
>> and so yield identical assembly code) show a 1.04 and 1.06 slowdown,
>> respectively.
>
> How are the times measured?  Looking very briefly at benchmark/main.sml
> (line 79) it would seem to use an average of times over a 60 second
> sampling period.  Using the average probably does not give the best
> results.  For a deterministic benchmark it should be best to run the
> benchmark multiple times and only take the lowest time.  Any time above
> the lowest possible time for a deterministic benchmark should be noise
> from the system (switches to other processes trashing caches, etc.).  So,
> by taking an average, you are actually letting the noise enter the
> measurement.  (This was mentioned in the paper on dominators
>
>  http://www.cs.rice.edu/~keith/EMBED/dom.pdf
>
> based on which I implemented the new dominators algorithm for MLton a
> while back.)

That's an interesting view, to take the minimum time, rather than the
average time.

>>> About the patch.  The compiler changes do not compile with SML/NJ,
>>> which does not support 32-bit floating point (Real32), [...]
>>
>> You might consider dynamically checking the value of Real32.precision,
>> which would allow you to distinguish a "real" Real32 from a "dummy"
>> Real32 that is bound to Real64.
>
> Hmm... That approach would probably make it reasonably easy to make it
> compile with SML/NJ.  The approaches I had in mind were somewhat more
> complicated.  I'll have to try this.

In the long term, it would be good to avoid reliance on Real32 for any
purposes other than optimization.  For example, we could make
   datatype RealX.t = T of {size: RealSize.t,
                            value: IEEEReal.decimal_approx}
which mimics the treatment of WordX.t.

We currently require Real32.{fromString,isFinite} for elaborating real
constants, Real32.{==,signBit} for equality on real constants, and
Real32.format for emitting real constants.  If Real32 is bound to Real64,
then we might accept Real32.real floating-point constants that are finite
in Real64 but would not be finite in Real32.[*] If Real32 is bound to
Real64, then we might judge two floating-point constants to be unequal
when they are equal at Real32.real; this is benign, since we only use
equality of floating-point constants combine equal constants, so a
conservative equality is fine.  If Real32 is bound to Real64, then we
might emit more digits than necessary to represent the value; again, this
is benign.

[*] We currently use Real<N>.isFinite to give a compile-time error for
floating-point constants that do not denote finite values.  The argument
for this is that The Definition states:
   "it is a compile-time error if the constant does not make sense or does
   not denote a value within the machine representation chosen for the
   type."
Of course, one might argue that "1.0e123" denotes a specific 32-bit IEEE
754 value (namely, +inf) just as much as a "1.0e~1" does -- for neither
does the mathematical value of the IEEE value equal the mathematical value
of the string.

>> There also seems to be a little-endian assumption for constant-folding
>> castToWord and castFromWord.  [...]
>
> Not exactly.  The assumption is that the endiannes of reals and words
> agree in the sense that if you convert a real/word to little endian bytes
> and then convert the little endian bytes to a word/real, you'll get the
> desired cast.  The operations are (in pseudo code):
>
>  castFromWord = RealFromLittle o WordToLittle
>  castToWord = WordFromLittle o RealToLittle
>
> That should be identical to:
>
>  castFromWord = RealFromBig o WordToBig
>  castToWord = WordFromBig o RealToBig

Ah, I see.

_______________________________________________
MLton mailing list
MLton@...
http://mlton.org/mailman/listinfo/mlton