Ask HN: How is it possible to get -0.0 in a sum?

11 points by gus_massa 3 days ago

I'm looking for corner cases where he result is -0.0. As far as I know, the only way to get -0.0 in a sum is

  (-0.0) + (-0.0)
Does someone know any other case in IEEE 754?

Bonus question: What happens in subtractions? I only know

  (-0.0) - (+0.0)
Is there any other case?
sparkie 3 days ago

It depends on the FP rounding mode. If rounding mode is FE_TOWARDZERO/FE_UPWARD/FE_TONEAREST then the case you gave is the only one I'm aware of. If rounding mode is FE_DOWNWARD (towards negative infinity) then other calculations that result in a zero will give a -0.0.

Here's an example of -1.0f + 1.0f resulting in -0.0: https://godbolt.org/z/5qvqsdh9P

  • gus_massa 2 days ago

    Thanks! [Sorry for the delay.]

    ---

    FYI: For more context, I'm trying to send a PR to Chez Scheme (and indirectly to Racket) https://github.com/cisco/ChezScheme/pull/959 to reduce expressions like

      (+ 1.0 (length L))  ;  ==>  (+ 1.0 (fixnum->flonum (length L)))
    
    where the "fixnums" are small integers and "flonums" are double.

    It's fine, unless you have the case

      (+ -0.0 (length L))  ;  =wrong=>  (+ -0.0 (fixnum->flonum (length L)))
    
    because if the length is 0, it get's transformed into 0.0 instead of -0.0

    There are a few corner cases, in particular because it's possible to have

       (+ 1.0 x (length L))
    
    and I really want to avoid the runtime check of (length L) == 0 if possible.

    So I took a look, asked there, and now your opinion confirms what I got so far. My C is not very good, so it's nice to have a example of how the rounding directions are used. Luckily Chez Scheme only uses the default rounding and it's probably correct to cut a few corners. I'll take a looks for a few days in case there is some surprise.

    • sparkie a day ago

      I'm not sure you can avoid the check, but you can avoid a branch.

      An AVX-512 extension has a `vfixupimm` instruction[1] which can adjust special floating point values. You could use this to adjust all zeroes to -0 but leave any non-zeroes untouched. It isn't very obvious how to use though.

          vfixupimmsd dst, src, fixup, flag
      
       * The `flag` is for error reporting - we can set it to zero to ignore errors.
      
       * `dst` and `src` are a floating point value - they can be the same register.
      
       * The instruction first checks `src` and turns any denormals into zero if the MXCSR.DAZ flag is set.
      
       * It then categorizes `src` as one of {QNAN, SNAN, ZERO, ONE, NEG_INF, POS_ING, NEG_VALUE, POS_VALUE}
      
       * `fixup` is an array of 8 nybbles (a 32-bit int) and is looked up based on the categorization of `src` {QNAN = 0 ... POS_VALUE = 7}
      
       * The values of each nybble denote which value to place into `dst`:
      
          0x0 : dst (unchanged)
          0x1 : src (with denormals as zero if MXCSR.DAZ is set)
          0x2 : QNaN(src)
          0x3 : QNAN_Indefinite
          0x4 : -INF
          0x5 : +INF
          0x6 : src < 0 ? -INF : +INF
          0x7 : -0
          0x8 : +0
          0x9 : -1
          0xA : +1
          0xB : 1/2
          0xC : 90.0
          0xD : PI/2
          0xE : MAX_FLOAT
          0xF : -MAX_FLOAT
      
      You want to set the nybble for categorization ZERO (bits 11..8) to 0x7 (-0) in `fixup`. This would mean you want `fixup` to be equal to `0x00000700`. So usage would be:

          static __m128i zerofixup = { 0x700 };
      
          double fixnum_to_flonum(int64_t fixnum) {
              __m128d flonum = { (double)fixnum };
              return _mm_cvtsd_f64(_mm_fixupimm_sd(flonum, flonum, zerofixup, 0));
          }
      
      Which compiles to just 4 instructions, with no branches:

          .FIXUP:
              .long   1792                            # 0x700
              .long   0                               # 0x0
              .long   0                               # 0x0
              .long   0                               # 0x0
          fixnum_to_flonum:
              vcvtsi2sd       xmm0, xmm0, rdi
              vmovq           xmm0, xmm0
              vfixupimmsd     xmm0, xmm0, qword ptr [rip + .FIXUP], 0
              ret
      
      It can be extended to operate on 8 int64->double at a time (__m512d) with little extra cost.

      You could maybe use this optimization where the instruction is available and just stick with a branch version otherwise, or figure out some other way to make it branchless - though I can't think of any other way which would be any faster than a branch.

      [1]:https://www.intel.com/content/www/us/en/docs/intrinsics-guid...

gethly 2 days ago

i would guess that because of how *** * floats are in binary computers, you have something like -0.0000000000000000000000000000000000001 and when you round it you end up with -0.0. Same goes for positive value, you're just not used to write the + sign before every number, so seeing the minus feels strange.

  • dcminter 2 days ago

    You're answering a question that OP did not ask.

kazinator 3 days ago

What happens if we take the smallest (as in closest to zero) negative subnormal and add it to itself?