mmozeiko 4 hours ago

There is a simple way to get that immediate from expression you want to calculate. For example, if you want to calculate following expression:

    (NOT A) OR ((NOT B) XOR (C AND A))
then you simply write

    ~_MM_TERNLOG_A | (~_MM_TERNLOG_B ^ (_MM_TERNLOG_C & _MM_TERNLOG_A))
Literally the expression you want to calculate. It evaluates to immediate from _MM_TERNLOG_A/B/C constants defined in intrinsic headers, at least for gcc & clang:

    typedef enum {
      _MM_TERNLOG_A = 0xF0,
      _MM_TERNLOG_B = 0xCC,
      _MM_TERNLOG_C = 0xAA
    } _MM_TERNLOG_ENUM;
For MSVC you define them yourself.
  • o11c an hour ago

    To take the magic away, write it in binary:

      A = 0b11110000
      B = 0b11001100
      C = 0b10101010
Sniffnoy 6 hours ago

Oh, I thought the title was saying that the instruction doesn't work properly! (The article actually just explains how it works.)

  • SomeHacker44 6 hours ago

    Agreed on initial interpretation. Terrible title!

    • foota 5 hours ago

      Amusingly, I had a third interpretation, which is "busted" as being too strong. I realized that when the author started talking about the Amiga though that's probably not what they meant (as busted is a fairly modern gaming term, I'd be surprised to see someone as old as to be familiar with Amiga to use it. Sorry to anyone that feels personally attacked by this description :P)

      • thfuran 4 hours ago

        Is broken not cool enough anymore?

        • aspenmayer 2 hours ago

          I’ve heard broken and cracked used in this way, usually in a game-mechanics balance context implying something is overpowered, but busted in that same context has a negative connotation to me for some reason, signifying something being underpowered or bugged, if that makes sense. I can’t even tell if this is ironic or unironic at this point.

cubefox 4 hours ago

About the title: "Ternary logic" usually means "logic with three truth values". But this piece covers a compiler instruction which handles all binary logic gates with three inputs.

  • dzaima 4 hours ago

    The x86 instruction is named 'ternlog', and intrinsic - 'ternarylogic' though; while perhaps unfortunate, the title is appropriate. (and even then 'bitwise' already sort of takes place of what 'ternary'-as-three-valued would, and 'ternary' is also very often three-input, so much so that 'a ? b : c' is often called the ternary operator (and in fact ternlog can simulate this ternary operation; and in fact the article is even about exactly that))

    • cubefox 2 hours ago

      Yeah, though the article describes 0xE2, which is 'b ? a : c'. 'a ? b : c' would be 0xCA.

  • someguydave 2 hours ago

    Agree, I was also confused on this point. I guess the name “evaluate a three term binary expression” is less snappy though.

Lerc 6 hours ago

My teenage self did not write "CRAP!" on that page of the hardware manual, but I stared at it for so long trying to figure it out.

In the end I did what pretty much everyone else did, Found the BLTCON0 for Bobs and straight copies and then pretended I newer saw the thing.

I did however get an A+ in computational logic at university years later, so maybe some of the trauma turned out to be beneficial.

kens 6 hours ago

I'll point out that this is the same way that FPGAs implement arbitrary logic functions, as lookup tables (LUTs).

leogao 4 hours ago

Nvidia SASS has a similar instruction too (LOP3.LUT)

Findecanor 5 hours ago

I didn't have the official Amiga hardware manual, but instead the book "Mapping the Amiga". It said the same thing in a slight more verbose way. I don't remember which minterms I used back then but I think I managed to work things out from this book to do shadebobs, bobs, XOR 3D line drawing and other things.

The page in Mapping the Amiga: https://archive.org/details/1993-thomson-randy-rhett-anderso...

ChuckMcM an hour ago

This is an instruction I would like to implement in RISC-V if it isn't already, (which yeah, I know, isn't very RISC like)

   movei (%r1),(%r2),(%r3),value
Move the contents of memory pointed to by r1, to the contents of memory pointed to by r2, applying the boolean operator <value>, with the memory pointed to by r3. Then increment all three registers by 4 to point to the next word. There was something similar to this in the Intel 82786 graphics chip which had a sort of minimal cpu part that could run simple "programs".

And yeah, I really enjoyed the blitter on the Amiga. It was a really cool bit of hardware.

londons_explore 4 hours ago

Do compilers actually output this instruction?

So many super-clever instructions are next to impossible for compilers to automatically use.

notfed 4 hours ago

Couldn't every Boolean operation be "busted" as a lookup table?

  • ekimekim 4 hours ago

    Yes! But your lookup table will need 2^N bits for a function with N inputs. In this way you can easily enumerate all possible functions from N bits to 1 bit.

    As a fun exercise, you can do this for all 2-bit -> 1-bit functions. There's only 16 of them, and most of them have very well known names like "and" (LUT 1000) or "xor" (LUT 0110). Some of them don't depend on some of the inputs (eg. LUT 1100 / 1010 which is "return A" and "return B" respectively) or even any of them (eg. LUT 0000 which always returns 0).

transfire 4 hours ago

Great little article! Thank you.

hvenev 2 hours ago

> an obscure instruction

Come on, vpternlog* is not obscure. It subsumes _all_ bitwise instructions, even loading the constant (-1) into a register.

  • LegionMammal978 38 minutes ago

    Not if, like most people, you still aren't using a CPU with AVX-512 support. And I don't recall ever seeing it in compiler output in any case. It's not like boolean operations on three variables occur very frequently in most programs, especially (EDIT: this apparently isn't the case) not operations that can't be decomposed into a pair of two-variable operations with no worse performance.

    • dzaima 26 minutes ago

      As far as everything on uops.info goes, ternlog has the same throughput and latency as the two-operand logic instructions everywhere (with the mild exception of Zen 4 where it goes from 0.50 to 0.56 cycles/instr; which also shows as having 2-cycle latency to one operand but I think that might be measurement error), so it's always bad to decompose ternlog into two-operand logic ops.

486sx33 6 hours ago

it’s fundamentally just a lookup table