There is a simple way to get that immediate from the expression you want to calculate: you write literally that expression, with the _MM_TERNLOG_A/B/C constants standing in for the three inputs, and it evaluates to the immediate.
Those constants are defined in the intrinsic headers, at least for gcc & clang. For MSVC you define them yourself.
To take the magic away, write the constants out in binary.
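As a rough illustration of the trick (not the commenter's original snippets), here is a minimal Python sketch. It assumes the values 0xF0, 0xCC and 0xAA for the three constants, which is what I believe the gcc headers use for _MM_TERNLOG_A/B/C; the names TERNLOG_A/B/C, the helper imm8 and both example expressions are made up for this sketch.

```python
# Truth-table columns for the three inputs, written as 8-bit masks.
# Assumed values, meant to mirror _MM_TERNLOG_A/B/C from the intrinsic headers.
TERNLOG_A = 0b11110000  # 0xF0
TERNLOG_B = 0b11001100  # 0xCC
TERNLOG_C = 0b10101010  # 0xAA


def imm8(value):
    """Keep only the low 8 bits, since Python integers are unbounded."""
    return value & 0xFF


# Hypothetical example 1: majority vote, (A AND B) OR (A AND C) OR (B AND C).
majority_imm = imm8(
    (TERNLOG_A & TERNLOG_B) | (TERNLOG_A & TERNLOG_C) | (TERNLOG_B & TERNLOG_C)
)

# Hypothetical example 2: three-way parity, A XOR B XOR C.
parity_imm = imm8(TERNLOG_A ^ TERNLOG_B ^ TERNLOG_C)

print(f"majority: {majority_imm:#04x}, parity: {parity_imm:#04x}")
```

Each constant is simply the column of a three-input truth table, so evaluating the expression on the constants evaluates it on all eight input combinations at once.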
Oh, I thought the title was saying that the instruction doesn't work properly! (The article actually just explains how it works.)
Agreed on initial interpretation. Terrible title!
Amusingly, I had a third interpretation, which is "busted" as being too strong. I realized when the author started talking about the Amiga, though, that that's probably not what they meant (as busted is a fairly modern gaming term, I'd be surprised to see someone old enough to be familiar with the Amiga use it. Sorry to anyone that feels personally attacked by this description :P)
Is broken not cool enough anymore?
I’ve heard broken and cracked used in this way, usually in a game-mechanics balance context implying something is overpowered, but busted in that same context has a negative connotation to me for some reason, signifying something being underpowered or bugged, if that makes sense. I can’t even tell if this is ironic or unironic at this point.
About the title: "Ternary logic" usually means "logic with three truth values". But this piece covers a CPU instruction which handles all binary logic gates with three inputs.
The x86 instruction is named 'ternlog', and the intrinsic 'ternarylogic', though; while perhaps unfortunate, the title is appropriate. (and even then 'bitwise' already sort of takes the place of what 'ternary'-as-three-valued would, and 'ternary' is also very often three-input, so much so that 'a ? b : c' is often called the ternary operator (and in fact ternlog can simulate this ternary operation; and in fact the article is even about exactly that))
Yeah, though the article describes 0xE2, which is 'b ? a : c'. 'a ? b : c' would be 0xCA.
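To see the difference between the two immediates, here is a small Python emulation of the per-bit lookup. It is only a sketch: the function name ternlog, the 32-bit width and the sample operands are assumptions for illustration, and it treats a as the high bit of the table index, which matches the A = 0xF0, B = 0xCC, C = 0xAA convention.

```python
def ternlog(imm, a, b, c, width=32):
    """Bitwise 3-input lookup: at each bit position the bits of a, b and c
    form a 3-bit index (a is the high bit) into the 8-bit table imm."""
    result = 0
    for i in range(width):
        index = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        result |= ((imm >> index) & 1) << i
    return result


MASK = 0xFFFFFFFF
a, b, c = 0x0F0F1234, 0xDEADBEEF, 0x12345678  # arbitrary sample operands

# 0xCA: a selects between b and c, i.e. "a ? b : c" per bit.
sel_by_a = ternlog(0xCA, a, b, c)
# 0xE2: b selects between a and c, i.e. "b ? a : c" per bit.
sel_by_b = ternlog(0xE2, a, b, c)

print(hex(sel_by_a), hex(sel_by_b))
```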
Agree, I was also confused on this point. I guess the name “evaluate a three term binary expression” is less snappy though.
My teenage self did not write "CRAP!" on that page of the hardware manual, but I stared at it for so long trying to figure it out.
In the end I did what pretty much everyone else did: found the BLTCON0 for Bobs and straight copies and then pretended I never saw the thing.
I did however get an A+ in computational logic at university years later, so maybe some of the trauma turned out to be beneficial.
I'll point out that this is the same way that FPGAs implement arbitrary logic functions, as lookup tables (LUTs).
Basically CPUs, GPUs and FPGAs all converge to the crab equivalent of computation. They all expose the same capability with different areas of optimization.
Most but not all. Actel/Microsemi use a small tree of muxes and gates.
So does the 74181 ALU.
I don't think the 74181 is implemented with a LUT.
http://www.righto.com/2017/01/die-photos-and-reverse-enginee...
Head over to https://www.sandpile.org, and find VPTERNLOG on the 3-byte opcode page https://www.sandpile.org/x86/opc_3.htm and you will not only see Intel's apparent past plan for the variants with byte and word masking (AVX512BITALG2), but also the links from the Ib operand to the ternary logic table page https://www.sandpile.org/x86/ternlog.htm with all 256 cases.
Nvidia SASS has a similar instruction too (LOP3.LUT)
Another example of packing bitwise ops into an integer is win32's GDI ROP codes: https://learn.microsoft.com/en-us/windows/win32/gdi/ternary-...
I didn't have the official Amiga hardware manual, but instead the book "Mapping the Amiga". It said the same thing in a slightly more verbose way. I don't remember which minterms I used back then but I think I managed to work things out from this book to do shadebobs, bobs, XOR 3D line drawing and other things.
The page in Mapping the Amiga: https://archive.org/details/1993-thomson-randy-rhett-anderso...
This is an instruction I would like to implement in RISC-V if it isn't already, (which yeah, I know, isn't very RISC like)
movei (%r1),(%r2),(%r3),value
Move the contents of memory pointed to by r1, to the contents of memory pointed to by r2, applying the boolean operator <value>, with the memory pointed to by r3. Then increment all three registers by 4 to point to the next word. There was something similar to this in the Intel 82786 graphics chip which had a sort of minimal cpu part that could run simple "programs".
And yeah, I really enjoyed the blitter on the Amiga. It was a really cool bit of hardware.
Do compilers actually output this instruction?
So many super-clever instructions are next to impossible for compilers to automatically use.
Couldn't every Boolean operation be "busted" as a lookup table?
Yes! But your lookup table will need 2^N bits for a function with N inputs. In this way you can easily enumerate all possible functions from N bits to 1 bit.
As a fun exercise, you can do this for all 2-bit -> 1-bit functions. There's only 16 of them, and most of them have very well known names like "and" (LUT 1000) or "xor" (LUT 0110). Some of them don't depend on some of the inputs (eg. LUT 1100 / 1010 which is "return A" and "return B" respectively) or even any of them (eg. LUT 0000 which always returns 0).
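The exercise can be sketched in a few lines of Python. The helper lut_string and the dictionary of named functions are made up for illustration; the LUT is written with the (A=1, B=1) entry first, which is the ordering that reproduces the strings quoted above.

```python
def lut_string(func):
    """4-bit LUT of a 2-input function, with the (A=1, B=1) entry written first."""
    inputs = [(1, 1), (1, 0), (0, 1), (0, 0)]
    return "".join(str(func(a, b) & 1) for a, b in inputs)


named = {
    "and": lambda a, b: a & b,
    "xor": lambda a, b: a ^ b,
    "return A": lambda a, b: a,
    "return B": lambda a, b: b,
    "always 0": lambda a, b: 0,
}
luts = {name: lut_string(func) for name, func in named.items()}

# Every 4-bit pattern is a distinct function, so there are 2**4 == 16 in total.
all_luts = [format(n, "04b") for n in range(2 ** 4)]

for name, lut in luts.items():
    print(f"{name:9s} LUT {lut}")
```

The same counting gives 2^8 = 256 tables for three inputs, which is why the ternlog immediate fits in one byte.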
Great little article! Thank you.
> an obscure instruction
Come on, vpternlog* is not obscure. It subsumes _all_ bitwise instructions, even loading the constant (-1) into a register.
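A quick sketch of what "subsumes" means here, in Python: each ordinary bitwise operation corresponds to one immediate, and all-ones falls out as the table that ignores its inputs. The masks 0xF0/0xCC/0xAA are the assumed truth-table columns for the three inputs, and the dictionary is purely illustrative rather than anything taken from the article.

```python
A, B, C = 0xF0, 0xCC, 0xAA  # assumed truth-table masks for the three inputs

# Immediate for each familiar operation, computed from the masks.
# C is unused here: two-input and one-input operations simply ignore it.
immediates = {
    "a & b": (A & B) & 0xFF,
    "a | b": (A | B) & 0xFF,
    "a ^ b": (A ^ B) & 0xFF,
    "~a": ~A & 0xFF,
    "a & ~b": (A & ~B) & 0xFF,
    "all ones (-1)": 0xFF,  # every table entry is 1, whatever the inputs
    "all zeros": 0x00,      # every table entry is 0
}

for name, imm in immediates.items():
    print(f"{name:14s} -> {imm:#04x}")
```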
Not if, like most people, you still aren't using a CPU with AVX-512 support. And I don't recall ever seeing it in compiler output in any case. It's not like boolean operations on three variables occur very frequently in most programs, especially (EDIT: this apparently isn't the case) not operations that can't be decomposed into a pair of two-variable operations with no worse performance.
As far as everything on uops.info goes, ternlog has the same throughput and latency as the two-operand logic instructions everywhere (with the mild exception of Zen 4 where it goes from 0.50 to 0.56 cycles/instr; which also shows as having 2-cycle latency to one operand but I think that might be measurement error), so it's always bad to decompose ternlog into two-operand logic ops.
it’s fundamentally just a lookup table