mikewarot 2 days ago

Transistors are generally at their lowest static power dissipation when they are either fully on or fully off. The analog middle is great if you're trying to process continuous values, but then you're forced to use a bias current to hold the transistor in the middle, which is OK if that's the nature of the circuit.

A chip with billions of transistors can't reasonably work if most of them are in analog mode; it would just melt to slag unless you have an amazing cooling system.

Also consider that there is only one threshold between values in a binary system. A trinary system needs two, so you would likely have to double the power supply voltage, and thus quadruple the power required, just to maintain noise margins.
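
As a rough, illustrative sketch of that voltage-squared scaling (the capacitance, frequency, and activity numbers below are made up; only the scaling matters): CMOS dynamic power per switching node is roughly activity * C * V^2 * f, so doubling the supply voltage quadruples it.

  # Illustrative only: constants are assumptions, the point is the V^2 scaling.
  def dynamic_power(v, c=1e-15, f=3e9, activity=0.1):
      return activity * c * v * v * f   # watts per switching node

  print(dynamic_power(1.0))   # ~3e-07 W
  print(dynamic_power(2.0))   # ~1.2e-06 W, i.e. 4x the power at double the voltage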

  • throw10920 2 days ago

    This is a great point, and I'll extend it by claiming that there's a more general physical principle underneath: that it's significantly easier to build bistable systems than tristable (or higher) systems, so much so that it makes up for the fact that you need more of them.

    This is far more general than electronic systems (e.g. quantum computers follow the same principle - it's far easier to build and control qubits than qutrits/qudits).

    (technically, it's even easier to build systems that have a single stable configuration, but you can't really store information in those, so they're not relevant)

  • rini17 2 days ago

    It can be solved in various ways, not only with a middle level; electricity has negative voltages too. So you can have a third distinct "fully on" state at negative voltage. This isn't practical with silicon semiconductors but might be possible with other technology. The Soviet ternary computer Setun used custom ternary switches.

    • theamk 2 days ago

      There is nothing special about negative voltages; it's all relative to some reference point anyway.

      With mixed analog/digital circuits, for example, it's pretty common to treat exactly the same voltages either as -2.5/0/2.5 (relative to the midpoint) or as 0/2.5/5 (relative to the negative rail).

      What matters is having multiple threshold voltages with distinct behaviour. Setun used ferrite transformers, which do have multiple thresholds (positive and negative fields) - but modern electronics, including transistors, do not.

      • CoastalCoder a day ago

        This is 1000% not my area of expertise, but if we're imagining circuits where current can flow in different directions at different times, would diodes potentially become more interesting for logic?

        • rini17 a day ago

          Diodes have a fairly big forward voltage, around 0.5 V; it depends on current, but in any case it means lots of wasted energy. FETs got highly optimized over time: they can be switched on with only minuscule on-resistance, and they can also be miniaturized.

          Modern FETs are capable of switching spot welding currents without getting destroyed, while in thumbnail-sized package, imagine that. My grandpa was an electrical engineer and would be completely blown away by such a component.

        • theamk a day ago

          DTL, diode-transistor logic, was used in the 1960s-1970s, when cheap high-quality transistors were not yet available. The power consumption was pretty terrible, but you needed fewer semiconductors.

          In modern logic, diodes are not that useful, because transistors already react to only one polarity. You simply connect multiple transistors to the same input, and the right ones will activate.

    • pezezin 2 days ago

      It is perfectly viable with silicon. The venerable Fast Ethernet used three-level MLT-3 signaling, USB4 and GDDR7 use PAM3, and Gigabit Ethernet uses PAM5.

      • mikewarot 2 days ago

        Those are analog systems, and thus you have to handle them with transistors operating in a linear mode, which is why there are dedicated circuits to handle the interface and translate it back into something binary as soon as possible, so that conventional logic can use the data.

        Basically, every ethernet card is now a modem.

  • foxglacier 2 days ago

    Wouldn't you also get data loss using the linear region of transistors? The output would have some error relative to the input, and it would propagate through the circuit, perhaps eventually reaching on or off, where it would be stuck.

bastawhiz 2 days ago

Trinary is an efficient way of storing lots of -1/0/1 machine learning model weights. But as soon as you load it into memory, you need RAM that can store the same thing (or you're effectively losing the benefits: storage is cheap). So now you need trinary RAM, which, as it turns out, isn't great for doing normal general-purpose computation. Integers and floats and boolean values don't get stored efficiently in trinary unless you toss out power-of-two-sized values. CPU circuitry becomes more complicated to add/subtract/multiply those values. Bitwise operators in trinary become essentially impossible for the average-IQ engineer to reason about. We need all new ISAs, assembly languages, compilers, and languages that can run efficiently without the operations that trinary machines can't perform well, etc.
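
As a side note on the storage-efficiency point, here is a minimal illustrative sketch (the names and layout are mine, not anything standardized) of how densely -1/0/1 weights pack even into ordinary binary bytes: five trits fit in one byte, since 3^5 = 243 <= 256, which is about 1.6 bits per trit versus the ideal log2(3) ~ 1.585.

  # Illustrative trit packing: five {-1, 0, 1} values per byte.
  def pack5(trits):
      byte = 0
      for t in trits:
          byte = byte * 3 + (t + 1)   # map -1/0/1 -> 0/1/2, encode in base 3
      return byte                     # 0..242, fits in a single byte

  def unpack5(byte):
      out = []
      for _ in range(5):
          out.append(byte % 3 - 1)
          byte //= 3
      return out[::-1]

  w = [-1, 0, 1, 1, -1]
  assert unpack5(pack5(w)) == w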

So do we have special memory and CPU instructions for trinary data that lives in a special trinary address space, separate from traditional data that lives in binary address space? No, the juice isn't worth the squeeze. There's no compelling evidence this would make anything better overall: faster, smaller, more energy efficient. Every improvement that trinary potentially offers results in having to throw babies out with the bathwater. It's fun to think about I guess, but I'd bet real money that in 50 years we're still having the same conversation about trinary.

bee_rider 2 days ago

> Trinary didn’t make any headway in the 20th century; binary’s direct mapping to the “on”/”off” states of electric current was just too effective, or seductive; but remember that electric current isn’t actually “on” or “off”. It has taken a ton of engineering to “simulate” those abstract states in real, physical circuits, especially as they have gotten smaller and smaller.

But, I think things are actually trending the other way, right? You just slam the voltage to “on” or “off” nowadays—as things get smaller, voltages get lower, and clock times get faster, it gets harder to resolve the tiny voltage differences.

Maybe you can slam to -1. OTOH, just using 2 bits instead of one... trit(?) seems easier.

Same reason the “close window” button is in the corner. Hitting a particular spot requires precision in 1 or 2 dimensions. Smacking into the boundary is easy.

  • hinkley 2 days ago

    The lower voltage helps reduce leakage and capacitance in the chip as the wires get closer together.

    But it does argue against more states due to the benefits of just making 1 smaller if you can and packing things closer. Though maybe we are hitting the bottom with Dennard scaling being dead. Maybe we increase pitch and double state on parts of the chip, and then generations are measured by bits per angstrom.

  • estimator7292 2 days ago

    Once we invented CMOS this problem pretty much went away. You can indeed just slam the transistor open and closed.

    Well, until we scaled transistors down to the point where electrons quantum tunnel across the junction. Now they're leaky again.

    • amirhirsch 2 days ago

      Not quite. Leakage current in CMOS circuits became the dominant source of power consumption around the 90 nm and 65 nm nodes, long before quantum tunneling was a major factor, and often exceeded dynamic switching power. This led to the introduction of multiple threshold-voltage devices and body-biasing techniques to dynamically adjust Vt and curb static leakage.

russdill 2 days ago

There's a ton of places in modern silicon where a voltage represents far more than just on or off, from the 16 levels of QLC flash to the various PAM technologies used by modern interconnects.

  • theamk a day ago

    And the first thing you do to them is to convert them to binary.

    Look at any Ethernet PHY for example - you have that nice 5-level signal coming in... and the first thing you do is feed it into an AFE/ADC so you get digital signals that you can actually work with.

    So yes, in some specific situations, like Flash memory or interconnects, there are multi-level signals. But the computing itself is always binary.

  • hinkley 2 days ago

    I’ve wondered any number of times if 4 level gates would be useful to increase cache memory in CPUs. They aren’t great for logic, but how much decoding would they need to expand an L3 cache?

  • DiggyJohnson 2 days ago

    What is PAM in this context?

    • saxonww 2 days ago

      Pulse amplitude modulation

      • DiggyJohnson 2 days ago

        Thanks. That’s a deep rabbit hole upon initial glances to say the least

zellyn 2 days ago

I once tried to start enumerating gate types for Trinary.

In binary, with two inputs, there are 2^2 = 4 total possible inputs (00, 01, 10, 11). Different gate types can give different outputs for each of those four inputs: each output can be 0 or 1, so that's 2^4 == 16 different possible gate types. (0, 1, A, B, not A, not B, AND, OR, NAND, NOR, XOR, XNOR, A and not B, B and not A, A or not B, B or not A)

In ternary, with two inputs, there are 3^2 = 9 total possible inputs, so 3^9 = 19,683 possible gate types. I'm sure there are some really sensible ones in there, but damn that's a huge search space. That's where I gave up that time around! :-)
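
For anyone who wants to poke at that search space, a small sketch (the particular gate choices below are just my examples): enumerate every assignment of an output trit to the 9 input pairs, then pull out min and max, which are the usual ternary generalizations of AND and OR.

  # Every 2-input ternary gate is one output table over the 9 input pairs.
  from itertools import product

  inputs = list(product(range(3), repeat=2))           # the 9 possible input pairs
  gates = list(product(range(3), repeat=len(inputs)))  # every possible output table
  print(len(gates))                                    # 19683 == 3**9

  t_and = tuple(min(a, b) for a, b in inputs)          # ternary "AND" (min)
  t_or = tuple(max(a, b) for a, b in inputs)           # ternary "OR" (max)
  print(t_and in gates, t_or in gates)                 # True True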

gyomu 2 days ago

> Trinary is philosophically appealing because its ground-floor vocabulary isn’t “yes” and “no”, but rather: “yes”, “no”, and “maybe”. It’s probably a bit much to imagine that this architectural difference could cascade up through the layers of abstraction and tend to produce software with subtler, richer values … yet I do imagine it.

You can just have a struct { case yes; case no; case maybe; } data structure and pepper it throughout your code wherever you think it’d lead to subtler, richer software… sure, it’s not “at the hardware level” (whatever that means given today’s hardware abstractions) but that should let you demonstrate whatever proof of utility you want to demonstrate.
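
In that spirit, a minimal sketch of such a type (the names and the Kleene-style three-valued AND are my own illustrative choices, not anything from the article):

  # A "yes/no/maybe" value with an AND where NO dominates and MAYBE stays uncertain.
  from enum import Enum

  class Tri(Enum):
      NO = 0
      MAYBE = 1
      YES = 2

  def tri_and(a: Tri, b: Tri) -> Tri:
      return Tri(min(a.value, b.value))

  print(tri_and(Tri.YES, Tri.MAYBE))   # Tri.MAYBE
  print(tri_and(Tri.NO, Tri.MAYBE))    # Tri.NO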

  • recursivecaveat a day ago

    Most usages of a 'bool' are taking up at least a byte, so everyone has 250-odd extra values to play with "at the hardware level" already if they really want.

alphazard 2 days ago

I've never understood the fascination here. Apparently the base that minimizes some expression relating the number of possible symbols and the length of a message comes closest to Euler's number. I don't see why the product of those things is worth optimizing for. The alphabet size that works best is dictated by the storage technology; more symbols usually means it's harder to disambiguate.

2 is the smallest number of symbols needed to encode information, and it makes symbols the easiest to disambiguate in any implementation; good enough for me.

  • kingstnap 2 days ago

    The idea is roughly that the effort needed to use a system is, in some situations, roughly proportional to the number of symbols times the number of needed positions.

    Here's a concrete example: imagine you need to create some movable type because you are building a printing press, and you need to represent all numbers up to 100 million.

    In binary you need to make 53 pieces, in ternary 50, in octal 69 pieces, in decimal 81 and in hexadecimal 101.

    • alphazard 2 days ago

      > In binary you need to make 53 pieces, in ternary 50, in octal 69 pieces, in decimal 81 and in hexadecimal 101.

      These numbers don't quite make sense to me. Hexadecimal should have 16 symbols, and then `log16(whatever) = message length`. I get what you're trying to say though.

      That trend continues up until the symbols start looking the same and no one can read them, and now the most important problem is not a position on a tradeoff curve. It's that the system is no longer reliable.

      If you wanted each letter to have the highest probability of successfully being read, you would use a grid, and shade or leave blank each grid square.

      • kingstnap 2 days ago

        The hex calculation is a bit like this

        100 million in Hex is 5F5E100

        You need 6*16 pieces for the trailing digits plus 5 pieces for the leading digit, if you want to be able to print any number from 0 to 100 million.
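
        A small sketch reproducing those piece counts with this method (the function name is mine; it assumes "up to 100 million" means exactly 10^8, as in the 5F5E100 example):

          # Full symbol sets for every trailing position, plus only as many pieces
          # as the leading digit's value for the front position.
          N = 100_000_000

          def pieces(base, n=N):
              digits = 0
              top = 1
              while top * base <= n:   # find the leading place value
                  top *= base
                  digits += 1
              leading = n // top       # value of the leading digit of n
              return digits * base + leading

          for b in (2, 3, 8, 10, 16):
              print(b, pieces(b))      # -> 53, 50, 69, 81, 101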

pontifier 2 days ago

I remember reading somewhere that because ternary computing is inherently reversible, ternary computations have, from an information-theoretic point of view, a lower theoretical bound on energy usage, and as such could be a way to bypass heat-dissipation problems in chips built with ultra-high density, large size, and high computational load.

I wasn't knowledgeable enough to evaluate that claim at the time, and I'm still not.

patcon 2 days ago

Maybe of interest, re: neuromorphic computing that's perhaps more aligned with biological efficiency.

https://github.com/yfguo91/Ternary-Spike

Ternary Spike: Learning Ternary Spikes for Spiking Neural Networks

> The Spiking Neural Network (SNN), as one of the biologically inspired neural network infrastructures, has drawn increasing attention recently. It adopts binary spike activations to transmit information, thus the multiplications of activations and weights can be substituted by additions, which brings high energy efficiency. However, in the paper, we theoretically and experimentally prove that the binary spike activation map cannot carry enough information, thus causing information loss and resulting in accuracy decreasing. To handle the problem, we propose a ternary spike neuron to transmit information. The ternary spike neuron can also enjoy the event-driven and multiplication-free operation advantages of the binary spike neuron but will boost the information capacity. Furthermore, we also embed a trainable factor in the ternary spike neuron to learn the suitable spike amplitude, thus our SNN will adopt different spike amplitudes along layers, which can better suit the phenomenon that the membrane potential distributions are different along layers. To retain the efficiency of the vanilla ternary spike, the trainable ternary spike SNN will be converted to a standard one again via a reparameterization technique in the inference. Extensive experiments with several popular network structures over static and dynamic datasets show that the ternary spike can consistently outperform state-of-the-art methods.

pumplekin 2 days ago

I've always thought we could put a bit of general purpose TCAM into general purpose computers instead of just routers and switches, and see what people can do with it.

I know (T)CAMs are used in CPUs, but I am more thinking of the kind of research being done with TCAMs in SSD-like products, so maybe we will get there some day.

  • hinkley 2 days ago

    There’s a lot of tech in signaling that doesn’t end up on CPUs and I’ve often wondered why.

    Some of it is ending up in power circuitry.

  • cyberax 2 days ago

    TCAM still uses 2-bit binary storage internally; it just leaves one of the four combinations unused.

jacobmarble 2 days ago

In digital circuits there’s “high”, “low”, and “high impedance”.

  • gblargg 2 days ago

    There's low-impedance and high-impedance. Within low-impedance, there's high and low.

anon291 2 days ago

Mapping the three trinary values to yes, no, and maybe is semantic rubbish.

ChrisMarshallNY 2 days ago

I seem to remember reading about "fuzzy logic" (a now-quaint term), where a trinary state was useful.

  • zer00eyz 2 days ago

    "One feature that sets certain rice cookers above the rest is “fuzzy logic,” or the ability of an onboard computer to detect how quickly the rice is cooking or to what level doneness it has reached, then make real time adjustments to time and temperature accordingly. " ... From: https://www.bonappetit.com/story/zojirushi-rice-cooker

    It is a term that is still used quite a fair bit for marketing. I think in this case (Zojirushi) it isn't trinary, but rather some probabilistic/Bayesian system to derive a boolean from a number of factors (time, temp, and so on).

    • jasonwatkinspdx 2 days ago

      Back in the late 80s and early 90s fuzzy logic became something of a fad in Japan because several of the leading researchers were at Japanese institutions. So it became a term of hype with a bit of flag waving involved.

      I'm reasonably convinced my Zojirushi has nothing more than a way to sense when the evaporation shifts and to start the "steaming" countdown timer then, probably using the resistance of the heating coil. In other words it's just a replacement for the weight/balance mechanism in a traditional "dumb" rice cooker, not something doing more complex modeling as far as I can tell.

      It is however built like a tank and "just works" so I'm entirely happy with my purchase.

DiggyJohnson 2 days ago

This is off topic but how do you build and post to that blog? Homegrown or framework?

1970-01-01 2 days ago

Isn't quantum computing "all the aries"?

The quantum dream is also the trinary dream.

Nevermark 2 days ago

Ternary is indeed an enticing, yet ultimately flawed dream.

Quaternary allows for:

  True, “Yes”

  False, “No”

  Undetermined, “Maybe”, “Either”, True or False

And:

  Contradiction, “Invalid”, “Both”, True and False

For logical arithmetic, i.e. reducing tree expressions, True and False are enough.

But in algebraic logic, where more general constraint topologies are possible, the other two values are required.

What is the logical value of the isolated expression “(x)”? I.e. “x” unconstrained?

Or the value of the expression “(x = not x)”?

None of 4-valued logic’s values are optional or spurious for logical algebra.

—-

Many people don’t know this, but all modern computers are quaternary, with 4 quaternit bytes. We don’t just let anyone in on that. Too much power, too much footgun jeopardy, for the unwashed masses and Python “programmers”.

The tricky thicket of web standards can’t be upgraded without introducing mayhem. But Apple’s internal-only docs reveal macOS and Swift have been fully quaternary compliant on their ARM since the M1.

On other systems you can replicate this functionality, at your own risk and effort, by accessing each quaternit with their two bit legacy isomorphic abstraction. Until Rust ships safe direct support.

—-

It will revolutionize computing, from the foundations up, when widely supported.

Russell’s paradox in math is resolved. Given a set S = “The set of all sets that don’t contain themselves”, the truth value of “Is S in S” in quaternary logic, reduces to Contradiction, which indeed it is. I.e. True and False. Making S a well formed, consistent entity, and achieving full set and logical completeness with total closure. So consistency is returned to Set theory and Russell’s quest for a unification of mathematics with just sets and logic becomes possible again. He would have been ecstatic. Gödel be damned! [0]

Turing’s Incompleteness Theorem demonstrates that 2-valued bit machines are inherently inconsistent or incomplete.

Given a machine M, applied to the statement S = “M will say this statement is False”, or “M(S) = False”, it has to fail.

If M(S) returns True, we can see that S is actually False. If M(S) returns False, we can see that actually S is True.

But for a quaternary Machine M4 evaluating S4 = “M4(S4) = False”, M4(S4) returns Contradiction. True and False. Which indeed we can see S4 is. If it is either True or False, we know it is the other as well.

Due to the equivalence of Undecidability and the Turing Halting Problem, resolving one resolves the other. And so quaternary machines are profoundly more powerful and well characterized than binary machines. Far better suited for the hardest and deepest problems in computing.

It’s easy to see why the developers of Rust and Haskell are so adamant about getting this right.

[0] https://tinyurl.com/godelbedamned

  • IndrekR 2 days ago

    Most common quaternary storage system is probably DNA.

  • nzeid 2 days ago

    Not wrong, but I think the hope was more to have "8-trinit" bytes i.e. something with more states than a classic bit.

    • Nevermark 2 days ago

      Thank you for taking my points with exactly the seriousness they deserve.

      I respond in that spirit.

      Taking the convention that “byte” always means 8 n-valued bits:

      One advantage of a convention of 8 quaternit bytes is that they can be readily used as 8-trit ternary bytes too, albeit with reduced use of their value range.

      8 quaternit bytes also have the advantage of higher resolution addressing, i.e. at the nibble = 4 quaternary bit boundaries. (The last bit of modern memory addresses indicates the upper or lower quaternary nibble.)

      • nzeid 18 hours ago

        I have the distinct feeling something went over my head here, hehe. Regardless, thanks for the insight.

  • Nevermark 2 days ago

    Edit/addition: there are some serious points in the Socratic satire there, for those who want to consider.

    Despite our natural aesthetic hesitancy to equate a 4-valued bit with two 2-valued bits, we all understand they are the same. Many “binary” storage devices do the reverse, and store multiple “binary” values with higher range cells.

    A bit of information (whatever its arity) is the same bit regardless of how it is stored or named.

    We get stuck in our familiar frames and names.

    Also, the points about Russell’s paradox and Turing Incompleteness are conveyed in an absurdist’s deadpan, but they are in fact actual critiques I am making. In both cases, two-valued logic, suitable only for arithmetic, is used in algebraic contexts where self-referencing and open constraints are both possible, despite the basic inability of two-valued logic to represent the values of either.

    It is startling to me what obvious limitations this out-of-the-gate bad assumption of an excluded middle in algebraic contexts places on the generality of the conclusions in both treatments, where the failings of the excluded middle are basically the "proof". Proof of what was assumed, essentially.

    Anyone who cares about those topics can work through those points. Neither are as meaningless or trivial as might be expected.

    Finally, four valued logic would be very useful to support at CPU instruction levels, for algebraic contexts, beyond arithmetic. Especially since no changes to memory are needed.

    Interestingly, with 4-valued logic, there are two different sets of AND, OR and NOT, for two ways they can be treated. And the current bit-wise operators, acting on [tf] 2-bit 4-valued logic (True as [+-], False as [-+], [--] as unknown, and [++] as contradiction) already implement the new versions of those operations. So new instructions are only needed to implement regular AND, OR, NOT operations for 2-bit 4-valued logical values.
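
    A minimal sketch of that encoding (my own illustrative code, reading the four values Belnap-style as pairs of "evidence for true"/"evidence for false" bits): plain pairwise bitwise operations give one of the two operation sets, while the "regular" truth-style AND mixes the two bits, which is why it would need its own instructions.

      # True=[+-]=(1,0), False=[-+]=(0,1), Unknown=[--]=(0,0), Contradiction=[++]=(1,1)
      T, F, U, C = (1, 0), (0, 1), (0, 0), (1, 1)

      def k_and(a, b):                       # plain pairwise bitwise op: "consensus"
          return (a[0] & b[0], a[1] & b[1])

      def t_and(a, b):                       # truth-order AND: true needs both t-bits,
          return (a[0] & b[0], a[1] | b[1])  # false needs either f-bit

      def t_not(a):
          return (a[1], a[0])                # swap the evidence bits

      print(t_and(T, U))                     # (0, 0) -> Unknown
      print(t_and(F, U))                     # (0, 1) -> False
      print(k_and(T, F))                     # (0, 0) -> Unknown (consensus)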

  • readthenotes1 2 days ago

    I've liked true, false, unknown, unknowable--though I think there should be something somewhere for fnord.

marshfram 2 days ago

Analog is next. Software first, then build the machines. No more models, reductions, loss. Direct perception through measurement and differences.

  • cluckindan 2 days ago

    Analog was before, though. General computing was never realized using those architectures; granted, they were mechanical in nature, so that is a big ask, both figuratively and literally.

    Maybe we could create continuous-valued electrical computers, but at least state, stability and error detection are going to be giant hurdles. Also, programming GUIs from Gaussian splats sounds like fun in the negative sense.

    • estimator7292 2 days ago

      You've just described vacuum tube computers as well as all the early transistorized computers. Digital computing is a relatively late concept

      • cluckindan a day ago

        The important difference is that all those early analog computers were either bespoke or suited for a very narrow subset of tasks, like fire control computers. They were very far from general purpose computers, and that is the reason the von Neumann architecture dominates today: we are free to change the operation of the computer without literally changing gears or re-wiring all logic paths. Before, the hardware was the software.

    • marshfram 2 days ago

      You have to withdraw from the binary in all senses to begin to imagine what an analog spatial differences measurement could function as.

      Again, think software first. The brain is always a byproduct of the processes, though it is discerned as a materialist operation.

      Think big: binary computers are toys in the grand scheme of things.

  • bastawhiz 2 days ago

    We'd need a real breakthrough in physics to have such a technology that works at a scale even remotely comparable to what a low end digital CPU can do today. The thing is, there's not even any real evidence (at least to my knowledge) that there are useful threads that researchers know to pull on that could yield such a technology. Emulating analog hardware with digital hardware in anticipation of some kind of breakthrough isn't going to have any material benefits in the short to medium term.

    • marshfram 2 days ago

      You're thinking small; think about software first. Analogs of differentials. Spatial. Specifics. If you get trapped in emulation, you're trapped in counting.

      • bastawhiz a day ago

        What does that even mean

        • marshfram a day ago

          Binary computers are toy prisons of counting that model in a reality that has no correct models, not even neuromorphic will gain validity here, leave them behind. Start over with the forms we know are valid in neural syntax and apply them to software.

          • bastawhiz a day ago

            That's a great sentiment but without code that does what you're describing, that means nothing. Put some code out into the world that demonstrates the idea, because otherwise it's just inspirational words without substance.

            • marshfram a day ago

              It's not a sentiment, it's real coding. Rhythms of the Brain by Buzsaki. Cinema is already a prototype of differences, using space, and correlating topological states. It's analog software that's been running for 130 years.

              Binary was always a dead end alley, we knew this going in. How do we escape it?

              • bastawhiz a day ago

                It's not real coding if you can't point to code and show people how to do it. Where's the GitHub link? Where are the tutorials? Where are the YouTube videos explaining why it's important and how to do it right? You're just saying words, show me something tangible that's not just a vibe.