writing

The Separation of Time and State

By Ryan Gentry

July 16, 2019 | 18 minute read

Numerous smart contract platforms (e.g. Ethereum 2.0, Polkadot, Dfinity, Near Protocol, Algorand, Kadena, Spacemesh, and Solana) are launching in the next year or two. Each team is pursuing a unique scaling strategy.

Most of these approaches, however, do not address one of the fundamental problems with distributed computing systems in Byzantine environments: the clock problem. In order to come to consensus, at least 51% of the machines in the network must execute the same transactions in the same order at the same time. In order to do that, the machines need to agree on a global clock. The “clock problem” is the challenge in getting many distrusting machines to agree on a global clock in Byzantine settings. Once everyone agrees on a global clock, transaction ordering becomes a much simpler problem because each transaction is timestamped using the same global clock.

The clock problem has manifested in other large scale networks before the modern crypto era, most notably in the wireless communications world. Cell towers must support tens of thousands of cell phones at once. There wasn’t enough bandwidth to give each phone its own radio frequency to transmit on, so telecoms needed “multiple access technologies” to cram multiple phone calls on the same frequency.

Code-Division Multiple Access (CDMA) was invented during WWII. In order to deal with the clock problem, CDMA requires each phone to encrypt their data with a unique key and transmit across several frequencies at the same time as other phones, relying on the cell tower to divide the combined signal into its individual calls. The efficiency of this schema improves only as fast as the complexity of the encryption schema. For widely available networks that must support cheap end devices, this rate of improvement has historically been slow and steady.

Telecommunication companies have achieved much faster efficiency gains by implementing Time Division Multiple Access (TDMA), which has become the standard solution to dealing with the clock problem since the advent of 2G cell networks. TDMA specifies that towers divide each radio frequency into time slots, and allocate these time slots to each phone call. In this manner, the cell tower provides a globally available clock for the network. This massively increases scalability of limited bandwidth by letting each frequency support multiple, simultaneous data channels, and decreasing interference from multiple phones broadcasting on the same frequency at the same time.

In this essay, I’ll explore how different blockchains deal with the time problem in Byzantine settings. I’ll conclude by arguing that the blockchain that builds the most effective clock will successfully separate time and state and be able to scale to support the throughput of millions of users in a safe and decentralized way.

Decentralized Consensus with a Clock

Google’s Spanner database is one of the most performant globally distributed databases in the world with 18 instances all processing transactions in lockstep. It supports 50,000+ transactions per second (TPS) with time to finality (TTF) under 1 second. Spanner leverages the Paxos consensus algorithm, which was first published in 1989. Spanner is a permissioned, trusted database. Paxos allows Spanner to progress in the face of power outages, server failures, malicious bugs, and numerous other faults.

How does Paxos achieve such performance when the highest throughput blockchain today struggles to achieve 5,000+ TPS with just 21 instances? Google pays full-time engineers to be onsite and regularly synchronize atomic clocks at each datacenter to a very high precision. Providing a globally available, trusted clock allows transactions to be timestamped so that each instance can receive transactions out of order but process them in the right order. This is a separation of time and state, as each instance updates its state without having to check with its peers to ensure they are performing the same actions in the same order.

What can we learn from Spanner? If there is a globally available clock in a non-byzantine setting, coming to consensus is trivial. Unfortunately, smart contract platforms today have two additional constraints that Spanner does not:

Becoming a validator must be permissionless to keep the platform censorship-resistant
Blockchains must keep users’ funds safe even if up to ⅓ of validators are malicious

If anybody can spin up a validator instance anywhere across the world, consensus algorithms must be designed to accommodate different hardware and network configurations and must manage malicious validators. Additionally, in order to be truly censorship-resistant, no out-of-band information can be trusted (aka the Oracle problem).

20 years after the invention of Paxos, someone figured out how to get a permissionless network of computers to come to consensus on a canonical ordering of transactions. That somebody was Satoshi Nakamoto and the solution was Proof-of-Work (PoW) consensus.

Proof of Work + Timechain = Clock

Tellingly, Satoshi’s pre-release Bitcoin code actually called the familiar blockchain data structure “the timechain” instead. This timechain is designed to tick every 10 minutes on average (via an ingenious entanglement of Proofs of Work, the difficulty adjustment, and the longest chain rule, which is beyond the scope of this essay), in which each tick comes in the form of a block of transactions that update the global state. After a node executes a block of transactions, it’s locked from making any state updates until it produces a valid new block of its own, or receives a valid new block from the network. In PoW, time and state are coupled together, always progressing in unison. Time cannot progress without an update to state.

What makes a block “valid” is a hotly debated subject. Transaction formats and block sizes are just two of the many qualities that are taken into account. One aspect of a valid block that is not up for debate, however, is that it must contain the hash of a previous block so that the network knows to place it after that previous block in the timechain.

Proof of Work + Time Chain = Clock

Image: Each block in a blockchain includes the hash of the previous block as proof it came after.

The purpose of the timechain is to address requirement #2 above: Becoming a validator must be permissionless. The only way to validate that the current state of the Bitcoin network is valid is to start from the state in the genesis block and execute every transaction from genesis to the current state. The timechain provides the audit trail for a new validator to follow by proving that the transactions in the block at height 12 occurred and must be executed after the transactions in the block at height 11. Because block 12 has to include the hash of block 11, block 12 could have only been created after block 11. This timechain of hashes produces a logical, monotonic—though irregular and not very granular—clock that any validator in the network can independently verify without any out-of-band information.

Producing this globally available and trusted clock in an open, permissionless setting is Satoshi’s greatest innovation. Because the global state is locked until the global clock ticks again with a new block, the math for scalability is simple:

Throughput [TPS] = Block size [txs per block] / Block time [seconds per block]

In order to increase throughput, the protocol must either increase the block size, or decrease the block time. Increasing the block size harms the decentralization of block producers, and decreasing the block time increases the probability of chain forks.

Because time and state are coupled, there is no way around this.

Returning to the wireless communications example, let’s compare this problem to CDMA. In CDMA, radio towers have a fixed bandwidth of frequencies they can listen to, which is analogous to how block producers have a fixed block size they can process. Increasing scalability in CDMA means creating more complicated encoding schemes to fit more phone calls within that limited bandwidth. This is analogous to Segwit, Lightning channels, and Schnorr signatures, which are more complex encoding schemes that enable improved performance.

Bitcoin has 1 MB blocks, a 600-second block time, and a minimum transaction size of 250 B, which amounts to a theoretical maximum throughput of 7 TPS. That is ~7,000x less throughput and 3,600x slower TTF (since it takes 6 block times to achieve probabilistically irreversible finality) than Spanner.

Clearly, there is room for improvement.

Proof of Stake + Timechain = Faster Clock

The growth of Bitcoin sparked a renaissance in consensus algorithm research. The CAP Theorem tells us that, in the event of a network partition, a distributed database system must choose between consistency (a network stoppage) or availability (a network fork). Satoshi’s algorithm was the first permissionless, BFT consensus algorithm of the Nakamoto Consensus family, all of which choose availability over consistency. There are many consensus algorithms in the Nakamoto family.

Leslie Lamport’s Paxos algorithm was the first in the Classical Consensus family, which prefer consistency instead of availability.

In Paxos and many other algorithms from the Classical Consensus family, each node participating in consensus must synchronously communicate with every other validator in the network for each state update. This makes communication complexity O(n^2) (where n is the validator count), which means the time required between each state update grows exponentially as the validator count increases.

Jae Kwon and Ethan Buchman were the first to take the 20 years of Classical Consensus research and imbue it with a cryptoeconomic incentive structure called Bonded Proof of Stake (BPoS) to safely limit validator count. The result of their work was the first performant, permissionless, BFT consensus algorithm in the Classical Consensus family: Tendermint.

Like Nakamoto consensus, Tendermint bundles time and state updates, so its throughput only increases when block size increases or block time decreases. When Bitcoin was invented in 2009, a 10-minute block time was reasonable. However, bandwidth has grown exponentially since then, allowing Tendermint chains to shrink block times to just a few seconds.

Since forks are impossible because Tendermint prefers consistency, block time can be reduced until the limits of network throughput bottleneck system performance for a given validator count. Today, Tendermint allows the network to safely limit its validator count to 100, which has the nice benefit of filtering out nodes with poor bandwidth and allowing larger block sizes.

Tendermint is in production. The Cosmos Hub (the first live Tendermint instance) is running at a 6 second block time with 150 kB blocks, which allows for a maximum throughput of 100 TPS (assuming 250 byte transactions). It is still only a few months old, however, and will mature quickly. Theoretically, a Tendermint network with 5-second block times and 5 MB blocks could reach 4,000 TPS with minimal sacrifices in censorship-resistance and permissionlessness in comparison to Bitcoin—that’s a 570x increase in throughput and a 720x decrease in TTF!

Unfortunately, due to the synchronous nature of Classical Consensus algorithms, matching Spanner would adversely impact the censorship-resistance and permissionlessness properties of the system. Bigger blocks will inevitably take longer to propagate through the network, and longer for validators to verify, putting a floor on how low block times can get. To increase the speed of the clock, the number of validators would need to decrease significantly, and they would all need to be directly connected to the same fiber network. This would increase both the potential for validator collusion and the barrier to entry for new validators, plus make the operator of the fiber network a point of centralization.

The next evolution in blockchain consensus takes an important step to separating time and state, gaining huge throughput boosts but at a rather high cost.

Sharding + Timechain = Independent Clocks

With BPoS, Tendermint unbundled censorship-resistance from the number of validators, which allowed the network’s clock to speed up from ticking every 600 seconds to every 5 seconds, unlocking massive performance improvements. However, between each tick of the clock, the entire global state is still locked to maintain a globally-consistent state.

One way to mitigate this problem is to shard the global state into a bunch of smaller pieces, each with their own independent clocks, that can progress independently of each other. As long as these shards don’t need to interact with each other, the performance of each shard remains unchanged and the aggregate throughput across all shards increases linearly with the number of shards.

Cosmos envisions many independent blockchain networks existing in parallel, able to transfer value across each other but mostly transacting within their own systems. If each network can process 4,000 TPS, and there are 13 different networks, the system as a whole can outperform Spanner at 52,000 TPS! However, there are two problems with this approach:

A Proof-of-Stake blockchain’s security is measured by the cost of acquiring 33% staked tokens and approving invalid transactions. If instead of a single token supply, there are 13 separate networks, the cost of acquiring 33% of a given network is substantially reduced. This is far less safe and significantly harms the value proposition of a blockchain, in which security is a function of network value.
TTF for inter-network transfers increases by at least 4x compared to intra-network transfers. The networks must communicate back and forth to synchronize their clocks and ensure that if Alice is sending tokens to Bob, that he successfully receives value on his network before her tokens are burned on her network.

Whereas Cosmos envisions a world with many sovereign networks that manage their own security, Ethereum 2.0, Polkadot, Algorand, Dfinity, and Near Protocol are building systems to address at least the shared security issue (#1 above). There are nuanced differences to each team’s approach but the basic architecture involves a single beacon chain that both provides a clock for the rest of the network and securely shuffles validators across shards so that they all share a common security pool. Like with Cosmos, increasing throughput is easy: just add more shards!

Sharding

Image: Ethereum 2.0’s single chain and sharded state.

Unfortunately, the high TTF for inter-network transfers issue (#2 above) remains. Even though the beacon chain provides a global clock, each shard only syncs their local clock with the beacon chain’s periodically. In order for Alice to send tokens from shard A to Bob in shard B, validators in shard A must prove that they burned Alice’s tokens before validators in shard B will mint an equivalent amount of tokens for Bob. For Ethereum 2.0’s current design, this process will take 6 minutes, 60x as long as the inter-shard block time.

Although sharding can help, the fundamental scaling limitations are still predicated on the fact that time and state updates in each shard are coupled. Each shard is still bound by the same limits facing Tendermint regarding block sizes and block times.

Sharding is analogous to certain elements of TDMA; the state is divided into separate shards with their own clocks in a manner comparable to how cell towers divide up their bandwidth into separate radio frequencies and time slots. The benefits of this are clear, but not taken full advantage of, as evidenced by the inter-shard latency.

But, what if it’s possible to completely decouple time and state updates in a permissionless setting?

Separating Time and State

So far we have discussed how Satoshi created the timechain data structure to provide a trustless clock for the Bitcoin network; how Kwon and Buchman applied BPoS to Paxos to safely reduce the number of validators and speed up the Tendermint network clock; and how sharding the network into many pieces with independent clocks can greatly increase throughput (as long as inter-shard transactions are minimized). Through each of these progressions, however, we have noted that updates to state and time remained coupled, that state updates only occurred alongside ticks of the clock, and that this creates fundamental limits on the throughput and time to finality for censorship-resistant and permissionless computing networks.

Separating time and state requires a globally-available clock that is fast, precise, and trust minimized. With a clock like this, state updates can occur continuously and asynchronously like they do in Spanner. So long as everyone agrees on a global clock and transactions are time stamped, transactions can flow continuously across the network.

Solana has built a trust-minimized clock for their smart contract platform by separating the hash-based timechain from consensus on state updates. Instead of chaining hashes together every block, validators in the Solana network hash the hashes themselves continuously intra-block. This mechanism, called Proof of History (PoH), produces a globally-available, trust-minimized, granular timechain for all nodes in the network to synchronize with.

Diving into the mechanics of PoH is out of scope for this essay. For more details on how PoH works, please see Solana’s PoH documentation.

Seperating Time and State

Image: How Proof of History weaves a standardized timestamp into the blockchain.

The presence of this independent timechain allows the leader to propagate timestamped transactions out to the committee as fast as they are received. The timestamping provides the canonical order instead of some arbitrary order determined by the block producer. Double spend attempts are now trivial to resolve, because the entire network is able to agree on which transaction came first.

This changes everything.

Instead of forcing validators to come to consensus every 6-600 seconds in order to verify the passage of time, validators in Solana can continuously stream state updates to their peers in real time.

Instead of needing to wait to hear confirmation from each validator like in all other blockchains, Solana can use a novel fan-out mechanism called Turbine (inspired by BitTorrent) to keep communication complexity O(log(n)) instead of O(n^2). This allows Solana to process > 50,000 TPS on a single global state with fast finality—with no sharding required.

This means the validator pool size is comparable to Tendermint, on the order of 100-1,000, but that chain forks are allowed. An aggressive fork management policy is required to ensure that the system converges rapidly on a single chain whenever a chain fork does present itself, a necessary tradeoff for asynchronous progression and constant availability.

Bringing the wireless communications analogy full circle, PoH is to blockchains what TDMA is to cellular networks. Picture Solana’s 1,000 validators as radio towers leveraging their synchronized clocks to subdivide their bandwidth into individual time slots. They are constantly receiving new transactions, each with a signed PoH hash attached by the sender, and forwarding them to their neighbors who can immediately order them using those PoH hashes. As leaders rotate based on the global clock, each leader chooses an ordered set of transactions to execute and then gossips that “entry” to the network. Validators return their votes on each “entry” and finalize transactions when they see that ⅔ of the validators have voted alongside them.

The network as a whole is constantly processing transactions in the same order at incredibly high volumes, but each validator is progressing independently. This is a subtle but profound change relative to other chains. In Solana, validators never stop progressing. They always move forward independently, regardless of network conditions and consensus.

There are other tangential issues (e.g., rapid chain growth, a new programming model, the unbiasability of the timechain, parallelism) this new design brings up that are beyond the scope of this essay, though they are addressed in Solana’s documentation. Solana’s testnets today are processing upwards of 50,000 TPS with an average TTF of 1.5 seconds on networks of 200 validators across 5 continents. That is comparable to Spanner, except that it’s meaningfully decentralized.

This level of performance in a trust-minimized, permissionless world computer is only possible because of Solana’s separation of time and state. The Solana network’s globally available clock allows each node to update its state without having to communicate with any other nodes, just like Spanner.

Reframing Scalability

Although the crypto community has written ad nauseam about scalability and consensus models, none have explored the distributed clock problem specifically. With several years of Proof-of-Stake research culminating in Tendermint + BPoS as the best-in-class result, and numerous sharding proposals all essentially converging around the beacon chain + state sharding architecture, a granular timechain that allows for asynchronous state progression will provide the best performance for unsharded systems that prefer availability to consistency.

Providing a globally available clock allows the Solana team to leverage 40+ years of distributed systems research that is otherwise unapplicable. Concepts like Optimistic Concurrency Control were invented in 1981, and have been used in large-scale computing projects for years, but cannot be applied when time and state must progress together. Parallel processing with GPUs has been around since 1995, though largely limited to graphics cards until Nvidia released the CUDA development environment in 2007, but cannot be fully leveraged by blockchain-based systems that pessimistically lock all states except for the accounts currently being touched by a transaction.

Understanding the passage of time is paramount to the performance of distributed systems in both permissioned and permissionless settings. Time is everything, and with a new way to encode the passage of time in the form of Proof of History, permissionless systems can match the performance of proven, centralized cloud compute providers.

Thanks to Anatoly Yakovenko and Zaki Manian for providing feedback on this essay.