Numerous smart contract platforms (e.g. Ethereum 2.0, Polkadot, Dfinity, Near Protocol, Algorand, Kadena, Spacemesh, and Solana) are launching in the next year or two. Each team is pursuing a unique scaling strategy.
Most of these approaches, however, do not address one of the fundamental problems with distributed computing systems in Byzantine environments: the clock problem. In order to come to consensus, at least 51% of the machines in the network must execute the same transactions in the same order at the same time. In order to do that, the machines need to agree on a global clock. The “clock problem” is the challenge in getting many distrusting machines to agree on a global clock in Byzantine settings. Once everyone agrees on a global clock, transaction ordering becomes a much simpler problem because each transaction is timestamped using the same global clock.
The clock problem has manifested in other large scale networks before the modern crypto era, most notably in the wireless communications world. Cell towers must support tens of thousands of cell phones at once. There wasn’t enough bandwidth to give each phone its own radio frequency to transmit on, so telecoms needed “multiple access technologies” to cram multiple phone calls on the same frequency.
Code-Division Multiple Access (CDMA) was invented during WWII. In order to deal with the clock problem, CDMA requires each phone to encrypt their data with a unique key and transmit across several frequencies at the same time as other phones, relying on the cell tower to divide the combined signal into its individual calls. The efficiency of this schema improves only as fast as the complexity of the encryption schema. For widely available networks that must support cheap end devices, this rate of improvement has historically been slow and steady.
Telecommunication companies have achieved much faster efficiency gains by implementing Time Division Multiple Access (TDMA), which has become the standard solution to dealing with the clock problem since the advent of 2G cell networks. TDMA specifies that towers divide each radio frequency into time slots, and allocate these time slots to each phone call. In this manner, the cell tower provides a globally available clock for the network. This massively increases scalability of limited bandwidth by letting each frequency support multiple, simultaneous data channels, and decreasing interference from multiple phones broadcasting on the same frequency at the same time.
In this essay, I’ll explore how different blockchains deal with the time problem in Byzantine settings. I’ll conclude by arguing that the blockchain that builds the most effective clock will successfully separate time and state and be able to scale to support the throughput of millions of users in a safe and decentralized way.
Decentralized Consensus with a Clock
Google’s Spanner database is one of the most performant globally distributed databases in the world with 18 instances all processing transactions in lockstep. It supports 50,000+ transactions per second (TPS) with time to finality (TTF) under 1 second. Spanner leverages the Paxos consensus algorithm, which was first published in 1989. Spanner is a permissioned, trusted database. Paxos allows Spanner to progress in the face of power outages, server failures, malicious bugs, and numerous other faults.
How does Paxos achieve such performance when the highest throughput blockchain today struggles to achieve 5,000+ TPS with just 21 instances? Google pays full-time engineers to be onsite and regularly synchronize atomic clocks at each datacenter to a very high precision. Providing a globally available, trusted clock allows transactions to be timestamped so that each instance can receive transactions out of order but process them in the right order. This is a separation of time and state, as each instance updates its state without having to check with its peers to ensure they are performing the same actions in the same order.
What can we learn from Spanner? If there is a globally available clock in a non-byzantine setting, coming to consensus is trivial. Unfortunately, smart contract platforms today have two additional constraints that Spanner does not:
- Becoming a validator must be permissionless to keep the platform censorship-resistant
- Blockchains must keep users’ funds safe even if up to ⅓ of validators are malicious
If anybody can spin up a validator instance anywhere across the world, consensus algorithms must be designed to accommodate different hardware and network configurations and must manage malicious validators. Additionally, in order to be truly censorship-resistant, no out-of-band information can be trusted (aka the Oracle problem).
20 years after the invention of Paxos, someone figured out how to get a permissionless network of computers to come to consensus on a canonical ordering of transactions. That somebody was Satoshi Nakamoto and the solution was Proof-of-Work (PoW) consensus.
Proof of Work + Timechain = Clock
Tellingly, Satoshi’s pre-release Bitcoin code actually called the familiar blockchain data structure “the timechain” instead. This timechain is designed to tick every 10 minutes on average (via an ingenious entanglement of Proofs of Work, the difficulty adjustment, and the longest chain rule, which is beyond the scope of this essay), in which each tick comes in the form of a block of transactions that update the global state. After a node executes a block of transactions, it’s locked from making any state updates until it produces a valid new block of its own, or receives a valid new block from the network. In PoW, time and state are coupled together, always progressing in unison. Time cannot progress without an update to state.
What makes a block “valid” is a hotly debated subject. Transaction formats and block sizes are just two of the many qualities that are taken into account. One aspect of a valid block that is not up for debate, however, is that it must contain the hash of a previous block so that the network knows to place it after that previous block in the timechain.
Image: Each block in a blockchain includes the hash of the previous block as proof it came after.
The purpose of the timechain is to address requirement #2 above: Becoming a validator must be permissionless. The only way to validate that the current state of the Bitcoin network is valid is to start from the state in the genesis block and execute every transaction from genesis to the current state. The timechain provides the audit trail for a new validator to follow by proving that the transactions in the block at height 12 occurred and must be executed after the transactions in the block at height 11. Because block 12 has to include the hash of block 11, block 12 could have only been created after block 11. This timechain of hashes produces a logical, monotonic—though irregular and not very granular—clock that any validator in the network can independently verify without any out-of-band information.
Producing this globally available and trusted clock in an open, permissionless setting is Satoshi’s greatest innovation. Because the global state is locked until the global clock ticks again with a new block, the math for scalability is simple:
Throughput [TPS] = Block size [txs per block] / Block time [seconds per block]
In order to increase throughput, the protocol must either increase the block size, or decrease the block time. Increasing the block size harms the decentralization of block producers, and decreasing the block time increases the probability of chain forks.
Because time and state are coupled, there is no way around this.
Returning to the wireless communications example, let’s compare this problem to CDMA. In CDMA, radio towers have a fixed bandwidth of frequencies they can listen to, which is analogous to how block producers have a fixed block size they can process. Increasing scalability in CDMA means creating more complicated encoding schemes to fit more phone calls within that limited bandwidth. This is analogous to Segwit, Lightning channels, and Schnorr signatures, which are more complex encoding schemes that enable improved performance.
Bitcoin has 1 MB blocks, a 600-second block time, and a minimum transaction size of 250 B, which amounts to a theoretical maximum throughput of 7 TPS. That is ~7,000x less throughput and 3,600x slower TTF (since it takes 6 block times to achieve probabilistically irreversible finality) than Spanner.
Clearly, there is room for improvement.
Proof of Stake + Timechain = Faster Clock
The growth of Bitcoin sparked a renaissance in consensus algorithm research. The CAP Theorem tells us that, in the event of a network partition, a distributed database system must choose between consistency (a network stoppage) or availability (a network fork). Satoshi’s algorithm was the first permissionless, BFT consensus algorithm of the Nakamoto Consensus family, all of which choose availability over consistency. There are many consensus algorithms in the Nakamoto family.
Leslie Lamport’s Paxos algorithm was the first in the Classical Consensus family, which prefer consistency instead of availability.
In Paxos and many other algorithms from the Classical Consensus family, each node participating in consensus must synchronously communicate with every other validator in the network for each state update. This makes communication complexity O(n^2) (where n is the validator count), which means the time required between each state update grows exponentially as the validator count increases.
Jae Kwon and Ethan Buchman were the first to take the 20 years of Classical Consensus research and imbue it with a cryptoeconomic incentive structure called Bonded Proof of Stake (BPoS) to safely limit validator count. The result of their work was the first performant, permissionless, BFT consensus algorithm in the Classical Consensus family: Tendermint.
Like Nakamoto consensus, Tendermint bundles time and state updates, so its throughput only increases when block size increases or block time decreases. When Bitcoin was invented in 2009, a 10-minute block time was reasonable. However, bandwidth has grown exponentially since then, allowing Tendermint chains to shrink block times to just a few seconds.
Since forks are impossible because Tendermint prefers consistency, block time can be reduced until the limits of network throughput bottleneck system performance for a given validator count. Today, Tendermint allows the network to safely limit its validator count to 100, which has the nice benefit of filtering out nodes with poor bandwidth and allowing larger block sizes.
Tendermint is in production. The Cosmos Hub (the first live Tendermint instance) is running at a 6 second block time with 150 kB blocks, which allows for a maximum throughput of 100 TPS (assuming 250 byte transactions). It is still only a few months old, however, and will mature quickly. Theoretically, a Tendermint network with 5-second block times and 5 MB blocks could reach 4,000 TPS with minimal sacrifices in censorship-resistance and permissionlessness in comparison to Bitcoin—that’s a 570x increase in throughput and a 720x decrease in TTF!
Unfortunately, due to the synchronous nature of Classical Consensus algorithms, matching Spanner would adversely impact the censorship-resistance and permissionlessness properties of the system. Bigger blocks will inevitably take longer to propagate through the network, and longer for validators to verify, putting a floor on how low block times can get. To increase the speed of the clock, the number of validators would need to decrease significantly, and they would all need to be directly connected to the same fiber network. This would increase both the potential for validator collusion and the barrier to entry for new validators, plus make the operator of the fiber network a point of centralization.
The next evolution in blockchain consensus takes an important step to separating time and state, gaining huge throughput boosts but at a rather high cost.
Sharding + Timechain = Independent Clocks
With BPoS, Tendermint unbundled censorship-resistance from the number of validators, which allowed the network’s clock to speed up from ticking every 600 seconds to every 5 seconds, unlocking massive performance improvements. However, between each tick of the clock, the entire global state is still locked to maintain a globally-consistent state.
One way to mitigate this problem is to shard the global state into a bunch of smaller pieces, each with their own independent clocks, that can progress independently of each other. As long as these shards don’t need to interact with each other, the performance of each shard remains unchanged and the aggregate throughput across all shards increases linearly with the number of shards.
Cosmos envisions many independent blockchain networks existing in parallel, able to transfer value across each other but mostly transacting within their own systems. If each network can process 4,000 TPS, and there are 13 different networks, the system as a whole can outperform Spanner at 52,000 TPS! However, there are two problems with this approach:
A Proof-of-Stake blockchain’s security is measured by the cost of acquiring 33% staked tokens and approving invalid transactions. If instead of a single token supply, there are 13 separate networks, the cost of acquiring 33% of a given network is substantially reduced. This is far less safe and significantly harms the value proposition of a blockchain, in which security is a function of network value.
TTF for inter-network transfers increases by at least 4x compared to intra-network transfers. The networks must communicate back and forth to synchronize their clocks and ensure that if Alice is sending tokens to Bob, that he successfully receives value on his network before her tokens are burned on her network.
Whereas Cosmos envisions a world with many sovereign networks that manage their own security, Ethereum 2.0, Polkadot, Algorand, Dfinity, and Near Protocol are building systems to address at least the shared security issue (#1 above). There are nuanced differences to each team’s approach but the basic architecture involves a single beacon chain that both provides a clock for the rest of the network and securely shuffles validators across shards so that they all share a common security pool. Like with Cosmos, increasing throughput is easy: just add more shards!
Image: Ethereum 2.0’s single chain and sharded state.
Unfortunately, the high TTF for inter-network transfers issue (#2 above) remains. Even though the beacon chain provides a global clock, each shard only syncs their local clock with the beacon chain’s periodically. In order for Alice to send tokens from shard A to Bob in shard B, validators in shard A must prove that they burned Alice’s tokens before validators in shard B will mint an equivalent amount of tokens for Bob. For Ethereum 2.0’s current design, this process will take 6 minutes, 60x as long as the inter-shard block time.
Although sharding can help, the fundamental scaling limitations are still predicated on the fact that time and state updates in each shard are coupled. Each shard is still bound by the same limits facing Tendermint regarding block sizes and block times.
Sharding is analogous to certain elements of TDMA; the state is divided into separate shards with their own clocks in a manner comparable to how cell towers divide up their bandwidth into separate radio frequencies and time slots. The benefits of this are clear, but not taken full advantage of, as evidenced by the inter-shard latency.
But, what if it’s possible to completely decouple time and state updates in a permissionless setting?
Separating Time and State
So far we have discussed how Satoshi created the timechain data structure to provide a trustless clock for the Bitcoin network; how Kwon and Buchman applied BPoS to Paxos to safely reduce the number of validators and speed up the Tendermint network clock; and how sharding the network into many pieces with independent clocks can greatly increase throughput (as long as inter-shard transactions are minimized). Through each of these progressions, however, we have noted that updates to state and time remained coupled, that state updates only occurred alongside ticks of the clock, and that this creates fundamental limits on the throughput and time to finality for censorship-resistant and permissionless computing networks.
Separating time and state requires a globally-available clock that is fast, precise, and trust minimized. With a clock like this, state updates can occur continuously and asynchronously like they do in Spanner. So long as everyone agrees on a global clock and transactions are time stamped, transactions can flow continuously across the network.
Solana has built a trust-minimized clock for their smart contract platform by separating the hash-based timechain from consensus on state updates. Instead of chaining hashes together every block, validators in the Solana network hash the hashes themselves continuously intra-block. This mechanism, called Proof of History (PoH), produces a globally-available, trust-minimized, granular timechain for all nodes in the network to synchronize with.
Diving into the mechanics of PoH is out of scope for this essay. For more details on how PoH works, please see Solana’s PoH documentation.
Image: How Proof of History weaves a standardized timestamp into the blockchain.
The presence of this independent timechain allows the leader to propagate timestamped transactions out to the committee as fast as they are received. The timestamping provides the canonical order instead of some arbitrary order determined by the block producer. Double spend attempts are now trivial to resolve, because the entire network is able to agree on which transaction came first.
This changes everything.
Instead of forcing validators to come to consensus every 6-600 seconds in order to verify the passage of time, validators in Solana can continuously stream state updates to their peers in real time.
Instead of needing to wait to hear confirmation from each validator like in all other blockchains, Solana can use a novel fan-out mechanism called Turbine (inspired by BitTorrent) to keep communication complexity O(log(n)) instead of O(n^2). This allows Solana to process > 50,000 TPS on a single global state with fast finality—with no sharding required.
This means the validator pool size is comparable to Tendermint, on the order of 100-1,000, but that chain forks are allowed. An aggressive fork management policy is required to ensure that the system converges rapidly on a single chain whenever a chain fork does present itself, a necessary tradeoff for asynchronous progression and constant availability.
Bringing the wireless communications analogy full circle, PoH is to blockchains what TDMA is to cellular networks. Picture Solana’s 1,000 validators as radio towers leveraging their synchronized clocks to subdivide their bandwidth into individual time slots. They are constantly receiving new transactions, each with a signed PoH hash attached by the sender, and forwarding them to their neighbors who can immediately order them using those PoH hashes. As leaders rotate based on the global clock, each leader chooses an ordered set of transactions to execute and then gossips that “entry” to the network. Validators return their votes on each “entry” and finalize transactions when they see that ⅔ of the validators have voted alongside them.
The network as a whole is constantly processing transactions in the same order at incredibly high volumes, but each validator is progressing independently. This is a subtle but profound change relative to other chains. In Solana, validators never stop progressing. They always move forward independently, regardless of network conditions and consensus.
There are other tangential issues (e.g., rapid chain growth, a new programming model, the unbiasability of the timechain, parallelism) this new design brings up that are beyond the scope of this essay, though they are addressed in Solana’s documentation. Solana’s testnets today are processing upwards of 50,000 TPS with an average TTF of 1.5 seconds on networks of 200 validators across 5 continents. That is comparable to Spanner, except that it’s meaningfully decentralized.
This level of performance in a trust-minimized, permissionless world computer is only possible because of Solana’s separation of time and state. The Solana network’s globally available clock allows each node to update its state without having to communicate with any other nodes, just like Spanner.
Although the crypto community has written ad nauseam about scalability and consensus models, none have explored the distributed clock problem specifically. With several years of Proof-of-Stake research culminating in Tendermint + BPoS as the best-in-class result, and numerous sharding proposals all essentially converging around the beacon chain + state sharding architecture, a granular timechain that allows for asynchronous state progression will provide the best performance for unsharded systems that prefer availability to consistency.
Providing a globally available clock allows the Solana team to leverage 40+ years of distributed systems research that is otherwise unapplicable. Concepts like Optimistic Concurrency Control were invented in 1981, and have been used in large-scale computing projects for years, but cannot be applied when time and state must progress together. Parallel processing with GPUs has been around since 1995, though largely limited to graphics cards until Nvidia released the CUDA development environment in 2007, but cannot be fully leveraged by blockchain-based systems that pessimistically lock all states except for the accounts currently being touched by a transaction.
Understanding the passage of time is paramount to the performance of distributed systems in both permissioned and permissionless settings. Time is everything, and with a new way to encode the passage of time in the form of Proof of History, permissionless systems can match the performance of proven, centralized cloud compute providers.
文 Ryan Gentry 译 June Chan
Throughput [TPS] = Block size [txs per block] / Block time [seconds per block]
吞吐量[TPS] =区块大小[每区块txs] /区块时间[每块秒]
比特币有1 MB的区块，区块时间为600秒，最小交易大小为250 B，理论上最大吞吐量为7 TPS。这比Spanner的吞吐量低了约7000倍，比它的TTF慢了3600倍（因为它需要6个区块时间来实现概率不可逆的最终性）。
Jae Kwon和Ethan Buchman率先从事经典共识研究，为之倾注了20年的时间，他们将一种名为“绑定权益证明”(BPoS)的加密经济激励结构注入其中，以安全地限制验证器的数量。他们所取得的工作成果，便是经典共识家族中的第一个高性能、无许可的BFT共识算法：Tendermint。
Tendermint正在生产中。Cosmos Hub（第一个实时Tendermint实例）以6秒的区块时间运行，区块大小为150 kB，最大吞吐量为100 TPS（假设事务为250字节）。它还只有几个月的历史，不过很快就会成熟。从理论上讲，拥有5 秒出块时间和5 MB区块大小的Tendermint网络可以达到4000 TPS吞吐量，而与比特币相比，它在抗审查性和无许可性方面所做出的牺牲最小——这意味着吞吐量增加了570倍，TTF减少了720倍！
1) 权益证明区块链的安全性是通过获得33%的抵押代币和批准无效交易的成本来衡量的。如果不是单一的代币供应，而是有13个单独的网络，那么获得给定网络33%的份额的成本将大大降低。这远谈不上安全，而且严重损害了区块链的价值主张，其中安全是网络价值的一个功能。 2) 与网络内传输相比，网络间传输的TTF至少增加4倍。网络必须来回通信以同步它们的时 钟，并确保如果Alice向Bob发送代币，需要在网络上烧掉Alice的代币之前，令Bob成功在他的网络上接收到价值。
通过将基于哈希的时间链与对状态更新的共识中分离，Solana已经为他们的智能合约平台构建了一个信任最小化的时钟。Solana网络中的验证器不是将哈希链接到每个块上，而是在块内连续地散列哈希本身。这种机制称为历史证明(Proof of History, PoH)，它为网络中要同步的所有节点生成一个全局可用、信任最小化的细粒度时间链。 深入研究PoH的机制超出了本文的范围。有关PoH如何运作的更多细节，请参阅Solana的PoH文档。
Solana不需要像其他所有区块链一样等待每个验证器的确认，而是可以使用一种名为Turbine的新型扇出机制（灵感来自BitTorrent）来保持通信复杂度，即复杂度为O(log(n))，而不是O(n2)。这使得Solana可以在单个全局状态下处理超过 50,000 TPS，且能实现快速最终性，而且还不需要分片。
尽管加密社区已经就可伸缩性和一致性模型撰写了连篇累牍的文章，但是没有人专门研究分布式时钟的问题。几年来对权益证明的研究，带来的最佳成果是Tendermint + BpoS，无数分片方案基本上都围绕着信标链+状态分片架构展开，而允许异步状态进展的细粒度时间链将为喜欢可用性而不是一致性的非分片系统提供最佳性能。
提供一个全局可用的时钟，这使得Solana团队能善加利用40多年来的分布式系统研究。乐观并发控制（又名“乐观锁”，Optimistic Concurrency Control，简称“OCC”）等概念是1981年发明的，多年来一直应用于大型计算项目中，但在时间和状态必须同时进行时则无法应用。与GPU的并行处理从1995年开始就已经存在，不过在2007年英伟达(Nvidia)发布CUDA开发环境之前主要局限于图形卡，无法被基于区块链的系统充分利用，这类系统悲观地锁定除当前正在处理事务的账户以外的一切状态。