The Separation of Time and State

By Ryan Gentry

July 16, 2019 | 18 minute read

Numerous smart contract platforms (e.g. Ethereum 2.0, Polkadot, Dfinity, Near Protocol, Algorand, Kadena, Spacemesh, and Solana) are launching in the next year or two. Each team is pursuing a unique scaling strategy.

Most of these approaches, however, do not address one of the fundamental problems with distributed computing systems in Byzantine environments: the clock problem. In order to come to consensus, at least 51% of the machines in the network must execute the same transactions in the same order at the same time. In order to do that, the machines need to agree on a global clock. The “clock problem” is the challenge in getting many distrusting machines to agree on a global clock in Byzantine settings. Once everyone agrees on a global clock, transaction ordering becomes a much simpler problem because each transaction is timestamped using the same global clock.

The clock problem has manifested in other large scale networks before the modern crypto era, most notably in the wireless communications world. Cell towers must support tens of thousands of cell phones at once. There wasn’t enough bandwidth to give each phone its own radio frequency to transmit on, so telecoms needed “multiple access technologies” to cram multiple phone calls on the same frequency.

Code-Division Multiple Access (CDMA) was invented during WWII. In order to deal with the clock problem, CDMA requires each phone to encrypt their data with a unique key and transmit across several frequencies at the same time as other phones, relying on the cell tower to divide the combined signal into its individual calls. The efficiency of this schema improves only as fast as the complexity of the encryption schema. For widely available networks that must support cheap end devices, this rate of improvement has historically been slow and steady.

Telecommunication companies have achieved much faster efficiency gains by implementing Time Division Multiple Access (TDMA), which has become the standard solution to dealing with the clock problem since the advent of 2G cell networks. TDMA specifies that towers divide each radio frequency into time slots, and allocate these time slots to each phone call. In this manner, the cell tower provides a globally available clock for the network. This massively increases scalability of limited bandwidth by letting each frequency support multiple, simultaneous data channels, and decreasing interference from multiple phones broadcasting on the same frequency at the same time.

In this essay, I’ll explore how different blockchains deal with the time problem in Byzantine settings. I’ll conclude by arguing that the blockchain that builds the most effective clock will successfully separate time and state and be able to scale to support the throughput of millions of users in a safe and decentralized way.

Decentralized Consensus with a Clock

Google’s Spanner database is one of the most performant globally distributed databases in the world with 18 instances all processing transactions in lockstep. It supports 50,000+ transactions per second (TPS) with time to finality (TTF) under 1 second. Spanner leverages the Paxos consensus algorithm, which was first published in 1989. Spanner is a permissioned, trusted database. Paxos allows Spanner to progress in the face of power outages, server failures, malicious bugs, and numerous other faults.

How does Paxos achieve such performance when the highest throughput blockchain today struggles to achieve 5,000+ TPS with just 21 instances? Google pays full-time engineers to be onsite and regularly synchronize atomic clocks at each datacenter to a very high precision. Providing a globally available, trusted clock allows transactions to be timestamped so that each instance can receive transactions out of order but process them in the right order. This is a separation of time and state, as each instance updates its state without having to check with its peers to ensure they are performing the same actions in the same order.

What can we learn from Spanner? If there is a globally available clock in a non-byzantine setting, coming to consensus is trivial. Unfortunately, smart contract platforms today have two additional constraints that Spanner does not:

  1. Becoming a validator must be permissionless to keep the platform censorship-resistant
  2. Blockchains must keep users’ funds safe even if up to ⅓ of validators are malicious

If anybody can spin up a validator instance anywhere across the world, consensus algorithms must be designed to accommodate different hardware and network configurations and must manage malicious validators. Additionally, in order to be truly censorship-resistant, no out-of-band information can be trusted (aka the Oracle problem).

20 years after the invention of Paxos, someone figured out how to get a permissionless network of computers to come to consensus on a canonical ordering of transactions. That somebody was Satoshi Nakamoto and the solution was Proof-of-Work (PoW) consensus.

Proof of Work + Timechain = Clock

Tellingly, Satoshi’s pre-release Bitcoin code actually called the familiar blockchain data structure “the timechain” instead. This timechain is designed to tick every 10 minutes on average (via an ingenious entanglement of Proofs of Work, the difficulty adjustment, and the longest chain rule, which is beyond the scope of this essay), in which each tick comes in the form of a block of transactions that update the global state. After a node executes a block of transactions, it’s locked from making any state updates until it produces a valid new block of its own, or receives a valid new block from the network. In PoW, time and state are coupled together, always progressing in unison. Time cannot progress without an update to state.

What makes a block “valid” is a hotly debated subject. Transaction formats and block sizes are just two of the many qualities that are taken into account. One aspect of a valid block that is not up for debate, however, is that it must contain the hash of a previous block so that the network knows to place it after that previous block in the timechain.

Proof of Work + Time Chain = Clock

Image: Each block in a blockchain includes the hash of the previous block as proof it came after.

The purpose of the timechain is to address requirement #2 above: Becoming a validator must be permissionless. The only way to validate that the current state of the Bitcoin network is valid is to start from the state in the genesis block and execute every transaction from genesis to the current state. The timechain provides the audit trail for a new validator to follow by proving that the transactions in the block at height 12 occurred and must be executed after the transactions in the block at height 11. Because block 12 has to include the hash of block 11, block 12 could have only been created after block 11. This timechain of hashes produces a logical, monotonic—though irregular and not very granular—clock that any validator in the network can independently verify without any out-of-band information.

Producing this globally available and trusted clock in an open, permissionless setting is Satoshi’s greatest innovation. Because the global state is locked until the global clock ticks again with a new block, the math for scalability is simple:

Throughput [TPS] = Block size [txs per block] / Block time [seconds per block]

In order to increase throughput, the protocol must either increase the block size, or decrease the block time. Increasing the block size harms the decentralization of block producers, and decreasing the block time increases the probability of chain forks.

Because time and state are coupled, there is no way around this.

Returning to the wireless communications example, let’s compare this problem to CDMA. In CDMA, radio towers have a fixed bandwidth of frequencies they can listen to, which is analogous to how block producers have a fixed block size they can process. Increasing scalability in CDMA means creating more complicated encoding schemes to fit more phone calls within that limited bandwidth. This is analogous to Segwit, Lightning channels, and Schnorr signatures, which are more complex encoding schemes that enable improved performance.

Bitcoin has 1 MB blocks, a 600-second block time, and a minimum transaction size of 250 B, which amounts to a theoretical maximum throughput of 7 TPS. That is ~7,000x less throughput and 3,600x slower TTF (since it takes 6 block times to achieve probabilistically irreversible finality) than Spanner.

Clearly, there is room for improvement.

Proof of Stake + Timechain = Faster Clock

The growth of Bitcoin sparked a renaissance in consensus algorithm research. The CAP Theorem tells us that, in the event of a network partition, a distributed database system must choose between consistency (a network stoppage) or availability (a network fork). Satoshi’s algorithm was the first permissionless, BFT consensus algorithm of the Nakamoto Consensus family, all of which choose availability over consistency. There are many consensus algorithms in the Nakamoto family.

Leslie Lamport’s Paxos algorithm was the first in the Classical Consensus family, which prefer consistency instead of availability.

In Paxos and many other algorithms from the Classical Consensus family, each node participating in consensus must synchronously communicate with every other validator in the network for each state update. This makes communication complexity O(n^2) (where n is the validator count), which means the time required between each state update grows exponentially as the validator count increases.

Jae Kwon and Ethan Buchman were the first to take the 20 years of Classical Consensus research and imbue it with a cryptoeconomic incentive structure called Bonded Proof of Stake (BPoS) to safely limit validator count. The result of their work was the first performant, permissionless, BFT consensus algorithm in the Classical Consensus family: Tendermint.

Like Nakamoto consensus, Tendermint bundles time and state updates, so its throughput only increases when block size increases or block time decreases. When Bitcoin was invented in 2009, a 10-minute block time was reasonable. However, bandwidth has grown exponentially since then, allowing Tendermint chains to shrink block times to just a few seconds.

Since forks are impossible because Tendermint prefers consistency, block time can be reduced until the limits of network throughput bottleneck system performance for a given validator count. Today, Tendermint allows the network to safely limit its validator count to 100, which has the nice benefit of filtering out nodes with poor bandwidth and allowing larger block sizes.

Tendermint is in production. The Cosmos Hub (the first live Tendermint instance) is running at a 6 second block time with 150 kB blocks, which allows for a maximum throughput of 100 TPS (assuming 250 byte transactions). It is still only a few months old, however, and will mature quickly. Theoretically, a Tendermint network with 5-second block times and 5 MB blocks could reach 4,000 TPS with minimal sacrifices in censorship-resistance and permissionlessness in comparison to Bitcoin—that’s a 570x increase in throughput and a 720x decrease in TTF!

Unfortunately, due to the synchronous nature of Classical Consensus algorithms, matching Spanner would adversely impact the censorship-resistance and permissionlessness properties of the system. Bigger blocks will inevitably take longer to propagate through the network, and longer for validators to verify, putting a floor on how low block times can get. To increase the speed of the clock, the number of validators would need to decrease significantly, and they would all need to be directly connected to the same fiber network. This would increase both the potential for validator collusion and the barrier to entry for new validators, plus make the operator of the fiber network a point of centralization.

The next evolution in blockchain consensus takes an important step to separating time and state, gaining huge throughput boosts but at a rather high cost.

Sharding + Timechain = Independent Clocks

With BPoS, Tendermint unbundled censorship-resistance from the number of validators, which allowed the network’s clock to speed up from ticking every 600 seconds to every 5 seconds, unlocking massive performance improvements. However, between each tick of the clock, the entire global state is still locked to maintain a globally-consistent state.

One way to mitigate this problem is to shard the global state into a bunch of smaller pieces, each with their own independent clocks, that can progress independently of each other. As long as these shards don’t need to interact with each other, the performance of each shard remains unchanged and the aggregate throughput across all shards increases linearly with the number of shards.

Cosmos envisions many independent blockchain networks existing in parallel, able to transfer value across each other but mostly transacting within their own systems. If each network can process 4,000 TPS, and there are 13 different networks, the system as a whole can outperform Spanner at 52,000 TPS! However, there are two problems with this approach:

  1. A Proof-of-Stake blockchain’s security is measured by the cost of acquiring 33% staked tokens and approving invalid transactions. If instead of a single token supply, there are 13 separate networks, the cost of acquiring 33% of a given network is substantially reduced. This is far less safe and significantly harms the value proposition of a blockchain, in which security is a function of network value.

  2. TTF for inter-network transfers increases by at least 4x compared to intra-network transfers. The networks must communicate back and forth to synchronize their clocks and ensure that if Alice is sending tokens to Bob, that he successfully receives value on his network before her tokens are burned on her network.

Whereas Cosmos envisions a world with many sovereign networks that manage their own security, Ethereum 2.0, Polkadot, Algorand, Dfinity, and Near Protocol are building systems to address at least the shared security issue (#1 above). There are nuanced differences to each team’s approach but the basic architecture involves a single beacon chain that both provides a clock for the rest of the network and securely shuffles validators across shards so that they all share a common security pool. Like with Cosmos, increasing throughput is easy: just add more shards!

Sharding

Image: Ethereum 2.0’s single chain and sharded state.

Unfortunately, the high TTF for inter-network transfers issue (#2 above) remains. Even though the beacon chain provides a global clock, each shard only syncs their local clock with the beacon chain’s periodically. In order for Alice to send tokens from shard A to Bob in shard B, validators in shard A must prove that they burned Alice’s tokens before validators in shard B will mint an equivalent amount of tokens for Bob. For Ethereum 2.0’s current design, this process will take 6 minutes, 60x as long as the inter-shard block time.

Although sharding can help, the fundamental scaling limitations are still predicated on the fact that time and state updates in each shard are coupled. Each shard is still bound by the same limits facing Tendermint regarding block sizes and block times.

Sharding is analogous to certain elements of TDMA; the state is divided into separate shards with their own clocks in a manner comparable to how cell towers divide up their bandwidth into separate radio frequencies and time slots. The benefits of this are clear, but not taken full advantage of, as evidenced by the inter-shard latency.

But, what if it’s possible to completely decouple time and state updates in a permissionless setting?

Separating Time and State

So far we have discussed how Satoshi created the timechain data structure to provide a trustless clock for the Bitcoin network; how Kwon and Buchman applied BPoS to Paxos to safely reduce the number of validators and speed up the Tendermint network clock; and how sharding the network into many pieces with independent clocks can greatly increase throughput (as long as inter-shard transactions are minimized). Through each of these progressions, however, we have noted that updates to state and time remained coupled, that state updates only occurred alongside ticks of the clock, and that this creates fundamental limits on the throughput and time to finality for censorship-resistant and permissionless computing networks.

Separating time and state requires a globally-available clock that is fast, precise, and trust minimized. With a clock like this, state updates can occur continuously and asynchronously like they do in Spanner. So long as everyone agrees on a global clock and transactions are time stamped, transactions can flow continuously across the network.

Solana has built a trust-minimized clock for their smart contract platform by separating the hash-based timechain from consensus on state updates. Instead of chaining hashes together every block, validators in the Solana network hash the hashes themselves continuously intra-block. This mechanism, called Proof of History (PoH), produces a globally-available, trust-minimized, granular timechain for all nodes in the network to synchronize with.

Diving into the mechanics of PoH is out of scope for this essay. For more details on how PoH works, please see Solana’s PoH documentation.

Seperating Time and State

Image: How Proof of History weaves a standardized timestamp into the blockchain.

The presence of this independent timechain allows the leader to propagate timestamped transactions out to the committee as fast as they are received. The timestamping provides the canonical order instead of some arbitrary order determined by the block producer. Double spend attempts are now trivial to resolve, because the entire network is able to agree on which transaction came first.

This changes everything.

Instead of forcing validators to come to consensus every 6-600 seconds in order to verify the passage of time, validators in Solana can continuously stream state updates to their peers in real time.

Instead of needing to wait to hear confirmation from each validator like in all other blockchains, Solana can use a novel fan-out mechanism called Turbine (inspired by BitTorrent) to keep communication complexity O(log(n)) instead of O(n^2). This allows Solana to process > 50,000 TPS on a single global state with fast finality—with no sharding required.

This means the validator pool size is comparable to Tendermint, on the order of 100-1,000, but that chain forks are allowed. An aggressive fork management policy is required to ensure that the system converges rapidly on a single chain whenever a chain fork does present itself, a necessary tradeoff for asynchronous progression and constant availability.

Bringing the wireless communications analogy full circle, PoH is to blockchains what TDMA is to cellular networks. Picture Solana’s 1,000 validators as radio towers leveraging their synchronized clocks to subdivide their bandwidth into individual time slots. They are constantly receiving new transactions, each with a signed PoH hash attached by the sender, and forwarding them to their neighbors who can immediately order them using those PoH hashes. As leaders rotate based on the global clock, each leader chooses an ordered set of transactions to execute and then gossips that “entry” to the network. Validators return their votes on each “entry” and finalize transactions when they see that ⅔ of the validators have voted alongside them.

The network as a whole is constantly processing transactions in the same order at incredibly high volumes, but each validator is progressing independently. This is a subtle but profound change relative to other chains. In Solana, validators never stop progressing. They always move forward independently, regardless of network conditions and consensus.

There are other tangential issues (e.g., rapid chain growth, a new programming model, the unbiasability of the timechain, parallelism) this new design brings up that are beyond the scope of this essay, though they are addressed in Solana’s documentation. Solana’s testnets today are processing upwards of 50,000 TPS with an average TTF of 1.5 seconds on networks of 200 validators across 5 continents. That is comparable to Spanner, except that it’s meaningfully decentralized.

This level of performance in a trust-minimized, permissionless world computer is only possible because of Solana’s separation of time and state. The Solana network’s globally available clock allows each node to update its state without having to communicate with any other nodes, just like Spanner.

Reframing Scalability

Although the crypto community has written ad nauseam about scalability and consensus models, none have explored the distributed clock problem specifically. With several years of Proof-of-Stake research culminating in Tendermint + BPoS as the best-in-class result, and numerous sharding proposals all essentially converging around the beacon chain + state sharding architecture, a granular timechain that allows for asynchronous state progression will provide the best performance for unsharded systems that prefer availability to consistency.

Providing a globally available clock allows the Solana team to leverage 40+ years of distributed systems research that is otherwise unapplicable. Concepts like Optimistic Concurrency Control were invented in 1981, and have been used in large-scale computing projects for years, but cannot be applied when time and state must progress together. Parallel processing with GPUs has been around since 1995, though largely limited to graphics cards until Nvidia released the CUDA development environment in 2007, but cannot be fully leveraged by blockchain-based systems that pessimistically lock all states except for the accounts currently being touched by a transaction.

Understanding the passage of time is paramount to the performance of distributed systems in both permissioned and permissionless settings. Time is everything, and with a new way to encode the passage of time in the form of Proof of History, permissionless systems can match the performance of proven, centralized cloud compute providers.

Thanks to Anatoly Yakovenko and Zaki Manian for providing feedback on this essay.


时间与状态之分离

Ryan Gentry 译 June Chan

未来一两年间将推出许多智能合约平台(如以太坊2.0、Polkadot、Dfinity、Near Protocol、Algorand、Kadena、Spacemesh和Solana)。每个团队都在追求独特的扩展策略。

然而,大多数这些方法都没有解决拜占庭环境中分布式计算系统的一个基本问题:时钟问题。为了达成共识,网络中至少51%的机器必须在同一时间以相同的顺序执行相同的事务。想要实现这一点,这些机器需要就一个全局时钟达成一致。“时钟问题”指的是让许多不信任的机器在拜占庭式的设置下就全局时钟达成一致所面临的挑战。一旦每个人都同意一个全局时钟,事务排序就会变得简单得多,因为每个事务都使用相同的全局时钟分配时间戳。

在现代加密时代之前,时钟问题已经在其他大规模网络中彰显了出来,尤其是在无线通信领域。手机信号发射塔必须同时支持成千上万的手机。由于没有足够的带宽让每部手机都按自己的无线频率来传输,因此电信行业需要使用“多重接入技术”,好在同一频率上塞进多个电话呼叫。

码分多址(CDMA)是在二战期间发明的。为了解决时钟问题,CDMA要求每台手机用一个唯一的密钥加密数据,并像其他手机一样同时传输多个频率的数据,依靠发射塔将组合的信号划分为各自单独的通话。此模式的效率只会随着加密模式复杂性的提高而同步改进。对于必须支持廉价终端设备的广泛可用网络,这类改进的速度历来缓慢而稳定。

自2G蜂窝网络出现以来,时分多址(TDMA)已成为解决时钟问题的标准解决方案,电信公司藉此获得了更快的效率收益。TDMA指定发射塔将每个无线电频率划分为时间段,并将这些时间段分配给每个电话呼叫。通过这种方式,蜂窝塔为网络提供了一个全局可用的时钟。这极大地提高了有限带宽的可扩展性,使得每个频率可支持多个同步数据通道,并减少来自多个电话在同一频率上同时广播的干扰。

在本文中,我将探讨不同的区块链如何在拜占庭式的设置下处理时间问题。我的结论是,构建最有效时钟的区块链将成功地分离时间和状态,并能够进行扩展,以安全和去中心化的方式支持数百万用户的吞吐量。

·去中心化共识与时钟

谷歌的Spanner数据库是全局性能最好的分布式数据库之一,拥有18个实例,所有的事务处理都是同步的。它支持每秒50,000+个事务(TPS),最终性时间(TTF)控制在1秒内。Spanner利用了于1989年首次发布的Paxos共识算法。Spanner是一个许可式的可信数据库。而Paxos算法使得Spanner在面对断电、服务器故障、恶意bug等许多其他错误时仍能取得进展。

在当今吞吐量最高的区块链仅用21个实例就难以实现5,000+ TPS时,Paxos是如何实现这种性能的?谷歌会聘请全职工程师来到现场,定期以极高的精度同步每个数据中心的原子钟。提供一个全局可用的可信时钟,可以对事务分配时间戳,以便每个实例接收到无序的事务,但同时又能以正确的顺序处理它们。这就是时间和状态分离,因为每个实例都会更新其状态,而无需与其他实例进行检查,以确保它们以相同的顺序执行相同的操作。

我们能从Spanner中学到什么?如果在非拜占庭式的环境中有一个全局可用的时钟,那么达成共识是小菜一碟。遗憾的是,如今的智能合约平台有两个Spanner没有的额外限制:

  1. 成为一个验证器必须是无需许可的,以保持平台的抗审查性
  2. 区块链必须保护用户的资金安全,哪怕高达⅓的验证器是有恶意的

如果任何人都可以在全世界任何地方启动验证器实例,则必须设计共识算法来适应不同的硬件和网络配置,并必须管理恶意验证器。此外,为了真正抵抗审查,不可信任各种带外信息(即预言机问题)。

Paxos共识算法发明20年后,有人想出了一个办法,让一个无需许可的计算机网络就交易的规范顺序达成共识。这个人就是中本聪(Satoshi Nakamoto),解决方案是工作证明(PoW)共识。

·工作证明+时间链=时钟

有意思的是,中本聪提前发布的比特币源代码实际上将人们熟悉的区块链数据结构称为“时间链”(Timechain)。这个时间链的设计是平均每10分钟滴答(tick)一次(通过巧妙地将工作证明、难度调整和最长链规则整合在一起,这超出了本文的范围),其中,每次滴答都以更新全局状态的事务块的形式出现。在节点执行一个事务块之后,它会被锁定,不能进行任何状态更新,直到它自己生成一个有效的新块,或者从网络接收到一个有效的新块。在PoW中,时间和状态是耦合在一起的,总是步调一致地前进。时间无法通过更新到状态进行处理。

什么能令区块“有效”,这个话题一直会引起非常激烈的争论。事务格式和区块大小只是需要考虑的众多特性当中的两个。不过,有效区块的其中一个方面不存在争议,那就是它必须包含前一个块的哈希,好让网络知道将它放在时间链中的前一个块之后。

Proof of Work + Time Chain = Clock 图说:区块链中的每个块都包含前一个块的哈希,以证明它在后面。

时间链的目的是解决上面的需求#2:成为一个验证器必须是无需许可的。验证比特币网络当前状态是否有效的唯一方法是从创世区块中的状态开始,执行从创世区块到当前状态的每一笔交易。通过证明区块高度为12的事务已经发生,并且必须在区块高度为11的事务之后执行,时间链为一个新的验证器提供了审计轨迹。因为第12区块必须包含第11区块的哈希,第12区块只能在第11区块之后创建。哈希的这个时间链产生一个合逻辑的、单调一致的(尽管不规则而且细粒度不够)时钟,网络中的任何验证器都可以在没有任何带外信息的情况下独立地验证这个时钟。

在一个开放的、无许可的环境中生产这种全局可用且可信的时钟,这是中本聪最大的创新。由于在出新块前,全局状态被锁定,因此可伸缩性的计算很简单:

Throughput [TPS] = Block size [txs per block] / Block time [seconds per block]

吞吐量[TPS] =区块大小[每区块txs] /区块时间[每块秒]

为了增加吞吐量,协议必须要么增加区块大小,要么减少区块时间。增加区块大小不利于区块生产者的去中心化,而减少区块时间又增加了链分叉的概率。

因为时间和状态是耦合的,没有办法绕过它。

回到无线通信的例子,让我们将这个问题与CDMA进行比较。在CDMA中,无线电发射塔有一个固定的频率带宽可以被听到,这类似于区块生产者有一个固定的区块大小可供其处理。提高CDMA的可伸缩性意味着要创建更复杂的编码方案,以便在有限的带宽内适应更多的电话呼叫。这类似于Segwit、闪电通道和Schnorr签名,作为更复杂的编码方案,它们均可以提高性能。

比特币有1 MB的区块,区块时间为600秒,最小交易大小为250 B,理论上最大吞吐量为7 TPS。这比Spanner的吞吐量低了约7000倍,比它的TTF慢了3600倍(因为它需要6个区块时间来实现概率不可逆的最终性)。

显然,比特币还有改进的空间。

·权益证明+时间链=更快的时钟

比特币的发展引发了共识算法研究的复兴。CAP定理告诉我们,在网络分区的情况下,分布式数据库系统必须在一致性(网络中断)和可用性(网络分叉)之间做出选择。中本聪共识算法 这个大家族体系中有许多共识算法,所有都选择可用性而不是一致性。中本聪本人的算法是这个共识算法家族中的第一个无许可的拜占庭容错共识算法。

相较中本聪家族,经典共识算法家族则更喜欢一致性而非可用性,其中之首是Leslie Lamport的Paxos算法。

在Paxos和许多其他经典共识家族的算法中,每个参与共识的节点必须与网络中的其他所有验证器同步通信,以进行每个状态更新。这使得通信复杂度为O(n2) (其中n是验证器计数),这意味着随着验证器计数的增加,每个状态更新之间所需的时间呈指数级增长。

Jae Kwon和Ethan Buchman率先从事经典共识研究,为之倾注了20年的时间,他们将一种名为“绑定权益证明”(BPoS)的加密经济激励结构注入其中,以安全地限制验证器的数量。他们所取得的工作成果,便是经典共识家族中的第一个高性能、无许可的BFT共识算法:Tendermint

与中本聪共识一样,Tendermint捆绑了时间和状态更新,因此只有当区块大小增加或区块时间减少时,它的吞吐量才会增加。2009年比特币问世时,10分钟的区块时间是合理的。然而,自那以后,带宽呈指数级增长,使得Tendermint链的区块时间缩短到了几秒钟。

因为Tendermint更喜欢一致性,不可能分叉,所以可以减少区块时间,直到某个给定验证器计数的网络吞吐量瓶颈系统性能达到极限为止。现在,Tendermint允许网络安全地将验证器数量限制在100个以内,这样做的好处是过滤掉带宽差的节点,并允许使用更大的区块。

Tendermint正在生产中。Cosmos Hub(第一个实时Tendermint实例)以6秒的区块时间运行,区块大小为150 kB,最大吞吐量为100 TPS(假设事务为250字节)。它还只有几个月的历史,不过很快就会成熟。从理论上讲,拥有5 秒出块时间和5 MB区块大小的Tendermint网络可以达到4000 TPS吞吐量,而与比特币相比,它在抗审查性和无许可性方面所做出的牺牲最小——这意味着吞吐量增加了570倍,TTF减少了720倍!

遗憾的是,由于经典共识算法的同步特性,匹配Spanner会对系统的抗审查性和无许可性产生不利影响。更大的区块将不可避免地需要更长的时间来在网络中传播,验证器也需要更长的时间进行验证,这就为出块时间设定了一个下限。为了提高时钟的速度,验证器的数量需要显著减少,而且它们都需要直接连接到同一个光纤网络。这将增加验证器串通合谋的可能性和新验证器进入的障碍,并使光纤网络的操作员成为一个中心点。

区块链共识的下一个演进将采取一个重要步骤来分离时间和状态,以相当高的成本获得巨大的吞吐量提升。

·分片+时间链=独立时钟

使用BPoS, Tendermint解除了验证器数量带来的抗审查性,这使得网络时钟从每600秒滴答一次加速到每5秒一次,从而大大提高了性能。然而,在时钟的每一次滴答之间,整个全局状态仍然被锁定,以保持全局一致的状态。

缓解这一问题的一种方法是将全局状态分割成一串更小的碎片,每个碎片都有自己独立的时钟,可以彼此独立前进。只要这些分片不需要相互交互,每个分片的性能就能保持不变,并且所有分片的总吞吐量随切分的数量线性增加。

Cosmos设想许多独立的区块链网络并行存在,能够在彼此之间传递价值,但主要是在各自的系统内进行交易。如果每个网络可以处理4000TPS,并且有13个不同的网络,那么整个系统可以达到52000 TPS,超过Spanner的吞吐量!不过,这种方法存在两个问题:

1) 权益证明区块链的安全性是通过获得33%的抵押代币和批准无效交易的成本来衡量的。如果不是单一的代币供应,而是有13个单独的网络,那么获得给定网络33%的份额的成本将大大降低。这远谈不上安全,而且严重损害了区块链的价值主张,其中安全是网络价值的一个功能。 2) 与网络内传输相比,网络间传输的TTF至少增加4倍。网络必须来回通信以同步它们的时 钟,并确保如果Alice向Bob发送代币,需要在网络上烧掉Alice的代币之前,令Bob成功在他的网络上接收到价值。

尽管Cosmos设想了一个由许多自主网络管理自身安全的世界,但以太坊2.0、Polkadot、Algorand、Dfinity和Near Protocol正在构建至少能够解决共享安全问题的系统(上述需求#1)。每个团队的方法都有细微的差别,但是基本的体系结构包括一个信标链,它既为网络的其余部分提供时钟,又安全地将验证器跨分片打乱,以便它们都共享一个公共的安全池。与Cosmos一样,增加吞吐量很容易:只需添加更多的分片!

Sharding 图说:以太坊2.0的单链和分片状态。

遗憾的是,网络间传输的高TTF问题(上述需求#2)仍然存在。即使信标链提供了一个全局时钟,每个分片也只是周期性地将其本地时钟与信标链的时钟同步。为了让Alice从分片A向分片B中的Bob发送代币,分片A中的验证器必须证明它们烧掉了Alice的代币,然后分片B中的验证器才能为Bob生成等量的代币。按照以太坊 2.0目前的设计来计,这个过程将花费6分钟,是跨分片区块时间的60倍。

虽然分片有用,但是基本的扩展限制仍然基于每个分片中的时间和状态更新是耦合的这一事实。面对Tendermint有关区块大小和区块时间的问题,每个分片仍然受到同样的限制。

分片类似于TDMA的某些元素;状态被划分成独立的分片,使用它们自己的时钟,就像手机信号塔把它们的带宽划分成独立的无线电频率和时间段一样。这样做的好处很明显,但并没有得到充分利用,这一点可以从跨分片延迟得到证明。 但是,如果可以在无许可设置中完全解耦时间和状态更新,情况会怎样呢?

·分离时间和状态

到目前为止,我们已经讨论了中本聪如何创建时间链数据结构,来为比特币网络提供一个免信任的时钟;Kwon和Buchman如何将BPoS应用于PBFT中,从而安全地减少验证器数量并加快Tendermint网络时钟;以及使用独立时钟将网络分片成许多小块将如何极大地提高吞吐量(只要将跨分片事务最小化)。然而,通过这些进展,我们已经注意到状态和时间的更新仍然是耦合的,状态更新只与时钟滴答同时发生,这为抗审查和无许可的计算网络的吞吐量和实现最终性的时间造成了根本上的限制。

分离时间和状态需要一个全局可用的时钟,它是快速、精确和信任最小化的。使用这样的时钟,状态更新可以像在Spanner中一样连续和异步地发生。只要每个人都同意使用一个全局时钟,并且事务具有时间戳,事务就可以在网络中持续流动。

通过将基于哈希的时间链与对状态更新的共识中分离,Solana已经为他们的智能合约平台构建了一个信任最小化的时钟。Solana网络中的验证器不是将哈希链接到每个块上,而是在块内连续地散列哈希本身。这种机制称为历史证明(Proof of History, PoH),它为网络中要同步的所有节点生成一个全局可用、信任最小化的细粒度时间链。 深入研究PoH的机制超出了本文的范围。有关PoH如何运作的更多细节,请参阅Solana的PoH文档

Seperating Time and State 图说:历史证明将标准化的时间戳编织到区块链中的方式。

这种独立的时间链的存在使得领导者在收到分配有时间戳的交易后,会尽快向委员会传播。时间戳提供规范的顺序,而不是由区块生成者确定的任意顺序。由于整个网络都能够就哪个事务先发生达成一致,所以现在解决双花的尝试是很容易的。

它改变了一切。

Solana中的验证器可以不断地向其对等方实时发送状态更新,而不是强制验证器每隔6-600秒达成共识,以验证时间的推移。

Solana不需要像其他所有区块链一样等待每个验证器的确认,而是可以使用一种名为Turbine的新型扇出机制(灵感来自BitTorrent)来保持通信复杂度,即复杂度为O(log(n)),而不是O(n2)。这使得Solana可以在单个全局状态下处理超过 50,000 TPS,且能实现快速最终性,而且还不需要分片。

这意味着验证器池的大小与Tendermint相当,大约100-1,000个,但是允许进行链分叉。需要一个积极的分叉管理策略,以确保无论何时出现一个链分叉,系统都能快速地收敛到单个链上,这是异步进程和恒定可用性的必要折衷。

将无线通信比喻成一个完整的轮回,PoH对于区块链的意义就像TDMA对于蜂窝网络的意义一样。把Solana的1,000个验证器想象成无线电发射塔,利用它们的同步时钟将它们的带宽细分为各个时间段。它们不断地接收新的事务,每个事务都有发送者附加的签名PoH哈希,并将它们转发给邻居,邻居可以立即使用这些PoH哈希进行排序。当领导者根据全局时钟进行轮换时,每个领导者都会选择一组有序的事务来执行,然后将“条目”传播到网络。验证器对每个“条目”投票,在他们看到⅔的验证器与自己一起投票后,确认事务终结。

这个网络作为一个整体,不断地以难以置信的高容量,按照相同的顺序处理事务,但是每个验证器都在独立地进行。相对于其他链条,这是一个微妙而深刻的变化。在Solana中,验证器永远不会停止前进。而且不管网络条件和共识如何,他们总是独立前进。

还有其他一些相关但不太重要的问题(例如快速的链增长、新的编程模型、时间链的不偏性、并行性),这些新设计超出了本文的讨论范围,不过在Solana的文档中已经讨论过了。如今,Solana的测试网在5大洲200个验证器的网络上处理超过50,000个TPS,平均TTF为1.5秒。这些参数与Spanner旗鼓相当,只不过Solana是在有目的地去中心化。 在一个信任最小化、无许可的计算机领域中,这种级别的性能只有在Solana将时间和状态分离的情况下才可能实现。Solana网络的全局可用时钟允许每个节点更新其状态,而不需要像Spanner那样与任何其他节点通信。

·重构可伸缩性

尽管加密社区已经就可伸缩性和一致性模型撰写了连篇累牍的文章,但是没有人专门研究分布式时钟的问题。几年来对权益证明的研究,带来的最佳成果是Tendermint + BpoS,无数分片方案基本上都围绕着信标链+状态分片架构展开,而允许异步状态进展的细粒度时间链将为喜欢可用性而不是一致性的非分片系统提供最佳性能。

提供一个全局可用的时钟,这使得Solana团队能善加利用40多年来的分布式系统研究。乐观并发控制(又名“乐观锁”,Optimistic Concurrency Control,简称“OCC”)等概念是1981年发明的,多年来一直应用于大型计算项目中,但在时间和状态必须同时进行时则无法应用。与GPU的并行处理从1995年开始就已经存在,不过在2007年英伟达(Nvidia)发布CUDA开发环境之前主要局限于图形卡,无法被基于区块链的系统充分利用,这类系统悲观地锁定除当前正在处理事务的账户以外的一切状态。

理解时间的推移对于分布式系统在许可和无许可设置下的性能都是至关重要的。时间就是一切,使用一种以历史证明的形式编码时间推移的新方法,无许可系统可以与经过验证的中心化云计算供应商的性能相匹配。

感谢Anatoly YakovenkoZaki Manian对本文的反馈。

Interested in partnering with us, or building something great?

Get in touch