Once again, the problem was in code that parses Bitcoin transactions above the consensus layer. As I covered in my article on the earlier bug that Burak triggered, before Taproot there were limits on how much script and witness data a transaction could include. Taproot's activation removed those limits, leaving the block size limit itself as the only constraint on those parts of an individual transaction. The previous bug arose because the consensus code in btcd had been properly upgraded to reflect this change, but the code handling peer-to-peer transmission, which parses data before sending it and on receiving it, had not. So the code processing blocks and transactions rejected the data before it was ever passed to the consensus validation logic, and the block in question was never validated.
This time, something very similar happened. Another limit was wrongly enforced in the peer-to-peer section of the codebase, this one capping witness data at one-eighth of the block size rather than the full block size. Burak again stalled btcd and LND nodes at the height of the block containing his transaction, which carried witness data just a single weight unit over the stricter limit. The transaction was non-standard: it is perfectly valid under the consensus rules but not under the default mempool policy, so nodes will not relay it across the network. It can still be mined into a block, but only by delivering it directly to a miner, which Burak did with the help of F2Pool.
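The mismatch described above can be sketched in a few lines of Python. This is a minimal illustration and not btcd's actual code: the function names and the stale-limit constant are hypothetical, and witness size is simplified to a single weight figure. The only value taken as given is the consensus block weight limit of 4,000,000 weight units.

```python
# Consensus limit on the total weight of a block, in weight units.
MAX_BLOCK_WEIGHT = 4_000_000

# Hypothetical stale cap in the peer-to-peer layer: one-eighth of the block limit.
STRICT_P2P_WITNESS_LIMIT = MAX_BLOCK_WEIGHT // 8


def p2p_layer_accepts(witness_weight: int, limit: int) -> bool:
    """Wire-level sanity check that runs before any consensus validation."""
    return witness_weight <= limit


def handle_incoming(witness_weight: int, p2p_limit: int) -> str:
    """Model a node receiving a block that contains one large witness."""
    if not p2p_layer_accepts(witness_weight, p2p_limit):
        # The data is dropped here, so the consensus code never sees it
        # and the node cannot advance past this block.
        return "rejected before consensus validation"
    return "handed to consensus validation"


# A witness one weight unit over the stale cap, as in the incident described.
oversized = STRICT_P2P_WITNESS_LIMIT + 1

print(handle_incoming(oversized, STRICT_P2P_WITNESS_LIMIT))
print(handle_incoming(oversized, MAX_BLOCK_WEIGHT))
```

With the stale cap, the block never reaches validation. With the peer-to-peer limit aligned to consensus, the same data passes through.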
This really drives home the point that any code designed to parse and validate Bitcoin data needs to be heavily audited to ensure that it is in line with what Bitcoin Core will do. One way to do that is to compare its behavior against released versions of Bitcoin Core. It makes no difference whether the code is the consensus engine of a node implementation or just a piece of code passing transactions around for a Lightning node; both must meet the same standard. This second bug sat practically right above last month's bug in the source code. It was not even found by anyone at Lightning Labs. AJ Towns discovered and reported it on October 11, two days after Burak's 998-of-999 multisig transaction triggered the original bug. The report was publicly visible on GitHub for ten hours before it was removed. A fix was then written but deliberately not released, the intention being to quietly patch the vulnerability in the next release of LND.
Now, this is a fairly standard procedure for a serious vulnerability, especially in a project like Bitcoin Core, where such a flaw can do real damage to the base-layer network or protocol. In this particular case, however, it posed a serious risk to LND users' funds, and given that it sat literally right next to an earlier bug carrying the same risks, the likelihood that it would be found and exploited was very high, as Burak demonstrated. This raises the question of whether the quiet-patch approach is the right one for vulnerabilities that can leave users open to theft of funds (because their nodes can no longer detect old channel states and penalize them).
As I went into in my article on the previous bug, had a malicious actor found the bug before a well-intentioned developer did, they could have tactically opened new channels to vulnerable nodes, routed the entire contents of those channels back to themselves, and then exploited the bug. From that point they would hold those funds and also be able to close the channel using the initial state, practically doubling their money. By actively exploiting the issue himself, Burak ironically ended up protecting LND users from such an attack.
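The "doubling" in that scenario is simple arithmetic, sketched below. This is a simplified model under stated assumptions: routing fees and channel reserves are ignored, the attacker is assumed to control the node that receives the routed payment, and the function and field names are hypothetical.

```python
def stale_state_attack(capacity_sat: int) -> dict:
    """Track balances through the three steps, ignoring fees and reserves."""
    # Step 1: the attacker funds a new channel to the vulnerable node.
    attacker_spent = capacity_sat

    # Step 2: the attacker routes the whole balance through the victim to
    # another node the attacker controls, so the victim pays it out elsewhere.
    attacker_received_offchain = capacity_sat
    victim_paid_out = capacity_sat

    # Step 3: the attacker closes with the initial state, which still assigns
    # the full capacity to the attacker. A stalled victim never sees the
    # close, so it never broadcasts a penalty transaction.
    attacker_received_onchain = capacity_sat

    return {
        "attacker_start": attacker_spent,
        "attacker_end": attacker_received_offchain + attacker_received_onchain,
        "victim_loss": victim_paid_out,
    }


print(stale_state_attack(1_000_000))
```

In this model the attacker ends with twice the starting amount, and the difference comes entirely out of the victim's other channels.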
Once the bug was exploited, users were still open to such attacks from existing peers with whom they already had open channels, but they could no longer be specifically targeted with new channels. Their nodes were stalled, so they would never recognize or process payments through channels that someone tried to open after the block that stalled them. That narrowed the risk of exploitation to the peers a user already had a channel with. Burak's action mitigated the damage: it limited the potential harm, made users aware of the risk, and got the bug patched more quickly.
LND was also not the only thing affected. Liquid's pegging process was broken as well, and fixing it required updates to the federation's functionaries. Older versions of Rust Bitcoin were affected too, so the stall hit some block explorers and instances of electrs (an implementation of the backend server for Electrum Wallet). With the exception of Liquid's peg, which would eventually expose funds to the emergency recovery keys held by Blockstream after a timelock expiry, these other issues never put anyone's funds at risk. And realistically, in the heist-movie plot where Blockstream stole those funds, everyone would know exactly who to go after. The bug had already been fixed in newer versions of Rust Bitcoin, but its developers apparently did not communicate with the maintainers of other codebases to warn them about the potential for such issues. It took the bug being actively exploited on the network for it to become widely known that the problem existed in multiple codebases.
When it comes to vulnerabilities like this in Layer 2 software on Bitcoin, this raises some big issues. The first is how seriously these codebases are audited for security bugs, and how that is prioritized against adding new features. It is very telling that this second bug was not even found by the maintainers of the codebase it lived in, even though it was literally right next to the first bug discovered a month earlier. After one serious bug that put users' funds at risk, was no internal audit of the codebase done? Did it really take someone from outside the project to find it? That does not demonstrate a priority on safeguarding users' funds over building new features to attract more users.
The second is that the bug had already been fixed in Rust Bitcoin, which shows a lack of communication about bugs like this between the maintainers of different codebases. That is fairly understandable: the codebases are completely different, so someone who finds a bug in one probably does not immediately think, "I should contact other teams writing similar software in totally different programming languages to warn them about the potential for such a bug." You don't find a flaw in Windows and then immediately think to report it to the Linux kernel maintainers. But Bitcoin, as a protocol for distributed consensus across a global network, is a very different animal; perhaps Bitcoin developers should start thinking along those lines when it comes to vulnerabilities in Bitcoin software, especially in code that parses and interprets consensus-related data.
Last, protocols like Lightning depend on watching the blockchain at all times so they can react to old channel states. For them, independent parsing and verification of data should be kept to an absolute minimum, if not removed entirely and delegated to Bitcoin Core or to data derived directly from it. This is how Core Lightning is architected: it connects to an instance of Bitcoin Core and relies on it entirely for validating blocks and transactions. If LND worked the same way, neither of these bugs in btcd would have affected LND users in a way that put their funds at risk.
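As a rough illustration of that delegation pattern, here is a minimal Python sketch. It is not Core Lightning's or LND's actual code: every class and function name is hypothetical, and the in-memory source merely stands in for a connection to a full node. The block dictionaries mirror the decoded JSON that Bitcoin Core's `getblock` RPC returns at verbosity 2 (a `tx` list whose entries carry `txid` and `vin`). The point is that the watcher never parses raw block bytes itself; it only reads data a full node has already validated and decoded.

```python
from typing import Protocol


class ValidatedChainSource(Protocol):
    """Anything that returns blocks a full node has already validated."""

    def get_block(self, height: int) -> dict: ...


class InMemorySource:
    """Stand-in for a Bitcoin Core connection, used only for this sketch."""

    def __init__(self, blocks: dict) -> None:
        self._blocks = blocks

    def get_block(self, height: int) -> dict:
        return self._blocks[height]


def find_channel_spends(source: ValidatedChainSource, height: int, watched: set) -> list:
    """Return (spending txid, outpoint) pairs that spend a watched funding outpoint."""
    block = source.get_block(height)
    hits = []
    for tx in block["tx"]:
        for txin in tx["vin"]:
            if "coinbase" in txin:
                continue  # coinbase inputs do not reference a previous output
            outpoint = (txin["txid"], txin["vout"])
            if outpoint in watched:
                hits.append((tx["txid"], outpoint))
    return hits


# Hypothetical data: one block in which a watched channel output is spent.
demo_blocks = {
    100: {
        "tx": [
            {"txid": "cb00", "vin": [{"coinbase": "00"}]},
            {"txid": "close01", "vin": [{"txid": "fund01", "vout": 0}]},
        ]
    }
}
source = InMemorySource(demo_blocks)
print(find_channel_spends(source, 100, {("fund01", 0)}))
```

Because the watcher depends only on the `ValidatedChainSource` interface, any parsing limit lives in one place, the full node, rather than being duplicated in the Lightning software.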
Whichever way it is handled, whether validation is outsourced entirely or simply minimized and treated with far more care, this incident shows that something needs to change in how Layer 2 software interacts with consensus-related data. Once again, everyone is very lucky that this was exploited by a developer proving a point rather than by a malicious actor. That said, Bitcoin cannot count on getting lucky or hope that malicious actors do not exist.
Rather than passing blame around like a hot potato, developers and users should focus on improving the processes that prevent incidents like this one from happening again.