Byzantine Fault Tolerance in Distributed Systems: A Literature Review
Abstract:
Byzantine Fault Tolerance (BFT) underpins reliable consensus in arbitrary, faulty, and malicious distributed systems. The early BFT algorithms, based on the Byzantine Generals Problem, incur unnecessarily high communication overhead, limiting scalability. This review critically analyzes key milestones in BFT development, including Practical Byzantine Fault Tolerance (PBFT), where the authors introduced a primary-replica model that simplifies communication complexity. However, due to quadratic messaging, PBFT remains relatively unscalable, prompting proposals for asynchronous protocols like HoneyBadger BFT, which trade low latency for robustness. Blockchain consensus protocols, such as Nakamoto Proof-of-Work, Tendermint, and HotStuff, incorporate BFT principles to balance security, finality, and energy efficiency. Beyond blockchain, BFT is crucial in securing distributed machine learning against adversarial attacks. Challenges like scalability, energy consumption, and emerging security threats persist. Current trends focus on hybrid consensus mechanisms, sharding, and using AI to detect anomalies, aiming to enhance BFT’s effectiveness in large-scale, resource-constrained distributed systems.
KeyWords:
Byzantine, Byzantine Fault, Tolerance, Distributed, Systems, Distributed Systems
References:
- Benčić, F., Gramoli, V., & Sergot, M. (2023). The limits of BFT scalability: A quantitative analysis of modern protocols. ACM Computing Surveys, 55(4), 1–34. https://doi.org/10.1145/3533414
- Cachin, C., & Vukolić, M. (2017). Blockchain Consensus Protocols in the Wild. arXiv preprint arXiv:1707.01873. https://arxiv.org/abs/1707.01873
- Cachin, C., Guerraoui, R., & Rodrigues, L. (2011). Introduction to Reliable and Secure Distributed Programming (2nd ed.). Springer. https://doi.org/10.1007/978-3-642-19346-6
- Castro, M., & Liskov, B. (1999). Practical Byzantine Fault Tolerance. Proceedings of the Third Symposium on Operating Systems Design and Implementation (OSDI ’99), 173–186. https://www.usenix.org/legacy/events/osdi99/full_papers/castro/castro.pdf
- Castro, M., & Liskov, B. (2002). Practical byzantine fault tolerance and proactive recovery. ACM Transactions on Computer Systems (TOCS), 20(4), 398-461.
- Danezis, G., Gailly, N., Jakovljevic, I., & Sonnino, A. (2022). Narwhal and Tusk: Consensus from data dissemination. Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 47–63.
- Danezis, G., Li, F., & Teague, V. (2020). Narwhal and Tusk: A DAG-based Mempool and a High-Throughput Consensus. Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, 133–150. https://doi.org/10.1145/3372297.3417242
- DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., ... & Vogels, W. (2007). Dynamo: Amazon’s Highly Available Key-value Store. Proceedings of the Twenty-First ACM SIGOPS Symposium on Operating Systems Principles (SOSP '07), 205–220. https://doi.org/10.1145/1294261.1294281
- Dolev, D., Tzachar, N., & Zuck, L. (2022). Scaling Byzantine Fault Tolerant Consensus: Challenges and Solutions. ACM Computing Surveys, 55(6), 1–34. https://doi.org/10.1145/3491097
- Dwork, C., Lynch, N., & Stockmeyer, L. (1988). Consensus in the Presence of Partial Synchrony. Journal of the ACM, 35(2), 288–323. https://doi.org/10.1145/42282.42283
- Lamport, L., Shostak, R., & Pease, M. (1982). The Byzantine Generals Problem. ACM Transactions on Programming Languages and Systems (TOPLAS), 4(3), 382–401. https://doi.org/10.1145/357172.357176
- Li, Q., Huang, J., Ding, Y., & Gao, C. (2021). Byzantine-robust Federated Learning: A Comprehensive Survey. IEEE Transactions on Neural Networks and Learning Systems, 32(6), 2289–2308. https://doi.org/10.1109/TNNLS.2020.3029170
- Lin, Y., Zhao, T., & Wang, C. (2023). Byzantine-robust federated learning: Advances and open challenges. IEEE Transactions on Neural Networks and Learning Systems, 34(1), 12–30. https://doi.org/10.1109/TNNLS.2022.3157411
- Malkhi, D. (2019). Byzantine Fault Tolerance. In C. Cachin & M. K. Reiter (Eds.), Encyclopedia of Cryptography and Security (2nd ed., pp. 185–188). Springer. https://doi.org/10.1007/978-1-4419-5906-5_27
- Malkhi, D., & Reiter, M. K. (1998). Byzantine Quorum Systems. Distributed Computing, 11(4), 203–213. https://doi.org/10.1007/s004460050044
- Miller, A., Xia, Y., Croman, K., Shi, E., & Song, D. (2016). The Honey Badger of BFT Protocols. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 31–42. https://doi.org/10.1145/2976749.2978390
- Wan, Q., Zhang, H., & Luo, X. (2023). Lightweight BFT consensus protocols for edge computing: A review and taxonomy. IEEE Internet of Things Journal, 10(2), 1253–1272. https://doi.org/10.1109/JIOT.2022.3187890
- Wang, Z., Li, J., & Tang, J. (2023). Advances in Byzantine Fault Tolerant Protocols for Large-Scale Distributed Systems. IEEE Transactions on Network Science and Engineering, 10(2), 715–727. https://doi.org/10.1109/TNSE.2023.3248571
- Xie, C., Koyejo, O., & Gupta, I. (2020). Byzantine-robust Federated Learning through Adaptive Model Aggregation. Proceedings of the 37th International Conference on Machine Learning (ICML 2020), 11558–11567. https://proceedings.mlr.press/v119/xie20c.html
- Xie, C., Koyejo, O., & Gupta, I. (2022). Fall of empires: Breaking Byzantine-tolerant federated learning via poisoned leaders. Proceedings of the 39th International Conference on Machine Learning (ICML), 24524–24545.
- Yin, M., Malkhi, D., Reiter, M. K., Gueta, G. G., & Abraham, I. (2019). HotStuff: BFT consensus with linearity and responsiveness. Proceedings of the ACM Symposium on Principles of Distributed Computing (PODC), 347–356. https://doi.org/10.1145/3293611.3331591
- Zhou, S., Yu, Y., & Lin, C. (2020). Blockchain consensus protocols: A survey of performance and energy consumption. Journal of Parallel and Distributed Computing, 138, 1–14. https://doi.org/10.1016/j.jpdc.2019.11.010