Leveraging Semantic Diffusion for Polysemous Word Disambiguation in Morphologically Rich Low-resourced Languages

Authors' Information:

Halima Aminu

Department of Computer Science, Aliko Dangote University of Science and Technology, Wudil, Nigeria

I.R. Saidu

Department of Intelligence and Cyber Security, Nigerian Defence Academy, Kaduna State, Nigeria

P. O. Odion

Department of Computer Science, Nigerian Defence Academy, Kaduna, Nigeria

Vol. 02, No. 10 (2025), October 2025

Pages: 666–673

Abstract:

Word Sense Disambiguation (WSD) remains one of the most challenging problems in Natural Language Processing (NLP), particularly in morphologically rich and low-resource languages. Hausa presents a unique case, where polysemy interacts with morphology to produce highly ambiguous tokens. We introduce the Hausa Polysemy Dataset (HPD), a linguistically curated sense-annotated resource, and propose the Semantic Diffusion Model (SDM), which integrates contextualized transformer encoders with graph-based semantic diffusion to jointly leverage contextual cues, gloss knowledge, and morphological relations. On HPD, SDM achieves an F1-score of 78.5%, outperforming strong baselines including GlossBERT and non-diffusive GNNs. Detailed ablations demonstrate the importance of diffusion, class-balanced focal loss, and gloss pretraining for robust performance on rare senses.
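The "graph-based semantic diffusion" the abstract describes can be illustrated with a personalized-PageRank-style propagation in the spirit of Predict then Propagate (Klicpera et al., 2019, reference 12): contextual node embeddings are repeatedly smoothed over a sense/morphology graph while a teleport term keeps each node anchored to its original representation. The sketch below is a minimal, hypothetical illustration of that propagation rule, not the authors' actual SDM implementation; the function name, the choice of symmetric normalization, and the parameter values are assumptions.

```python
import numpy as np

def semantic_diffusion(H, A, alpha=0.1, steps=10):
    """Diffuse node features H over graph A with a teleport term:
        H_{t+1} = (1 - alpha) * A_hat @ H_t + alpha * H_0
    where A_hat is the symmetrically normalized adjacency with
    self-loops, as in APPNP (Klicpera et al., 2019).

    H : (n, d) array of initial (e.g. transformer) node embeddings
    A : (n, n) binary/weighted adjacency of the sense graph
    alpha : teleport probability; larger alpha stays closer to H_0
    """
    n = A.shape[0]
    # Add self-loops, then normalize: A_hat = D^{-1/2} (A + I) D^{-1/2}
    A = A + np.eye(n)
    d = A.sum(axis=1)
    A_hat = A / np.sqrt(np.outer(d, d))
    H0, Ht = H, H
    for _ in range(steps):
        Ht = (1 - alpha) * (A_hat @ Ht) + alpha * H0
    return Ht
```

With alpha = 1 the propagation reduces to the identity (every node keeps its contextual embedding), while small alpha lets gloss and morphological neighbors dominate; tuning this trade-off is what lets rare senses borrow evidence from related, better-attested nodes.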

Keywords:

Word Sense Disambiguation, Semantic Diffusion, Hausa language, polysemy, morphologically rich languages, graph neural networks.

References:

  1. Adeyemi, K., Bello, Y., & Musa, I. (2025). Leveraging bilingual lexicons for Hausa word sense disambiguation. Proceedings of the International Conference on Language Resources and Evaluation (LREC), 1523–1532. 
  2. Aminu, H., Saidu, I. R., & Odion, P. O. (2025). Curation of a polysemous word dataset for word sense disambiguation in Hausa language. Journal of Statistical Sciences and Computational Intelligence, 1(3), 175–186. https://doi.org/10.64497/jssci.77 
  3. Amrhein, C., & Sennrich, R. (2022). Low-resource neural machine translation: A review of challenges and solutions. Transactions of the Association for Computational Linguistics, 10, 1080–1094. 
  4. Bevilacqua, M., & Navigli, R. (2020). Breaking through the 80% glass ceiling: Raising the state of the art in word sense disambiguation by incorporating knowledge graph information. Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2854–2864. 
  5. Blevins, T., & Zettlemoyer, L. (2022). Zero-shot learning for word sense disambiguation. Transactions of the Association for Computational Linguistics, 10, 94–110. 
  6. Chen, R., Li, P., & Yang, T. (2025). Diffusion-enhanced transformers for semantic disambiguation. Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 3655–3664. 
  7. Conia, S., Scarlini, B., & Navigli, R. (2023). Probing large language models for word sense disambiguation. Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 3551–3564. 
  8. Conneau, A., et al. (2020). Unsupervised cross-lingual representation learning at scale. Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 8440–8451. 
  9. Gomez, F., & Ortega, J. (2025). Hybrid graph-transformer models for polysemy disambiguation in African languages. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 3120–3129. 
  10. Huang, H., Chen, S., & Sun, M. (2023). Few-shot word sense disambiguation via prompt-based learning. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 4732–4744. 
  11. Ji, H., Pan, X., & Tang, J. (2022). Graph neural networks for semantic representation in WSD. Proceedings of the International Conference on Computational Linguistics (COLING), 1598–1607. 
  12. Klicpera, J., Bojchevski, A., & Günnemann, S. (2019). Predict then propagate: Graph neural networks meet personalized PageRank. Proceedings of the International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1810.05997 
  13. Kim, S., Park, J., & Cho, K. (2024). Morphology-aware pretraining for disambiguation in agglutinative languages. Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2871–2882. 
  14. Luo, F., Zhou, J., Xu, Y., & Liu, Z. (2021). Incorporating gloss information into pretrained language models for word sense disambiguation. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 911–920. 
  15. Navigli, R., Bevilacqua, M., & Conia, S. (2021). Ten years of BabelNet: A survey of large-scale multilingual semantic resources. Artificial Intelligence, 300, 103–105. 
  16. Peters, M., et al. (2020). Deep contextualized word representations revisited. Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2227–2237. 
  17. Qassem, G. A. S. (2024). Difficulties of translating polysemous lexical items and the strategies adopted: A case study of EFL learners at Saber Faculty of Science and Education, Department of English, University of Lahij. Electronic Journal of University of Aden for Humanity and Social Sciences, 5(2), 123–132. https://doi.org/10.47372/ejua-hs.2024.2.357 
  18. Vial, L., Lecouteux, B., & Schwab, D. (2022). Improving word sense disambiguation with graph neural networks. Computational Linguistics, 48(1), 77–111. 
  19. Wang, S., He, Y., & Sun, Y. (2022). Semantic diffusion models for low-resource language understanding. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 1241–1252. 
  20. Xie, J., Li, Y., & Li, S. (2024). Context-aware graph diffusion for multilingual WSD in morphologically rich languages. Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 1932–1945. 
  21. Zakaria, N. H., & Yaacob, S. (2025). Morphological and syntactic semantics of lexical polysemy in the Qur'an using "Fitna" as a case study. Environment-Behaviour Proceedings Journal, 10(SI33), 33–38. https://doi.org/10.21834/e-bpj.v10isi33.7033 
  22. Zhang, Q., Liu, H., & Zhao, J. (2024). Enhancing low-resource WSD with cross-lingual adapters. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2120–2132.