Identification of thesaurus relationships with the support of ChatGPT and Gemini

Authors

Keywords:

Thesaurus Relationships, Semantic Relationships, Categorization, LLMs, ChatGPT, Gemini

Abstract

The study investigates whether Large Language Models (LLMs), specifically ChatGPT and Gemini, can support experts during critical stages of thesaurus construction: categorization and the recognition of semantic relationships between terms/concepts. To this end, domain-independent prompts were defined and tested in a pilot experiment on the sub-domain of archaeological sites. The experiment required developing a benchmark of terms and concepts linked by thesaurus semantic relationships. The results show that LLMs, combined with experts' skills, can help accomplish these tasks more efficiently. The interaction between experts and AI services was mutually beneficial: experts helped the AI services understand the requests by providing context and improving the prompts, while the AI services supplied reasoning for categorization and semantic network construction that could give experts insights for refining these tasks. The study also identifies opportunities for improving the performance of AI services in disambiguating categories and recognizing semantic relationships.
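As a rough illustration of the workflow summarized above, the following Python sketch shows how a domain-independent prompt might be templated and how relationships proposed by an LLM could be scored against an expert benchmark. It is a sketch under stated assumptions, not the authors' method: the prompt wording, the function names `build_prompt` and `score`, the choice of the BT/NT/RT/USE/UF tags, and the sample terms are all hypothetical, and no AI service is actually called.

```python
from typing import NamedTuple

# Standard thesaurus relationship tags (broader, narrower, related, USE, used for).
# Restricting the sketch to these five tags is an assumption.
RELATION_TAGS = ("BT", "NT", "RT", "USE", "UF")


class Relation(NamedTuple):
    """One directed thesaurus relationship: source --tag--> target."""
    source: str
    tag: str
    target: str


def build_prompt(term_a: str, term_b: str, domain: str) -> str:
    """Hypothetical domain-independent prompt template (not the authors' wording)."""
    tags = ", ".join(RELATION_TAGS)
    return (
        f"You are assisting a thesaurus editor working on the domain '{domain}'.\n"
        f"Which thesaurus relationship links '{term_a}' to '{term_b}'?\n"
        f"Answer with one of: {tags}, or NONE, then give a one-sentence reason."
    )


def score(predicted: set[Relation], benchmark: set[Relation]) -> dict[str, float]:
    """Precision and recall of predicted relations against the benchmark."""
    correct = len(predicted & benchmark)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(benchmark) if benchmark else 0.0
    return {"precision": precision, "recall": recall}


# Invented sample data for illustration only; not taken from the study's benchmark.
benchmark = {
    Relation("necropolis", "BT", "funerary site"),
    Relation("funerary site", "NT", "necropolis"),
    Relation("necropolis", "RT", "tomb"),
}
predicted = {
    Relation("necropolis", "BT", "funerary site"),
    Relation("necropolis", "RT", "tomb"),
    Relation("necropolis", "NT", "tomb"),  # a wrong proposal an expert would reject
}

prompt = build_prompt("necropolis", "tomb", "archaeological sites")
result = score(predicted, benchmark)
```

Treating each relationship as a (source, tag, target) triple keeps the comparison with the benchmark a plain set intersection. An expert review step, as the abstract describes, would sit between the model's answers and the `predicted` set.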



Published

30-12-2025

Issue

Section

Contributions
