Abstract
The development of WordNets has contributed to a number of tasks in Natural Language Processing (NLP). While there is growing interest in building WordNets for popular languages, there have been no major efforts for African languages, which are evolving and commonly used by the younger generation on social media platforms. Even where such efforts are claimed, no publicly accessible work exists that comprehensively addresses the challenge of creating and updating WordNets as new words are coined and the meanings of words change. We present a novel technique, implemented in a software tool called “Sense-Mapper”, that maps Princeton WordNet 3.0 (PWN) synsets to concepts extracted from a lexical resource, detects unknown words from social media platforms, assigns senses to the unknown words, and identifies the optimal location in the WordNet at which to insert the new words to cater for the evolving vocabulary. We assess the performance and effectiveness of Sense-Mapper using lexical resources and data generated from social media platforms in Kenya and show that the proposed tool achieves an accuracy of 87.34% in mapping senses between lexical resources and 88.75% in updating our WordNet. Sense-Mapper is expected to find application in a number of NLP tasks that require assigning senses to previously unseen or rare words and updating lexical resources.
| Original language | English |
|---|---|
| Pages (from-to) | 1263-1288 |
| Number of pages | 26 |
| Journal | Journal of Information Science and Engineering |
| Volume | 41 |
| Issue number | 5 |
| Publication status | Published - 2025 |
| Externally published | Yes |
Keywords
- natural language processing
- social media platforms
- under-resourced languages
- unseen words
- WordNet
ASJC Scopus subject areas
- Software
- Human-Computer Interaction
- Hardware and Architecture
- Library and Information Sciences
- Computational Theory and Mathematics