The Efficiency of IsiNdebele Part of Speech Tagger: A Quantitative Analysis
DOI: https://doi.org/10.51415/ajims.v8i1.3170
Keywords: accuracy, F1 score, part of speech tagger, precision, recall
Abstract
This study evaluates the performance of the isiNdebele part of speech tagger developed by the National Centre for Human Language Technologies as part of the Nguni core technologies. A sample of 522 words was randomly selected from government documents and isiNdebele literary works, and a mixed-methods approach was used to analyse the data. The raw data were processed automatically by the tagger, and the outputs were compared against the gold standard to calculate the tagger’s accuracy. Accuracy was 86% for nouns, 66% for verbs, 59% for adverbs, 90% for pronouns, 14% for adjectives, 33% for conjunctions, 83% for copulatives, 50% for relatives, 90% for possessives and 71% for demonstratives, and 0% for ideophones, interjections, prepositions, question words and auxiliary verbs. Recall and precision were calculated using Python 3.0, enabling the researchers to determine the F1 score. Recall, precision and F1 score were, respectively, 0.86, 0.55 and 0.67 for nouns; 0.66, 0.7 and 0.68 for verbs; 0.5, 0.46 and 0.48 for relatives; 0.63, 0.86 and 0.73 for adverbs; 0.9, 0.56 and 0.69 for possessives; 0.71, 0.86 and 0.78 for demonstratives; 0.14, 0.67 and 0.23 for adjectives; 0.9, 0.95 and 0.92 for pronouns; 0.83, 1.0 and 0.91 for copulatives; and 0.36, 0.83 and 0.5 for conjunctions. These findings underscore the importance of improving the isiNdebele part of speech tagger.
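The relationship between the reported recall, precision and F1 figures can be illustrated with a minimal Python sketch. It is not the authors' script: the function names `f1_score` and `per_tag_metrics`, the dictionary `REPORTED`, and the four-token toy sequences are hypothetical and introduced only for illustration. The sketch assumes the usual definitions, with precision as correct predictions of a tag divided by all predictions of that tag, recall as correct predictions divided by gold-standard occurrences, and F1 as their harmonic mean.

```python
from collections import Counter


def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def per_tag_metrics(gold, predicted):
    """Return {tag: (precision, recall, f1)} for two aligned tag sequences."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    gold_count = Counter(gold)
    pred_count = Counter(predicted)
    # A true positive is a token whose predicted tag equals its gold tag.
    true_pos = Counter(g for g, p in zip(gold, predicted) if g == p)
    metrics = {}
    for tag in set(gold) | set(predicted):
        precision = true_pos[tag] / pred_count[tag] if pred_count[tag] else 0.0
        recall = true_pos[tag] / gold_count[tag] if gold_count[tag] else 0.0
        metrics[tag] = (precision, recall, f1_score(precision, recall))
    return metrics


# (recall, precision, F1) per category, copied from the abstract.
REPORTED = {
    "nouns": (0.86, 0.55, 0.67),
    "verbs": (0.66, 0.70, 0.68),
    "relatives": (0.50, 0.46, 0.48),
    "adverbs": (0.63, 0.86, 0.73),
    "possessives": (0.90, 0.56, 0.69),
    "demonstratives": (0.71, 0.86, 0.78),
    "adjectives": (0.14, 0.67, 0.23),
    "pronouns": (0.90, 0.95, 0.92),
    "copulatives": (0.83, 1.00, 0.91),
    "conjunctions": (0.36, 0.83, 0.50),
}

# Recompute each F1 from the reported recall and precision.
recomputed = {
    tag: round(f1_score(precision, recall), 2)
    for tag, (recall, precision, _) in REPORTED.items()
}

# Hypothetical toy sequences (not from the study's data) to show the counting.
toy_gold = ["N", "V", "N", "ADJ"]
toy_pred = ["N", "N", "N", "V"]
toy_metrics = per_tag_metrics(toy_gold, toy_pred)

if __name__ == "__main__":
    for tag, value in recomputed.items():
        print(f"{tag}: reported F1 {REPORTED[tag][2]:.2f}, recomputed {value:.2f}")
```

Under these definitions, every F1 value recomputed from the reported recall and precision agrees with the reported F1 to two decimal places.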