
Halo! Nem blo mi em Nat. Gutpela lo meetim yu! Today we’re having a short introduction to Tok Pisin (literally, ‘Talk Pidgin’), one of the three official languages of Papua New Guinea (along with English and Hiri Motu). We recently recorded Tok Pisin for our uTalk app (coming in the next update) with our delightful voice artists Rhonda and Patrick, and I was intrigued by how the language works. Tok Pisin grew up in a country teeming with hundreds of other languages; even today, over 800 languages are still spoken in PNG. It developed as the lingua franca, a common tongue everyone could use for trade and communication.

Abstract: Nigerian English adaptation, Pidgin, has evolved over the years through multi-language code switching, code mixing and linguistic adaptation. While Pidgin preserves many of the words in the normal English language corpus, both in spelling and pronunciation, the fundamental meanings of these words have changed significantly. For example, ‘ginger’ is not a plant but an expression of motivation, and ‘tank’ is not a container but an expression of gratitude. The implication is that the current approach of applying English sentiment analysis directly to social media text from Nigeria is sub-optimal, as it cannot capture the semantic variation and contextual evolution in the contemporary meaning of these words. In practice, while many words in the Nigerian Pidgin adaptation are the same as in standard English, sentiment analysis models built on the full English language are not designed to capture the full intent of Nigerian Pidgin when it is used alone or code-mixed. By augmenting scarce human-labelled code-changed text with ample synthetic code-reformatted text and meaning, we achieve significant improvements in sentiment scoring. Our research explores how to understand sentiment in an intrasentential code-mixing and code-switching context where there has been significant word localization. This work presents 300 VADER-lexicon-compatible Nigerian Pidgin sentiment tokens with their scores, and 14,000 gold-standard Nigerian Pidgin tweets with their sentiment labels.
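To make the ‘ginger’ and ‘tank’ point concrete, here is a minimal sketch of how a lexicon-based scorer changes once Pidgin-specific entries are layered over an English lexicon. It is not the paper's implementation: the two small dictionaries, the valence values, the sample sentence and the helper `compound_score` are all hypothetical, chosen only for illustration. The one borrowed detail is the normalization, which follows the formula VADER uses for its compound score (raw sum divided by the square root of its square plus 15).

```python
import math

# Hypothetical valence scores on VADER's -4..+4 scale. These values are
# illustrative assumptions, not entries from the paper's 300-token lexicon.
ENGLISH_LEXICON = {"good": 1.9, "bad": -2.5}
PIDGIN_OVERRIDES = {"ginger": 1.5, "tank": 1.8}  # motivation, gratitude


def normalize(score, alpha=15.0):
    """Squash a raw valence sum into (-1, 1), as VADER's compound score does."""
    return score / math.sqrt(score * score + alpha)


def compound_score(text, lexicon):
    """Sum the valence of every known token, then normalize (hypothetical helper)."""
    tokens = (tok.strip(".,!?") for tok in text.lower().split())
    raw = sum(lexicon.get(tok, 0.0) for tok in tokens)
    return normalize(raw)


# Pidgin entries take precedence over any English entry with the same spelling.
merged_lexicon = {**ENGLISH_LEXICON, **PIDGIN_OVERRIDES}

text = "I tank you, you ginger me!"  # made-up example sentence
english_only = compound_score(text, ENGLISH_LEXICON)
with_pidgin = compound_score(text, merged_lexicon)
print(f"English lexicon only: {english_only:.3f}")
print(f"With Pidgin overrides: {with_pidgin:.3f}")
```

With the English-only dictionary, none of the tokens carries sentiment, so the sentence scores as neutral; with the overrides merged in, the same sentence scores as positive. A real VADER-compatible lexicon would also need to interact with VADER's handling of negation and intensifiers, which this sketch leaves out.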
