Nepali Sentiment Analysis

Monday. July 01, 2019 - 1 min

This project is my second project in Nepali language. I tried to combine Named Entity Recognition and Sentiment Analysis tasks together in this project.

Abstract

With the increase in internet access and the ease of writing comments in the Nepali language, fine-grained sentiment analysis of social media comments is becoming more and more pertinent. There are a number of benchmarked datasets for high-resource languages (English, French, and German) in specific domains like restaurants, hotels or electronic goods but not in low-resource languages like Nepali. In this paper, we present our work to create a dataset for the targeted aspect-based sentiment analysis in the social media domain, set up a dataset benchmark and evaluate using various machine learning models. The dataset comprises of code-mixed and code-switched comments extracted from Nepali YouTube videos. We present convincing baselines using a multilingual BERT model for the Aspect Term Extraction task and BiLSTM model for the Sentiment Classification Task achieving 57.978% and 81.60% F1 score respectively.

Tasks

We divided the experiments into two sub-tasks: Aspect Term Extraction and Sentiment Polarity Identification.

Aspect Term Extraction resembles the sequence labeling task where we tag each token of a given sentence with predefined aspect category or named entities. We experiment with four major categories General, Profanity, Violence, Feedback under Aspect Category and Person, Organization, Location and Miscellaneous under Target Entities.

Sentiment Polarity Identification is a binary classification task to identify sentiment polarity [0, 1] of each aspect categories in every given sentence.

References

Internet Stat
They Don’t Leave Us Alone Anywhere We Go
SemEval 2014, 2015 and 2016 Aspect Based Sentiment Analysis
Annotating Targets of Opinions in Arabic using Crowdsourcing

Github

Publication

This research will be published in IEEE ASONAM 2020