Rakesh Kumar, Shubham Sinha, Shobhit Dixit

Authors

  • Shashi Shekhar, Dilip Kumar Sharma

Abstract

Computer systems need to be able to extract emotional expressions from code-mixed text in order to better understand human language. Social media text often contains code-mixed content from which equivocation information can be extracted, but this information is hard to retrieve in the transliterated domain. Extracting the equivocal expressions that people use to voice their opinions on the web is a challenging task in a code-mixed environment. This work compares different approaches for code-mixed social media text in the transliterated domain. A rule-based approach is proposed that accepts code-mixed text as input and, based on a set of defined rules, identifies the equivocal expressions in each sentence. The hypothesis is evaluated through experiments with the rule-based approach and with standard statistical approaches. A voting technique is then applied to the outputs of the rule-based and statistical approaches; it selects the best equivocation tag for each word by majority. The voting step is also useful when the three approaches assign different tags to a word, since it helps in choosing the best equivocal tag. The voting approach performs best among all the approaches used in the experiments, with high accuracy.
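A minimal sketch, in Python, of a per-word majority vote over several taggers, assuming each approach outputs exactly one tag per token and all tag lists are aligned to the same tokens. The tag names ("O", "EQ", "AMB"), the tagger variable names, and the fallback rule for words where no tag wins a majority are hypothetical illustrations; the abstract does not specify the tag set or how three-way disagreements are resolved.

```python
from collections import Counter
from typing import List, Sequence


def vote_tags(tag_sequences: Sequence[Sequence[str]], fallback_index: int = 0) -> List[str]:
    """Combine per-word tags from several taggers by strict majority vote.

    tag_sequences: one tag list per tagger, all aligned to the same tokens.
    fallback_index: which tagger's tag to keep when no tag has a strict
    majority (a hypothetical tie-break, not taken from the paper).
    """
    # Every tagger must have produced a tag for every word.
    lengths = {len(seq) for seq in tag_sequences}
    if len(lengths) != 1:
        raise ValueError("all taggers must tag the same number of words")

    voted = []
    # zip(*...) yields, for each word, the tuple of tags from all taggers.
    for word_tags in zip(*tag_sequences):
        tag, count = Counter(word_tags).most_common(1)[0]
        if count > len(word_tags) // 2:
            voted.append(tag)  # strict majority, e.g. 2 of 3 taggers agree
        else:
            voted.append(word_tags[fallback_index])  # no majority: fall back
    return voted


# Hypothetical outputs of one rule-based and two statistical taggers.
rule_based = ["O", "EQ", "O"]
stat_a = ["O", "EQ", "EQ"]
stat_b = ["O", "O", "AMB"]
print(vote_tags([rule_based, stat_a, stat_b]))  # ['O', 'EQ', 'O']
```

With three taggers, a strict majority means at least two agree; the third word, where all three disagree, falls back to the first (here rule-based) tagger's tag. Choosing which tagger to trust on ties is a design decision left open by the abstract.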

Keywords: NLP, transliteration, ambiguity, embedding, equivocal, mixed script.

Published

2020-05-18

Section

Articles