An Oversampling Method for Online Reviews Based on Fused Text Sentiment Transfer - Computer Technology and Development

Article Information/Info

Title:: An Oversampling Method for Online Reviews Based on Fusion Text Sentiment Transfer

Author(s):: ZHAO Chang-xin; CHANG Dao-fang; School of Logistics Engineering, Shanghai Maritime University, Shanghai 201306, China

Keywords:: text sentiment transfer; unbalanced online reviews; feature dictionary; masked self-attention; Seq2Seq

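Abstract (translated from the Chinese):: In recent years, online review sections have been undermined by the abuse of review-manipulation tactics such as "cash back for positive reviews" and review brushing, which has seriously affected the performance of online review sentiment analysis models in real application scenarios. In response, an online review oversampling method based on fused text sentiment transfer is proposed to mitigate the problems caused by this imbalance in the sample distribution. The method fuses a feature dictionary-based approach with a deep learning-based approach to transfer the sentiment of a text. For explicit sentiment expressions in majority-class samples, the feature dictionary-based approach identifies and replaces them. The deep learning-based approach, in turn, builds a Seq2Seq model and introduces a masked self-attention mechanism to replace the implicit sentiment expressions in the text. Finally, a restricted EDA method further expands the results, which serve as oversampled minority-class samples. Experiments on a collected dataset of real online reviews show that the method raises the trained model's precision by 16.6% and its F1 score by 9.5%, and raises its ability to distinguish minority-class samples by 12.2%. It also improves the trained model's performance more than traditional methods do.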

Abstract:: In recent years, online review sections have been distorted by the abuse of manipulation tactics such as "cash back for positive reviews" and fake-review brushing, which seriously degrades the performance of online review sentiment analysis models in real application scenarios. To mitigate the problems caused by the resulting imbalance in the sample distribution, we propose an online review oversampling method based on fused text sentiment transfer. The method combines a feature dictionary-based approach with a deep learning-based approach to transfer the sentiment of a text. The feature dictionary-based approach identifies and replaces explicit sentiment expressions in majority-class samples. The deep learning-based approach adds a masked self-attention mechanism to a Seq2Seq model to replace the implicit sentiment expressions that the dictionary cannot match. A restricted EDA method then further expands the transferred text, which serves as oversampled data for the minority class. Experiments on a real dataset of collected online reviews show that the proposed method improves the trained model's precision by 16.6% and its F1 score by 9.5%, and improves its ability to distinguish minority-class samples by 12.2%. It also yields a larger performance gain for the trained model than traditional methods.
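
The dictionary-based replacement and the restricted EDA expansion described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy lexicon `POS_TO_NEG` and the function names `dictionary_transfer` and `restricted_eda` are hypothetical, and "restricted" is assumed here to mean that the replaced sentiment words are protected from the random swap and deletion operations. The Seq2Seq model with masked self-attention that handles implicit sentiment is not covered.

```python
import random

# Hypothetical toy lexicon mapping positive words to negative counterparts.
# A real feature dictionary would be built from the review corpus.
POS_TO_NEG = {
    "good": "bad",
    "great": "terrible",
    "fast": "slow",
    "love": "hate",
}


def dictionary_transfer(tokens, lexicon=POS_TO_NEG):
    """Replace explicit sentiment words found in the lexicon.

    Returns the transferred tokens and the positions that were replaced.
    """
    out, hits = [], []
    for i, tok in enumerate(tokens):
        key = tok.lower()
        if key in lexicon:
            out.append(lexicon[key])
            hits.append(i)
        else:
            out.append(tok)
    return out, hits


def restricted_eda(tokens, protected, n_aug=2, p_delete=0.1, seed=0):
    """Create variants by random swap and random deletion.

    Assumption: positions listed in `protected` (the sentiment words) are
    never swapped or deleted, so the transferred polarity is preserved.
    """
    rng = random.Random(seed)
    protected = set(protected)
    free = [i for i in range(len(tokens)) if i not in protected]
    variants = []
    for _ in range(n_aug):
        new = list(tokens)
        # Swap two unprotected positions, if at least two exist.
        if len(free) >= 2:
            a, b = rng.sample(free, 2)
            new[a], new[b] = new[b], new[a]
        # Drop each unprotected token with probability p_delete.
        kept = [
            tok for i, tok in enumerate(new)
            if i in protected or rng.random() >= p_delete
        ]
        variants.append(kept)
    return variants


# Usage: turn a majority-class (positive) review into minority-class samples.
review = "The delivery was fast and I love it".split()
transferred, hit_positions = dictionary_transfer(review)
augmented = restricted_eda(transferred, hit_positions, n_aug=3)
```

In this example "fast" and "love" are replaced by "slow" and "hate", and every augmented variant still contains both replaced words because their positions are protected. Keeping the perturbation away from sentiment-bearing tokens is one plausible reading of the restriction; the paper itself may define it differently.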