[1] YU Jia-yu, LI Xiang, ZHAN Jin-yu, et al. A Hybrid Augmentation Method of Complaint Texts against Tour Guides Based on EDA and Back Translation[J]. Computer Technology and Development, 2021, 31(03): 21-26. [doi:10.3969/j.issn.1673-629X.2021.03.004]

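A Hybrid Augmentation Method of Complaint Texts against Tour Guides Based on EDA and Back Translation (基于 EDA 和回译的导游投诉文本混合增强方法)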

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
31
Issue:
2021, No. 03
Pages:
21-26
Section:
Big Data Analysis and Mining
Publication date:
2021-03-10

Article Info

Title:
A Hybrid Augmentation Method of Complaint Texts against Tour Guides Based on EDA and Back Translation
Article ID:
1673-629X(2021)03-0021-06
Author(s):
YU Jia-yu (余佳雨) 1,2; LI Xiang (李响) 2,3; ZHAN Jin-yu (詹瑾瑜) 1,2; JIANG Wei (江维) 1; CAO Yang (曹扬) 2,3; YANG Rui (杨瑞) 2,3
1. School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China;
2. CETC Big Data Research Institute Co., Ltd., Guiyang 550022, China;
3. Big Data Application on Improving Government Governance Capabilities National Engineering Laboratory, Guiyang 550022, China
Keywords:
illegal tour guide behavior detection; text data augmentation; EDA; back translation; hybrid augmentation
CLC number:
TP391
DOI:
10.3969/j.issn.1673-629X.2021.03.004
Abstract:
In recent years, using machine learning to identify illegal tour guide behavior from complaint texts against tour guides has become an inevitable trend; it assists tourism supervisors in their work and provides a basis for tourism supervision. However, such complaint texts are homogeneous and difficult to obtain, so augmenting them to meet the needs of illegal tour guide behavior detection is an urgent problem. To address it, we propose a hybrid augmentation method for complaint texts against tour guides based on EDA (easy data augmentation) and back translation. The complaint texts are augmented separately by EDA and by back translation, and the augmented corpora returned by the two methods are mixed to obtain the final augmented texts. The method is applied and validated in a practical illegal tour guide behavior detection system. Extensive experiments analyze and compare the proposed method with the traditional EDA augmentation method and the traditional back translation augmentation method. The experimental data show that the hybrid method achieves higher accuracy and a better augmentation effect than either traditional method. In the practical illegal tour guide behavior detection system it reaches an accuracy of 87.54%, which is 7.4% higher than that obtained with the original data set.
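The hybrid idea in the abstract (augment with EDA, augment with back translation, then mix the two result sets) can be illustrated with a minimal Python sketch. This is not the authors' implementation: all function names and parameters here are hypothetical, only two of the usual EDA operations (random swap and random deletion) are shown because synonym replacement and random insertion need a synonym lexicon, and the translators and tokenizer are supplied by the caller rather than taken from any particular library.

```python
import random
from typing import Callable, List


def random_swap(tokens: List[str], n_swaps: int, rng: random.Random) -> List[str]:
    """EDA-style random swap: exchange two distinct positions, n_swaps times."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens: List[str], p: float, rng: random.Random) -> List[str]:
    """EDA-style random deletion: drop each token with probability p.

    At least one token is always kept so the result is never empty.
    """
    if len(tokens) <= 1:
        return list(tokens)
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def back_translate(text: str,
                   to_pivot: Callable[[str], str],
                   from_pivot: Callable[[str], str]) -> str:
    """Translate to a pivot language and back; both translators are assumptions."""
    return from_pivot(to_pivot(text))


def hybrid_augment(text: str,
                   tokenize: Callable[[str], List[str]],
                   detokenize: Callable[[List[str]], str],
                   to_pivot: Callable[[str], str],
                   from_pivot: Callable[[str], str],
                   n_swaps: int = 1,
                   p_delete: float = 0.1,
                   seed: int = 0) -> List[str]:
    """Mix EDA outputs with a back-translated output, dropping duplicates."""
    rng = random.Random(seed)
    tokens = tokenize(text)
    candidates = [
        detokenize(random_swap(tokens, n_swaps, rng)),
        detokenize(random_deletion(tokens, p_delete, rng)),
        back_translate(text, to_pivot, from_pivot),
    ]
    seen = {text}  # never return the unmodified original as an augmentation
    mixed: List[str] = []
    for candidate in candidates:
        if candidate and candidate not in seen:
            seen.add(candidate)
            mixed.append(candidate)
    return mixed
```

For Chinese complaint texts, `tokenize` would have to be a word segmenter and `detokenize` a plain string join; whitespace splitting only works for space-delimited languages. How the two augmented corpora are weighted when mixed is not stated in the abstract, so the sketch simply concatenates them and removes duplicates.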
Last Update: 2020-03-10