
Tibetan Sentiment Triplet Extraction Model Based on Location and Part-of-Speech Features

Computer Technology and Development (《计算机技术与发展》) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
Issue: 2025, No. 02

Pages: 130-137

Column: Artificial Intelligence

Publication date: 2025-02-10

Article Info

Title: Tibetan Sentiment Triplet Extraction Model Based on Location and Part-of-Speech Features

Article ID: 1673-629X(2025)02-0130-08

Author(s): SI Qu-zhuo-ga (1, 2, 3); YONG Tso (1, 2, 3)*; SAI Ming-yu (1, 2, 3)
1. School of Information Science and Technology, Tibet University, Lhasa 850000, China;
2. Key Laboratory of Tibetan Information Technology and Artificial Intelligence in Tibet Autonomous Region, Lhasa 850000, China;
3. Engineering Research Center of Tibetan Information Technology, Ministry of Education, Lhasa 850000, China

Keywords: Tibetan; Word2Vec; part of speech; positional features; emotional triplet

CLC number: TP391

DOI: 10.20165/j.cnki.ISSN1673-629X.2024.0300

Abstract: Extracting Tibetan sentiment triplets (aspect word, sentiment word, sentiment polarity) is a core task in fine-grained sentiment analysis and is crucial for a deep understanding of Tibetan sentiment expression and trends. However, the unique linguistic structure and cultural background of Tibetan make its sentiment expression differ from that of other languages, which increases the complexity of fine-grained sentiment analysis. To improve the extraction of Tibetan sentiment triplets, this paper proposes the OpinionNet-OTE-MTL model, which integrates part-of-speech information, Word2Vec word vectors, and absolute position vectors, and performs feature extraction with a bidirectional long short-term memory network (BiLSTM). Because Tibetan has many part-of-speech categories, the authors analyzed a large amount of sentiment data and selected 11 parts of speech to assist the model's recognition. To validate the OpinionNet-OTE-MTL model, comparative and ablation experiments were conducted on a self-constructed dataset of 2,000 Tibetan sentences annotated for fine-grained sentiment analysis. The ablation experiments show that part-of-speech information influences the model more than positional information, raising the F1 score of triplet extraction by 3.06 percentage points. The comparative experiments show that, after part-of-speech and positional features are integrated into the model, precision, recall, and F1 on the sentiment triplet extraction (Triple) task improve over the baseline by 4.73, 6, and 6.14 percentage points, respectively. Integrating part-of-speech and absolute position information enables the model to capture the grammatical structure and semantic rules of Tibetan more precisely, thereby improving the accuracy of sentiment triplet classification.
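The precision, recall, and F1 figures reported for the Triple task are typically computed at the triplet level. The following is a minimal Python sketch of that computation, assuming exact-match scoring (a predicted triplet counts as correct only if its aspect word, sentiment word, and polarity all match a gold triplet) and micro-averaging over sentences. The abstract does not state the matching criterion, so both are assumptions; the function name `triplet_prf` and the toy English data are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple

# A triplet is (aspect word, sentiment word, polarity).
# The layout and label strings are illustrative assumptions.
Triplet = Tuple[str, str, str]


def triplet_prf(
    gold: List[Set[Triplet]], pred: List[Set[Triplet]]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 under exact-match scoring."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same sentences")
    # A prediction is a true positive only if all three fields match.
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data: two sentences, three gold triplets, two predictions.
gold = [
    {("screen", "clear", "POS")},
    {("battery", "weak", "NEG"), ("price", "fair", "POS")},
]
pred = [
    {("screen", "clear", "POS")},   # exact match
    {("battery", "weak", "POS")},   # wrong polarity, so not counted
]
precision, recall, f1 = triplet_prf(gold, pred)
```

Under this scoring, a prediction with the right aspect and sentiment words but the wrong polarity earns no credit, which is why gains on the Triple task are harder to obtain than gains on aspect or sentiment word extraction alone.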

