[1] ZHOU Le-shan, FENG Xi-wei*. Images-text Sentiment Analysis Based on Reconstructed Dual Attention Networks[J]. Computer Technology and Development, 2024, 34(12): 157-164. [doi:10.20165/j.cnki.ISSN1673-629X.2024.0238]

Images-text Sentiment Analysis Based on Reconstructed Dual Attention Networks

Computer Technology and Development (《计算机技术与发展》) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
34
Issue:
No. 12, 2024
Pages:
157-164
Section:
Artificial Intelligence
Publication date:
2024-12-10

Article Info

Title:
Images-text Sentiment Analysis Based on Reconstructed Dual Attention Networks
Article ID:
1673-629X(2024)12-0157-08
Author(s):
ZHOU Le-shan (周乐善), FENG Xi-wei (冯锡炜)*
School of Artificial Intelligence and Software, Liaoning Petrochemical University, Fushun 113000, China
Keywords:
deep learning; multimodal; interactive attention; BERT; reconstructing unit; convolutional module; convolutional neural networks; sentiment analysis
CLC number:
TP391
DOI:
10.20165/j.cnki.ISSN1673-629X.2024.0238
Abstract:
Traditional image-text cross-modal sentiment analysis algorithms pay little attention to the spatial and channel dimensions of visual features, so key information in local features is easily lost and is then poorly represented at the feature fusion stage. To address this, the paper proposes an image-text sentiment analysis model based on reconstructed dual attention network fusion (Images-Text Sentiment Analysis Based on Reconstructed Dual Attention Networks Fusion, IRDA). For visual feature extraction, the model uses ResNet50 and adds spatial and channel reconstruction convolution modules, which reconstruct the spatial and channel position information of the visual features and fuse key information from different positions, strengthening visual feature extraction. For text feature extraction, the model uses BERT to obtain text representations and a bidirectional gated recurrent unit (Bi-GRU) to capture the contextual relationships between low-level words, enriching the semantic features of the text. An interactive attention mechanism then models the interaction between modalities and fuses the visual and text features to perform sentiment classification. Experiments on the MVSA multimodal dataset show that the model outperforms current mainstream models, confirming its effectiveness.
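The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature vectors are random stand-ins for the BERT + Bi-GRU text features and the ResNet50 + reconstruction-convolution visual features, the attention uses plain scaled dot products with no learned projections, and the names `cross_attend`, `mean_pool`, and `fuse_and_classify`, the feature dimension, and the three-class output are all assumptions chosen for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries, keys_values):
    """Each query vector attends over the other modality's vectors.

    Scaled dot-product attention without learned projections (an
    assumption; a trained model would learn query/key/value weights).
    """
    scale = math.sqrt(len(queries[0]))
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys_values])
        attended.append([
            sum(w * v[i] for w, v in zip(weights, keys_values))
            for i in range(len(q))
        ])
    return attended


def mean_pool(vectors):
    """Average a list of equal-length vectors into one vector."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def fuse_and_classify(text_feats, image_feats, weights, bias):
    """Interactive attention in both directions, then a linear classifier."""
    text_ctx = cross_attend(text_feats, image_feats)    # text attends to image regions
    image_ctx = cross_attend(image_feats, text_feats)   # image regions attend to text
    fused = mean_pool(text_ctx) + mean_pool(image_ctx)  # list concatenation
    logits = [dot(row, fused) + b for row, b in zip(weights, bias)]
    return softmax(logits)


# Hypothetical sizes: 5 text tokens, 6 image regions, 4-dim features, 3 classes.
random.seed(0)
dim, n_classes = 4, 3
text_feats = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(5)]
image_feats = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(6)]
weights = [[random.gauss(0, 1) for _ in range(2 * dim)] for _ in range(n_classes)]
bias = [0.0] * n_classes

probs = fuse_and_classify(text_feats, image_feats, weights, bias)
print(probs)
```

Attending in both directions lets each modality be re-weighted by the other before pooling, which is one common reading of "interactive attention"; the paper's exact formulation may differ.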


Last Update: 2024-12-10