
CMNER: A Chinese Multimodal NER Dataset Based on Weibo

Computer Technology and Development [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:: 34
Issue:: 2024, No. 10

Pages:: 110-117

Section:: Artificial Intelligence

Publication date:: 2024-10-10

Article Info

Title:: CMNER: A Chinese Multimodal NER Dataset Based on Weibo

Article ID:: 1673-629X(2024)10-0110-08

Author(s):: JI Yuan-ze (季源泽); LI Fei (李霏); Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University, Wuhan, Hubei 430072, China

Keywords:: multimodal named entity recognition; image; named entity; Chinese; cross-lingual

CLC number:: TP391.1

DOI:: 10.20165/j.cnki.ISSN1673-629X.2024.0203

Abstract:: Multimodal named entity recognition (MNER) aims to locate and classify named entities in text with the help of related images. Research on Chinese MNER has been held back by a lack of manually annotated data. To address this, we build CMNER, a Chinese MNER dataset drawn from a social media platform: 5,000 Weibo posts paired with 18,326 corresponding images, manually annotated with four entity categories (person, location, organization, and miscellaneous). We run baseline experiments on CMNER with the ACN and UMT models, which reach F1 scores of 74.22% and 89.50%, respectively, supporting the effectiveness and usability of the dataset. We also conduct cross-lingual transfer learning experiments, whose results indicate that Chinese and English MNER data can complement each other and improve the performance of NER models. To promote further research on Chinese MNER, the CMNER dataset and related code are publicly released.
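For readers unfamiliar with how NER F1 scores such as those above are typically obtained, the following is a minimal sketch of exact-match, micro-averaged entity-level F1 over BIO-tagged sequences. It is an illustration only: the abstract does not state the tagging scheme or evaluation protocol, so the BIO scheme, the tag names (`PER`, `LOC`, `ORG`, `MISC`), and the helper names `bio_to_spans` and `entity_f1` are all assumptions, not the authors' code.

```python
from typing import List, Set, Tuple

# (start, end_exclusive, entity_type); the layout is an assumption for this sketch
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sentence."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        inside = tag.startswith("I-") and etype == tag[2:]
        if not inside:
            if etype is not None:
                spans.add((start, i, etype))  # close the open span
            if tag.startswith("B-"):
                start, etype = i, tag[2:]  # open a new span
            else:
                start, etype = None, None  # "O" or a stray "I-" tag is skipped
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1; a span counts only if boundaries and type both match."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: one of two gold entities is predicted with the wrong type.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "B-ORG"]]
print(entity_f1(gold, pred))  # 0.5
```

In this example precision and recall are both 1/2, since only the person span matches exactly, giving an F1 of 0.5. Under this convention a prediction with correct boundaries but the wrong category earns no credit.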

