Construction of a Chinese-based Corpus of Asian English Community

Computer Technology and Development (《计算机技术与发展》) [ISSN:1006-6977/CN:61-1281/TN]

Volume:: 34
Issue:: No. 11, 2024

Pages:: 180-185

Column:: New Computing Application Systems

Publication date:: 2024-11-10

Article Info

Title:: Construction of a Chinese-based Corpus of Asian English Community

Article ID:: 1673-629X(2024)11-0180-06

Author(s):: YE Xing-yu (叶星妤)1; PAN Xiao-xin (潘孝新)1; QIN Xiao-hui (秦晓惠)2; WANG Long (王龙)1*; HUANG Chao (黄超)1; LUO Xiong (罗熊)1; 1. School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China; 2. School of Foreign Languages, University of Science and Technology Beijing, Beijing 100083, China

Keywords:: corpus; Asian English; big data; language retrieval; natural language processing

CLC number:: TP391

DOI:: 10.20165/j.cnki.ISSN1673-629X.2024.0212

Abstract:: The Chinese-based Asian English community is a carrier of Chinese culture and one of the basic lingua francas of a community with a shared future for mankind. However, the lack of large volumes of authentic, reliable data and of rigorous data mining and natural language processing methods has become a key technical problem restricting research on Chinese-based Asian English. Building on an analysis of the current state of related research, a big-data-driven Corpus of Chinese-based Asian English (CCbAE) is designed and implemented, and an online retrieval service is provided through Web development. CCbAE is a large-scale corpus composed of six Chinese-based English varieties (Mainland China English, Hong Kong English, Taiwan English, Macau English, Singapore English, and Malaysian English). First, the overall architecture of the system and the construction of its database are briefly described. Second, the six major functions of the corpus are introduced together with the Web visual interface: word frequency statistics, feature display, lexical variation, morphological variation, syntactic variation, and semantic variation. The system provides users at different levels with a simple, easy-to-use retrieval service for Chinese-based Asian English corpus data.
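
As an illustration only, the word frequency statistics function named in the abstract could be sketched roughly as follows. This is not the CCbAE implementation, which the abstract does not describe; the sample texts, the tokenizer, and the function names `tokenize`, `word_frequencies`, and `compare_word` are all assumptions made for this sketch. It counts tokens per variety and reports a raw count alongside a frequency normalized per million tokens, a common way to compare subcorpora of different sizes.

```python
import re
from collections import Counter

# Hypothetical sample texts keyed by variety label; NOT data from CCbAE.
SAMPLE_TEXTS = {
    "Singapore English": "The hawker centre is near the MRT station. The hawker sells kopi.",
    "Hong Kong English": "The dim sum shop is near the MTR station.",
}


def tokenize(text):
    """Lowercase the text and split it into simple alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_frequencies(texts):
    """Build one token Counter per variety."""
    return {variety: Counter(tokenize(text)) for variety, text in texts.items()}


def compare_word(word, freqs):
    """Return {variety: (raw count, frequency per million tokens)} for a word."""
    word = word.lower()
    result = {}
    for variety, counter in freqs.items():
        total = sum(counter.values())
        count = counter[word]  # Counter returns 0 for unseen words
        per_million = count * 1_000_000 / total if total else 0.0
        result[variety] = (count, per_million)
    return result


if __name__ == "__main__":
    freqs = word_frequencies(SAMPLE_TEXTS)
    for variety, (count, rate) in compare_word("hawker", freqs).items():
        print(f"{variety}: count={count}, per million={rate:.1f}")
```

Normalizing per million tokens keeps the comparison meaningful when the subcorpora differ in size; a real system would presumably also need lemmatization and a more careful tokenizer.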
