

A Random Forest Method for Predicting the Covid-19 Epidemic Using Derived Features

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:: 33
Issue:: No. 08, 2023

Pages:: 9-13

Column:: Big Data and Cloud Computing

Publication date:: 2023-08-10

Article Info

Title:: Random Forest Model of Predicting Covid-19 with Derived Feature

Article ID:: 1673-629X(2023)08-0009-05

Author(s):: LONG Tie; FU Yu-sheng; WANG Wen-da; FEI Ning; School of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China

Keywords:: Covid-19; machine learning; random forest; derived feature; regression tree

Classification number:: TP181

DOI:: 10.3969/j.issn.1673-629X.2023.08.002

Abstract:: Since the outbreak of Covid-19, many studies have applied time-delay dynamics models, transmission dynamics models, and machine learning models to analyze epidemic data, with some success. However, because development levels differ widely between countries and regions and the data are imbalanced, these algorithms generalize poorly. Random forest is an ensemble learning model based on decision trees or regression trees: multiple trees are trained with the Bagging ensemble learning technique, and the final result is obtained by voting among them. Based on an analysis of the characteristics of the data set, this paper transforms and combines feature values that poorly reflect the differences between samples, derives new feature values from them, and groups the original data according to the new feature values. A random forest is used to build the epidemic prediction model, and each grouped data set is trained and predicted separately. Experiments with the random forest model show that the proposed method effectively improves the prediction accuracy for Covid-19, adapts better to regions that differ significantly, helps prevent overfitting, and tolerates noise and outliers well. It also offers a new approach to predicting similar infectious diseases in the future.
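
The grouping strategy described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the field names (`population`, `cases`, `next_new_cases`), the derived feature (cumulative cases per 100,000 people), the grouping threshold, and the toy records are all hypothetical. Bagged depth-one regression trees stand in for a full random forest, which would also grow deeper trees and sample a subset of features at each split. For regression, the trees' outputs are averaged rather than put to a vote.

```python
import random
from statistics import mean


def fit_stump(X, y):
    """Fit a depth-one regression tree: choose the split with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            ml, mr = mean(left), mean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:  # no valid split (all rows identical): always predict the mean
        m = mean(y)
        return (0, float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    j, t, ml, mr = stump
    return ml if row[j] <= t else mr


def fit_bagged(X, y, n_trees=25, seed=0):
    """Bagging: train each tree on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return trees


def predict_bagged(trees, row):
    """Average the individual tree predictions (the regression analogue of voting)."""
    return mean(predict_stump(t, row) for t in trees)


def derive(record):
    """Hypothetical derived feature: cumulative cases per 100,000 people."""
    return record["cases"] / record["population"] * 100_000


def group_key(record, threshold=100.0):
    """Assign a record to a group by its derived feature (threshold is an assumption)."""
    return "high" if derive(record) >= threshold else "low"


def features(record):
    return [record["cases"], derive(record)]


def fit_per_group(records):
    """Split the records by group and train one bagged ensemble per group."""
    groups = {}
    for r in records:
        groups.setdefault(group_key(r), []).append(r)
    return {
        key: fit_bagged([features(r) for r in rows],
                        [r["next_new_cases"] for r in rows])
        for key, rows in groups.items()
    }


def predict(models, record):
    """Route a record to the model of its own group."""
    return predict_bagged(models[group_key(record)], features(record))


# Synthetic toy records for illustration only, not real epidemic data.
records = [
    {"population": 1_000_000, "cases": 200, "next_new_cases": 12},
    {"population": 1_000_000, "cases": 500, "next_new_cases": 30},
    {"population": 2_000_000, "cases": 800, "next_new_cases": 45},
    {"population": 500_000, "cases": 900, "next_new_cases": 150},
    {"population": 500_000, "cases": 1500, "next_new_cases": 260},
    {"population": 200_000, "cases": 700, "next_new_cases": 110},
]

models = fit_per_group(records)
low_pred = predict(models, {"population": 1_000_000, "cases": 400})
high_pred = predict(models, {"population": 500_000, "cases": 1200})
```

Because each group gets its own model, a prediction for a low-incidence region is built only from other low-incidence records, which is one way the per-group training could keep very different regions from distorting each other's estimates.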

