Mask Detection with a Residual-Structure-Based SSD - Computer Technology and Development (《计算机技术与发展》)

Article Info

Author(s): DONG Yan-hua; ZHANG Shu-mei; ZHAO Jun-li; Qingdao University, Qingdao 266071, China

Keywords: mask detection; residual structure SSD; classification positioning; cross-entropy loss function; smooth L1 loss function

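Abstract (translated from Chinese): During the COVID-19 pandemic, people must wear masks for protection when going out, so research on face mask detection is urgently needed. This paper proposes a residual-structure-based SSD (single shot multibox detector) network for mask detection. A residual structure is added before the localization and classification layers of the SSD network, which separates the feature extraction network from the classification and localization layers. The convolutional features entering those layers are therefore more abstract, which effectively addresses the problem that the SSD network must learn local information and high-level information at the same time, and keeps the feature extraction network stable. The cross-entropy loss function handles the binary classification of masked versus unmasked faces, and the smooth L1 loss function handles the regression of the mask position. The classification and position regression terms are then combined in a weighted sum. By optimizing the traditional SSD localization-error and confidence-error loss functions, the network locates and classifies the features of faces with and without masks, which improves training speed and detection efficiency. Experimental results show that ReSSD reaches an average detection accuracy of 92.3% for mask detection, 7.4% higher than the SSD network, and that it also detects effectively in natural scenes.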

Abstract: In the context of COVID-19, people need to wear masks for protection when going out, so research on face mask detection is urgent. We propose a residual-structure-based SSD (single shot multibox detector) network for mask detection. By adding a residual structure before the localization and classification layers of the SSD network, the feature extraction network is separated from the classification and localization layers, so that the convolutional features entering those layers are more abstract. This effectively addresses the problem that the SSD network must learn local information and high-level information at the same time, and it maintains the stability of the feature extraction network. Moreover, the cross-entropy loss function is used for the binary classification of masked and unmasked faces, and the smooth L1 loss function is used for the regression of the mask position. The classification and position regression terms are then combined in a weighted sum. By optimizing the traditional SSD localization-error and confidence-error loss functions, the features of faces with and without masks can be located and classified, which improves network training speed and detection efficiency. Experiments show that ReSSD reaches an average detection accuracy of 92.3% for mask detection, 7.4% higher than the SSD network, and that it also detects effectively in natural scenes.
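The weighted combination of the two loss terms described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the weight `alpha`, and the normalization by the number of matched boxes are assumptions borrowed from the standard SSD formulation, and the sketch scores only matched boxes with two classes (0 = no mask, 1 = mask), leaving out the background class and hard negative mining.

```python
import math

def smooth_l1(x: float) -> float:
    """Smooth L1: quadratic for |x| < 1, linear otherwise."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5

def localization_loss(pred_box, target_box) -> float:
    """Sum of smooth L1 over the box offsets (e.g. cx, cy, w, h)."""
    return sum(smooth_l1(p - t) for p, t in zip(pred_box, target_box))

def cross_entropy(logits, label: int) -> float:
    """Softmax cross-entropy for one box, using log-sum-exp for stability."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def detection_loss(cls_logits, labels, pred_boxes, target_boxes, alpha=1.0) -> float:
    """(confidence loss + alpha * localization loss) / number of matched boxes.

    alpha and the 1/N normalization are assumptions taken from the
    standard SSD loss, not values reported in the abstract.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    conf = sum(cross_entropy(z, y) for z, y in zip(cls_logits, labels))
    loc = sum(localization_loss(p, t) for p, t in zip(pred_boxes, target_boxes))
    return (conf + alpha * loc) / n

# Hypothetical single matched box: undecided classifier, two offset errors.
loss = detection_loss(
    cls_logits=[[0.0, 0.0]],
    labels=[1],
    pred_boxes=[(0.5, 0.0, 0.0, 2.0)],
    target_boxes=[(0.0, 0.0, 0.0, 0.0)],
)
```

Raising `alpha` makes box-position errors count for more relative to classification errors. The abstract does not state which weighting the authors used.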