[1] ZHANG Ji-can, YAO Kun-bin, XUE Lei*, et al. Cross-architecture Binary Code Similarity Analysis Based on Data Dependencies[J]. Computer Technology and Development, 2024, 34(07): 62-68. [doi:10.20165/j.cnki.ISSN1673-629X.2024.0103]

Cross-architecture Binary Code Similarity Analysis Based on Data Dependencies

Computer Technology and Development [ISSN:1673-629X/CN:61-1281/TN]

Volume:
34
Issue:
2024, No. 07
Pages:
62-68
Section:
Software Technology and Engineering
Publication date:
2024-07-10

Article Information

Title:
Cross-architecture Binary Code Similarity Analysis Based on Data Dependencies
Article ID:
1673-629X(2024)07-0062-07
Author(s):
ZHANG Ji-can 1; YAO Kun-bin 2; XUE Lei 2*; WANG Chen 1; NIE Li-ming 3,4
1. Wuhan Research Institute of Posts and Telecommunications, Wuhan 430074, China; 2. Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China; 3. Zhejiang Sci-Tech University, Hangzhou 310018, China; 4. Nanyang Technological University, Singapore 699010, Singapore
Keywords:
binary; data dependency; similarity detection; graph neural network; semantic information; vulnerability detection
Classification number:
TP302
DOI:
10.20165/j.cnki.ISSN1673-629X.2024.0103
Abstract:
Binary Code Similarity Detection (BCSD) plays an important role in reverse engineering, vulnerability detection, malware detection, software plagiarism detection, and patch analysis. Most existing research has focused on control-flow embedding of binary functions and on low-level code embedding based on Natural Language Processing (NLP) techniques. However, a function at run time carries not only control-flow information but also data-flow semantic information, so abstracting the semantic features of a function comprehensively is crucial. To this end, we propose BS-DD, a framework for judging binary function similarity that integrates both control flow and data dependency relationships. We extract semantic information by simulating the execution of binary code and apply a simplification algorithm to construct a data dependency graph. Finally, we use a graph neural network to judge similarity. We compile seven widely used open-source software packages under different combinations of settings and, on this basis, design three distinct task scenarios together with a real-world vulnerability detection experiment to compare BS-DD with the latest data-flow-based BCSD methods. Experimental results show that the model achieves significant improvements in recall and Mean Reciprocal Rank (MRR) scores. In real-world vulnerability detection, the model also consistently outperforms the other methods.
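The two metrics named in the abstract can be illustrated with a small sketch. The abstract does not state the evaluation protocol (candidate pool size, the value of k, how ties are handled), so the function names, the toy rankings, and the choice of k below are assumptions for illustration only, not the authors' evaluation code. Each query function has one correct match in a ranked candidate list: MRR averages 1/rank of that match, and recall@k is the fraction of queries whose match appears in the top k.

```python
from typing import Sequence


def reciprocal_rank(ranked_ids: Sequence[str], target_id: str) -> float:
    """Return 1/rank (1-based) of target_id in ranked_ids, or 0.0 if absent."""
    for rank, candidate in enumerate(ranked_ids, start=1):
        if candidate == target_id:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings: Sequence[Sequence[str]],
                         targets: Sequence[str]) -> float:
    """Average the reciprocal rank of the correct match over all queries."""
    if not rankings or len(rankings) != len(targets):
        raise ValueError("rankings and targets must be non-empty and aligned")
    total = sum(reciprocal_rank(r, t) for r, t in zip(rankings, targets))
    return total / len(targets)


def recall_at_k(rankings: Sequence[Sequence[str]],
                targets: Sequence[str], k: int) -> float:
    """Fraction of queries whose correct match is among the top-k candidates."""
    if not rankings or len(rankings) != len(targets):
        raise ValueError("rankings and targets must be non-empty and aligned")
    hits = sum(1 for r, t in zip(rankings, targets) if t in r[:k])
    return hits / len(targets)


# Hypothetical toy data: three queries, each with candidates sorted by
# descending similarity score; "f1" is the true match for every query.
toy_rankings = [
    ["f1", "f2", "f3"],  # match at rank 1
    ["f2", "f1", "f3"],  # match at rank 2
    ["f3", "f2", "f1"],  # match at rank 3
]
toy_targets = ["f1", "f1", "f1"]

mrr = mean_reciprocal_rank(toy_rankings, toy_targets)
recall_1 = recall_at_k(toy_rankings, toy_targets, k=1)
recall_2 = recall_at_k(toy_rankings, toy_targets, k=2)
```

On the toy data, the reciprocal ranks are 1, 1/2, and 1/3, so the MRR is their mean, while recall@1 and recall@2 count one and two of the three queries respectively.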


Last Update: 2024-07-10