多级过滤与分区均衡的图相似搜索研究-《计算机技术与发展》

文章信息/Info

Title:: Research on Graph Similarity Search with Multi-level Filter and Partition Balance

文章编号:: 1673-629X(2025)07-0024-08

作者:: 赵旭1; 2; 梁平1; 2; 顾进广1; 2; 高峰1; 2; 1. 武汉科技大学计算机科学与技术学院,湖北武汉 430065;
2. 智能信息处理与实时工业系统湖北省重点实验室,湖北武汉 430065

Author(s):: ZHAO Xu1; 2; LIANG Ping1; 2; GU Jin-guang1; 2; GAO Feng1; 2; 1. School of Computer Science and Technology,Wuhan University of Science and Technology,Wuhan 430065,China;
2. Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System,Wuhan 430065,China

关键词:: 图相似搜索; 图编辑距离; 多级过滤; 分区索引; 分区均衡性

Keywords:: graph similarity search; graph edit distance; multi-level filter; partition index; partition balance

分类号:: TP311

DOI:: 10.20165/j.cnki.ISSN1673-629X.2025.0073

摘要:: 图相似搜索指根据图编辑距离,从图数据库中搜索与查询图满足阈值要求的数据图集合的过程。由于图编辑距离的计算是 NP-hard 问题,因此目前主流的图相似搜索算法主要采用“过滤-验证”框架。而其中的过滤算法也仍然是基于分区索引的思想,存在分区过滤过程中过滤下界不紧密、候选集规模大、过滤耗时长和分区大小不均衡等问题。因此提出了对图相似搜索算法相应的改进策略。首先,在整个过滤阶段采用多级过滤的方式,按照代价由低到高的顺序,依次采用图大小、标签,顶点和邻边标签组以及分区索引进行逐级过滤,逐步筛选掉不符合阈值要求的数据图,以减少候选集生成开销;其次,给出了一种启发式方法来指导分区的初始顶点选择,以确保最终的分区尽量均衡,提高分区过滤下界的紧密性。实验结果表明,提出的改进策略不仅可以提高过滤下界的紧密性、降低分区索引构建代价及减少候选图的生成数量,而且缩短了过滤阶段时间及图相似搜索的总体时间。

Abstract:: Graph similarity search refers to the process of searching for a set of data graphs from a graph database that meet the threshold requirements with respect to the query graph based on the graph edit distance. Since the calculation of graph edit distance is an NP-hard problem,the current mainstream algorithms mainly adopt the "filtering-verification" framework. The filtering algorithm is still based on the idea of partition index,which presents issues such as loose filtering lower bound,large candidate set size,lengthy filtering times,and unbalanced partition sizes during the partition filtering process. Therefore, corresponding improvement strategies for graph similarity search algorithms are proposed. Firstly, in the whole filtering stage, a multi - level filtering approach is employed. In the order of increasing cost,graph size, labels, vertex and adjacent edge label groups, and partition index are used for step - by - step filtering successively,and data graphs that do not meet the threshold requirements are gradually filtered out to reduce the candidate set generation overhead. Secondly,a heuristic method is provided to guide the initial vertex selection for partitioning, ensuring that the final partitions are as balanced as possible and improve the tightness of the partition filtering lower bound. Experimental results show that the proposed improvement strategies not only improve the tightness of the filtering lower bound,reduce the cost of partition index construction and the number of generated candidate graphs but also shorten the time of filtering stage and the overall time of graph similarity search.

《计算机技术与发展》[ISSN:1006-6977/CN:61-1281/TN]

文章信息/Info

常用功能

导航/Navigate

工具/Tools

统计/Statistics