[1] JIAO Yang, WANG Chun, HAN Jing-ru. Construction of Scientific Research Management System Based on Lucene[J]. Computer Technology and Development, 2018, 28(05): 193-196. [doi:10.3969/j.issn.1673-629X.2018.05.043]

Construction of Scientific Research Management System Based on Lucene

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
28
Issue:
2018, No. 05
Pages:
193-196
Column:
Application Development Research
Publication date:
2018-05-10

Article Info

Title:
Construction of Scientific Research Management System Based on Lucene
Article ID:
1673-629X(2018)05-0193-04
Author(s):
JIAO Yang, WANG Chun, HAN Jing-ru
Civil Aviation Safety Analysis and Prevention Engineering Technology Center, China Academy of Civil Aviation Science and Technology, Beijing 100028, China
Keywords:
Lucene; duplicate checking; domain ontology; scientific research
Classification number:
TP302.1
DOI:
10.3969/j.issn.1673-629X.2018.05.043
Document code:
A
Abstract:
Using the Java Web platform with the Lucene retrieval toolkit as the search core, we design and implement a scientific research management system. The system calls Lucene's word segmentation and indexing functions to search the library of previously submitted project proposals, and performs an initial filtering pass that keeps documents whose relevance score exceeds 30%. A domain terminology library is constructed so that synonyms and aliases of the query words are included in the search, which to some extent narrows the retrieval blind spot caused by replacing text with near-synonyms. We propose a secondary processing method for Lucene highlighting: with a threshold of 0.7, scattered highlighted words whose proportion exceeds the threshold are smoothed, so that the duplicated content reads more naturally when documents are compared. On this basis, the algorithm for calculating the document repetition rate is redesigned using the highlighted text. The system has been deployed in enterprises and public institutions; according to feedback from on-site operation, the repetition rate calculation is fairly accurate. The novelty search system prevents duplicate project applications, improves the efficiency with which research funds are used, and provides technical support for raising the level of scientific research management in enterprises and institutions.
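
The abstract gives the smoothing threshold (0.7) but not the exact smoothing rule or the repetition-rate formula, so the following is only a minimal sketch under stated assumptions. It assumes that smoothing is applied per sentence (if more than 70% of a sentence's tokens are highlighted, the whole sentence is treated as highlighted) and that the repetition rate is the share of highlighted tokens after smoothing. The class name `HighlightSmoothing`, its methods, and the boolean-flag representation are hypothetical and are not taken from the paper or from the Lucene API.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Illustrative sketch only: the per-sentence smoothing rule and the
 * repetition-rate formula below are assumptions, not the paper's algorithm.
 */
class HighlightSmoothing {
    /** Threshold stated in the abstract. */
    static final double SMOOTH_THRESHOLD = 0.7;

    /** Fraction of tokens in one sentence that are flagged as highlighted. */
    static double highlightedFraction(boolean[] flags) {
        if (flags.length == 0) {
            return 0.0;
        }
        int hits = 0;
        for (boolean flag : flags) {
            if (flag) {
                hits++;
            }
        }
        return (double) hits / flags.length;
    }

    /**
     * Assumed smoothing rule: when the highlighted fraction of a sentence
     * exceeds the threshold, mark every token in that sentence as highlighted;
     * otherwise return the flags unchanged. The input array is not modified.
     */
    static boolean[] smooth(boolean[] flags) {
        boolean[] out = flags.clone();
        if (highlightedFraction(flags) > SMOOTH_THRESHOLD) {
            Arrays.fill(out, true);
        }
        return out;
    }

    /**
     * Assumed repetition rate: highlighted tokens after smoothing divided by
     * all tokens in the document. Each array holds one sentence's flags.
     */
    static double repetitionRate(List<boolean[]> sentences) {
        int total = 0;
        int highlighted = 0;
        for (boolean[] sentence : sentences) {
            for (boolean flag : smooth(sentence)) {
                total++;
                if (flag) {
                    highlighted++;
                }
            }
        }
        return total == 0 ? 0.0 : (double) highlighted / total;
    }
}
```

Under these assumptions, a document with two four-token sentences flagged `{true, true, true, false}` and `{true, false, false, false}` has its first sentence smoothed to fully highlighted (0.75 exceeds 0.7) and its second left as is, giving a rate of 5/8 = 0.625. In a real system the flags would presumably be derived from the output of Lucene's highlighter rather than supplied by hand.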

Last Update: 2018-07-20