[1] XU Yuan-fang, LI Cheng-cheng. Research on New Word Identification Based on Support Vector Machine and Constraint Condition[J]. Computer Technology and Development, 2014, 24(01): 98-101.

Research on New Word Identification Based on Support Vector Machine and Constraint Condition

Computer Technology and Development [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:
24
Issue:
No. 01, 2014
Pages:
98-101
Section:
Intelligence, Algorithms, and Systems Engineering
Publication date:
2014-01-31

Article Info

Title:
Research on New Word Identification Based on Support Vector Machine and Constraint Condition
Article ID:
1673-629X(2014)01-0098-04
Author(s):
XU Yuan-fang (徐远方); LI Cheng-cheng (李成城)
College of Network Technology, Inner Mongolia Normal University
Keywords:
new word identification; support vector machine (SVM); constraint conditions; kernel function
CLC number:
TP301
Document code:
A
Abstract:
Correctly segmenting new words is one of the key problems in Chinese word segmentation, and this paper proposes a new method for identifying them. Exploiting the strong classification performance of the support vector machine (SVM), positive and negative samples are first extracted from a training corpus that has been segmented and part-of-speech (POS) tagged with the help of a segmentation dictionary. The samples are then vectorized using word-level features computed from the training corpus, and an SVM is trained to obtain the support vectors for new-word classification. A test corpus containing simulated new words is segmented and POS tagged, and candidate new words are selected using the proposed constraint conditions together with slack variables. Each candidate is vectorized with its word-level features and passed to the trained SVM classifier, and the resulting value is compared with a threshold: a candidate whose value is below the threshold is judged to be a new word, and one whose value is above the threshold is judged not to be a new word. The most suitable SVM kernel function is selected by comparing experimental results.
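The decision step described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the three kernels are the standard linear, polynomial, and RBF forms (the abstract does not say which kernels were compared), and the support vectors, coefficients, bias, two-dimensional feature vectors, and threshold of 0 are all hypothetical toy values. The only detail taken from the abstract is the convention that a score below the threshold marks a candidate as a new word.

```python
import math


def linear_kernel(x, z):
    """K(x, z) = <x, z>."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """K(x, z) = (<x, z> + coef0) ** degree."""
    return (linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=0.5):
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def decision_value(x, support_vectors, dual_coefs, bias, kernel):
    """SVM score f(x) = sum_i (alpha_i * y_i) * K(sv_i, x) + b."""
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, dual_coefs)) + bias


def is_new_word(x, support_vectors, dual_coefs, bias, kernel, threshold=0.0):
    """Follow the abstract's convention: a score below the threshold means 'new word'."""
    return decision_value(x, support_vectors, dual_coefs, bias, kernel) < threshold


# Hypothetical toy model: two support vectors over two made-up word features.
SUPPORT_VECTORS = [(1.0, 0.0), (0.0, 1.0)]
DUAL_COEFS = [-1.0, 1.0]  # alpha_i * y_i for each support vector (assumed values)
BIAS = 0.0

if __name__ == "__main__":
    candidate = (1.0, 0.0)  # hypothetical feature vector of one candidate word
    for kernel in (linear_kernel, polynomial_kernel, rbf_kernel):
        score = decision_value(candidate, SUPPORT_VECTORS, DUAL_COEFS, BIAS, kernel)
        print(kernel.__name__, round(score, 4), score < 0.0)
```

Passing the kernel in as a function makes it easy to score the same candidates under each kernel and compare the outcomes, which mirrors the kernel-selection experiment the abstract mentions.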

