Summary of Using the Machine Learning Tool WEKA: Algorithm Selection, Attribute Selection, and Parameter Optimization


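I. Attribute selection:
1. Theory:
See the following two articles:
数据挖掘中的特征选择算法综述及基于WEKA的性能比较, 陈良龙 (A survey of feature selection algorithms in data mining and a WEKA-based performance comparison, Chen Lianglong)
数据挖掘中约简技术与属性选择的研究, 刘辉 (A study of reduction techniques and attribute selection in data mining, Liu Hui)
2. Attribute selection in WEKA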
(attribute evaluator)
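Evaluators fall broadly into two groups. Single-attribute evaluators, all of which are filter methods, score each attribute on its own; subset evaluators score a feature subset as a whole and may be either filter or wrapper methods.
Subset evaluators include: CfsSubsetEval (a filter method) and WrapperSubsetEval (a wrapper method)
Single-attribute evaluators include: CorrelationAttributeEval
Subset evaluators:
(1)CfsSubsetEval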
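Evaluates an attribute subset by the predictive ability of each feature in it together with the correlation among those features. Subsets whose features are individually predictive but only weakly correlated with one another score well.
Evaluates the worth of a subset of attributes by considering the individual predictive ability of each feature along with the degree of redundancy between them. Subsets of features that are highly correlated with the class while having low intercorrelation are preferred.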
For more information see:
M. A. Hall (1998). Correlation-based Feature Subset Selection for Machine Learning. Hamilton, New Zealand.
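The trade-off described above is usually written as the merit heuristic from Hall's thesis: for a subset of k features, merit = k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation. The following is a minimal Python sketch of that formula only. The feature names and correlation values are made up for illustration, and how WEKA itself measures the correlations is not covered here.

```python
from itertools import combinations
from math import sqrt


def cfs_merit(subset, class_corr, pair_corr):
    """Merit of a feature subset: k * r_cf / sqrt(k + k*(k-1) * r_ff)."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean correlation between each feature and the class.
    r_cf = sum(class_corr[f] for f in subset) / k
    # Mean correlation between every pair of features (0 if only one feature).
    pairs = list(combinations(subset, 2))
    r_ff = sum(pair_corr[frozenset(p)] for p in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


# Hypothetical correlations: "a" and "b" are both predictive but redundant,
# "c" is slightly less predictive but nearly independent of "a".
class_corr = {"a": 0.8, "b": 0.75, "c": 0.6}
pair_corr = {
    frozenset(("a", "b")): 0.9,
    frozenset(("a", "c")): 0.1,
    frozenset(("b", "c")): 0.1,
}

for subset in (["a"], ["a", "b"], ["a", "c"]):
    print(subset, round(cfs_merit(subset, class_corr, pair_corr), 4))
```

With these made-up numbers, adding the redundant feature "b" to "a" lowers the merit, while adding the weaker but complementary feature "c" raises it.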
(2)WrapperSubsetEval
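In the wrapper approach, the downstream learning algorithm is embedded in the feature selection process: a feature subset is judged by the predictive performance the algorithm achieves with it, with little regard for the predictive performance of each individual feature in the subset. Consequently, the features in the optimal subset need not each be individually optimal.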
Evaluates attribute sets by using a learning scheme. Cross validation is used to estimate the accuracy of the learning scheme for a set of attributes.
For more information see:
Ron Kohavi, George H. John (1997). Wrappers for feature subset selection. Artificial Intelligence. 97(1-2):273-324.
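The wrapper idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not WEKA's implementation: the learner is a hand-written 1-nearest-neighbour classifier, the folds are assigned by row index, the candidate subsets are enumerated exhaustively (feasible only for toy data), and the data set is invented.

```python
from itertools import combinations


def one_nn_predict(train_X, train_y, x):
    """Predict the label of x as the label of its nearest training row."""
    nearest = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[nearest]


def cv_accuracy(X, y, subset, folds=3):
    """Cross-validated accuracy of 1-NN using only the columns in subset."""
    projected = [[row[j] for j in subset] for row in X]
    correct = 0
    for fold in range(folds):
        # Row i belongs to test fold (i % folds); the rest is training data.
        train = [i for i in range(len(X)) if i % folds != fold]
        test = [i for i in range(len(X)) if i % folds == fold]
        train_X = [projected[i] for i in train]
        train_y = [y[i] for i in train]
        correct += sum(
            one_nn_predict(train_X, train_y, projected[i]) == y[i] for i in test
        )
    return correct / len(X)


def best_subset(X, y, folds=3):
    """Return the column subset with the highest cross-validated accuracy."""
    n = len(X[0])
    candidates = [s for r in range(1, n + 1) for s in combinations(range(n), r)]
    return max(candidates, key=lambda s: cv_accuracy(X, y, s, folds))


# Invented data: column 0 separates the classes, column 1 is noise.
X = [[1.0, 5.0], [1.2, 1.0], [0.9, 9.0], [3.0, 1.2], [3.1, 5.1], [2.9, 8.8]]
y = ["a", "a", "a", "b", "b", "b"]

for subset in [(0,), (1,), (0, 1)]:
    print(subset, cv_accuracy(X, y, subset))
print("selected:", best_subset(X, y))
```

Because every candidate subset requires training and testing the learner once per fold, the wrapper approach is much more expensive than scoring attributes one by one, which is why it is normally paired with a heuristic search rather than full enumeration.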
Single-attribute evaluators (filter methods):
If one of these evaluators is chosen, the search method must be Ranker.
(1)CorrelationAttributeEval
Selects attributes by the correlation between each individual attribute and the class.
Evaluates the worth of an attribute by measuring the correlation (Pearson's) between it and the class.
Nominal attributes are considered on a value by value basis by treating each value as an indicator. An overall correlation for a nominal attribute is arrived at via a weighted average.
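As a rough illustration of the description above, the sketch below computes Pearson's correlation between a numeric attribute and a 0/1 class indicator, and handles a nominal attribute by turning each value into an indicator. The weighting scheme (absolute correlations weighted by value frequency), the binary class, and the data are assumptions made for this example; the quoted documentation does not spell out the exact weights WEKA uses.

```python
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant sequence carries no correlation information
    return cov / sqrt(var_x * var_y)


def nominal_correlation(attribute, class_indicator):
    """Frequency-weighted average of |r| over one indicator per nominal value.

    Assumption for this sketch: weights are the relative value frequencies.
    """
    total = len(attribute)
    score = 0.0
    for value, count in Counter(attribute).items():
        indicator = [1.0 if v == value else 0.0 for v in attribute]
        score += (count / total) * abs(pearson(indicator, class_indicator))
    return score


# Invented data: 1.0 means the class value "yes", 0.0 means "no".
play = [0.0, 0.0, 1.0, 1.0, 0.0, 1.0]
humidity = [85, 90, 70, 75, 95, 65]
outlook = ["sunny", "sunny", "overcast", "rain", "rain", "overcast"]

print("humidity:", pearson(humidity, play))
print("outlook: ", nominal_correlation(outlook, play))
```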
(2)GainRatioAttributeEval
Selects attributes by gain ratio.
Evaluates the worth of an attribute by measuring the gain ratio with respect to the class.
GainR(Class, Attribute) = (H(Class) - H(Class | Attribute)) / H(Attribute).
(3)InfoGainAttributeEval
Selects attributes by information gain.
Evaluates the worth of an attribute by measuring the information gain with respect to the class.
InfoGain(Class,Attribute) = H(Class) - H(Class | Attribute).
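The InfoGain and GainR formulas can be checked by hand on a small nominal data set. The sketch below is a plain-Python illustration, not WEKA code; the data, the `day_id` attribute, and the choice to return 0 when H(Attribute) is 0 are assumptions made for the example.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy H(values) in bits for a sequence of nominal values."""
    total = len(values)
    return sum(-(n / total) * log2(n / total) for n in Counter(values).values())


def conditional_entropy(classes, attribute):
    """H(Class | Attribute): class entropy per attribute value, frequency-weighted."""
    groups = {}
    for c, a in zip(classes, attribute):
        groups.setdefault(a, []).append(c)
    total = len(classes)
    return sum(len(g) / total * entropy(g) for g in groups.values())


def info_gain(classes, attribute):
    """InfoGain(Class, Attribute) = H(Class) - H(Class | Attribute)."""
    return entropy(classes) - conditional_entropy(classes, attribute)


def gain_ratio(classes, attribute):
    """GainR(Class, Attribute) = InfoGain(Class, Attribute) / H(Attribute)."""
    h_attribute = entropy(attribute)
    if h_attribute == 0:
        return 0.0  # assumption: a constant attribute gets a score of 0
    return info_gain(classes, attribute) / h_attribute


# Invented data: day_id is unique per row, so it "explains" the class perfectly.
play = ["no", "no", "yes", "yes", "no", "yes"]
outlook = ["sunny", "sunny", "overcast", "rain", "rain", "overcast"]
day_id = [1, 2, 3, 4, 5, 6]

for name, attr in (("outlook", outlook), ("day_id", day_id)):
    print(name, info_gain(play, attr), gain_ratio(play, attr))
```

On this toy data the row identifier `day_id` has the higher information gain, because it splits every row into its own group, but dividing by H(Attribute) penalizes its many values, so `outlook` ranks first by gain ratio.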
(4)OneRAttributeEval
Evaluates attributes using the OneR classifier.
Clas
