AI&DM
Chapter 3 Basic Data Mining Techniques
Decision Trees
(For classification)
Algorithm for Decision Tree Building
Basic algorithm (a greedy algorithm)
Tree is constructed in a top-down recursive divide-and-conquer manner
At start, all the training examples are at the root
Attributes are categorical (if continuous-valued, they are discretized in advance)
Examples are partitioned recursively based on selected attributes
Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
Conditions for stopping partitioning
All samples for a given node belong to the same class
There are no remaining attributes for further partitioning – majority voting is employed for classifying the leaf
There are no samples left
The pre-set accuracy is reached
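The greedy top-down procedure and its stopping conditions can be sketched in Python. This is a minimal illustration, not the slides' own code; the names `entropy`, `info_gain`, and `build_tree` and the dict-based row format are assumptions for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Expected information needed to classify a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Entropy reduction obtained by partitioning on `attr`."""
    n = len(labels)
    after = 0.0
    for v in {r[attr] for r in rows}:
        subset = [y for r, y in zip(rows, labels) if r[attr] == v]
        after += len(subset) / n * entropy(subset)
    return entropy(labels) - after

def build_tree(rows, labels, attrs, default=None):
    """Top-down recursive divide-and-conquer construction (greedy)."""
    if not labels:                      # no samples left: use parent's majority
        return default
    if len(set(labels)) == 1:           # all samples belong to the same class
        return labels[0]
    if not attrs:                       # no attributes left: majority voting
        return Counter(labels).most_common(1)[0][0]
    majority = Counter(labels).most_common(1)[0][0]
    # Greedy step: pick the attribute with the highest information gain.
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    branches = {}
    for v in {r[best] for r in rows}:   # partition recursively on `best`
        sub = [(r, y) for r, y in zip(rows, labels) if r[best] == v]
        branches[v] = build_tree([r for r, _ in sub], [y for _, y in sub],
                                 attrs - {best}, majority)
    return (best, branches)
```

A leaf is a class label; an internal node is a pair of the chosen attribute and a dict of branches, one per attribute value.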
Information Gain (ID3)
Select the attribute with the highest information gain
Assume there are two classes, P and N
Let the set of examples S contain p elements of class P and n elements of class N
The amount of information needed to decide whether an arbitrary example in S belongs to P or N is defined as

I(p, n) = - (p / (p + n)) log2(p / (p + n)) - (n / (p + n)) log2(n / (p + n))
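This two-class quantity is straightforward to compute; a minimal helper (the function name `I` mirrors the slides' notation, and the zero-count guard is an assumption for the pure-set case):

```python
import math

def I(p, n):
    """Information needed to classify an example in a set with
    p examples of class P and n examples of class N."""
    if p == 0 or n == 0:   # a pure set needs no further information
        return 0.0
    t = p + n
    return -(p / t) * math.log2(p / t) - (n / t) * math.log2(n / t)
```

For example, `I(9, 5)` evaluates to about 0.940, and an evenly split set such as `I(7, 7)` gives exactly 1 bit.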
Information Gain in Decision Tree Building
Assume that, using attribute A, the set S will be partitioned into subsets {S1, S2, …, Sv}
If Si contains pi examples of P and ni examples of N, the entropy, or the expected information needed to classify objects in all subsets Si, is

E(A) = Σ (i = 1..v)  ((pi + ni) / (p + n)) I(pi, ni)

The encoding information that would be gained by branching on A is

Gain(A) = I(p, n) - E(A)
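Both E(A) and Gain(A) follow directly from the per-subset counts (pi, ni). A small sketch, where a partition is represented as a list of (pi, ni) pairs (an assumed representation, not from the slides):

```python
import math

def I(p, n):
    """Two-class information measure from the previous slide."""
    if p == 0 or n == 0:
        return 0.0
    t = p + n
    return -(p / t) * math.log2(p / t) - (n / t) * math.log2(n / t)

def expected_info(partition):
    """E(A): weighted sum of I(pi, ni) over the subsets S1..Sv."""
    total = sum(p + n for p, n in partition)
    return sum((p + n) / total * I(p, n) for p, n in partition)

def gain(partition):
    """Gain(A) = I(p, n) - E(A), with p, n totalled over all subsets."""
    p = sum(pi for pi, _ in partition)
    n = sum(ni for _, ni in partition)
    return I(p, n) - expected_info(partition)
```

A subset that is entirely one class contributes zero to E(A), so attributes that produce pure partitions get the largest gain.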
Attribute Selection by Information Gain Computation
Class P:
buys_computer = “yes”
Class N:
buys_computer = “no”
I(p, n) = I(9, 5) = 0.940
Compute the entropy for age:

  age      pi  ni  I(pi, ni)
  <=30      2   3   0.971
  31..40    4   0   0
  >40       3   2   0.971

E(age) = (5/14) I(2, 3) + (4/14) I(4, 0) + (5/14) I(3, 2) = 0.694

Hence
Gain(age) = I(9, 5) - E(age) = 0.940 - 0.694 = 0.246

Similarly
Gain(income) = 0.029
Gain(student) = 0.151
Gain(credit_rating) = 0.048
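The whole example can be checked end to end. The 14-row table below is the classic buys_computer training set from the textbook literature, assumed here because its 9 "yes" / 5 "no" split matches this slide; the helper names are illustrative.

```python
import math
from collections import Counter

# Assumed training data: the standard 14-example buys_computer set
# (age, income, student, credit_rating, buys_computer).
DATA = [
    ("<=30",   "high",   "no",  "fair",      "no"),
    ("<=30",   "high",   "no",  "excellent", "no"),
    ("31..40", "high",   "no",  "fair",      "yes"),
    (">40",    "medium", "no",  "fair",      "yes"),
    (">40",    "low",    "yes", "fair",      "yes"),
    (">40",    "low",    "yes", "excellent", "no"),
    ("31..40", "low",    "yes", "excellent", "yes"),
    ("<=30",   "medium", "no",  "fair",      "no"),
    ("<=30",   "low",    "yes", "fair",      "yes"),
    (">40",    "medium", "yes", "fair",      "yes"),
    ("<=30",   "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no",  "excellent", "yes"),
    ("31..40", "high",   "yes", "fair",      "yes"),
    (">40",    "medium", "no",  "excellent", "no"),
]
ATTRS = {"age": 0, "income": 1, "student": 2, "credit_rating": 3}

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain(attr):
    """Gain(A) = I(p, n) - E(A) for one attribute of DATA."""
    labels = [row[-1] for row in DATA]
    col = ATTRS[attr]
    e = 0.0
    for v in {row[col] for row in DATA}:
        sub = [row[-1] for row in DATA if row[col] == v]
        e += len(sub) / len(DATA) * entropy(sub)
    return entropy(labels) - e

for a in ATTRS:
    print(a, round(gain(a), 3))
```

Running this reproduces the gains on the slide (age about 0.246, income 0.029, student 0.151, credit_rating 0.048), so age is selected as the root test attribute.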