Artificial Intelligence and Data Mining — Teaching Slides (.ppt)


AI&DM
Chapter 3 Basic Data Mining Techniques
Decision Trees
(For classification)

Basic algorithm (a greedy algorithm)
The tree is constructed in a top-down, recursive, divide-and-conquer manner
At the start, all the training examples are at the root
Attributes are categorical (continuous-valued attributes are discretized in advance)
Examples are partitioned recursively based on the selected attributes
Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
Conditions for stopping partitioning
All samples at a given node belong to the same class
There are no remaining attributes for further partitioning – the leaf is labeled by majority voting
There are no samples left
The pre-set accuracy is reached
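The greedy top-down procedure above can be written out directly. A minimal Python sketch (the names `build_tree`, `target`, and the dict-per-example data format are my own illustration, not from the slides):

```python
from collections import Counter
from math import log2

def entropy(examples, target):
    """Expected bits needed to classify one example at this node."""
    counts = Counter(ex[target] for ex in examples)
    total = len(examples)
    return -sum(c / total * log2(c / total) for c in counts.values())

def gain(examples, attr, target):
    """Information gain of partitioning `examples` on `attr`."""
    total = len(examples)
    rest = 0.0
    for v in {ex[attr] for ex in examples}:          # each subset S_i
        subset = [ex for ex in examples if ex[attr] == v]
        rest += len(subset) / total * entropy(subset, target)
    return entropy(examples, target) - rest

def build_tree(examples, attrs, target):
    labels = [ex[target] for ex in examples]
    # Stop: all samples at this node belong to the same class.
    if len(set(labels)) == 1:
        return labels[0]
    # Stop: no remaining attributes -- label the leaf by majority voting.
    if not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: pick the test attribute with the highest information gain.
    best = max(attrs, key=lambda a: gain(examples, a, target))
    branches = {}
    for v in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == v]
        branches[v] = build_tree(subset, [a for a in attrs if a != best], target)
    return (best, branches)

# Toy illustration: a perfectly separating attribute gives a one-level tree.
toy = [
    {"windy": "false", "play": "yes"},
    {"windy": "false", "play": "yes"},
    {"windy": "true",  "play": "no"},
    {"windy": "true",  "play": "no"},
]
print(build_tree(toy, ["windy"], "play"))
```

Recursing only over attribute values that actually occur in the current subset also covers the "no samples left" stopping condition.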

Information Gain (ID3)
Select the attribute with the highest information gain
Assume there are two classes, P and N
Let the set of examples S contain p elements of class P and n elements of class N
The amount of information needed to decide whether an arbitrary example in S belongs to P or N is defined as

I(p, n) = −(p/(p+n)) log2(p/(p+n)) − (n/(p+n)) log2(n/(p+n))
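The definition can be checked numerically; a small sketch (the function name `I` mirrors the slide's notation):

```python
from math import log2

def I(p, n):
    """I(p, n): expected bits needed to decide the class of one example in S."""
    total = p + n
    # Skip zero counts: a term with c = 0 contributes 0 by convention.
    return sum(-c / total * log2(c / total) for c in (p, n) if c)

print(round(I(9, 5), 3))   # -> 0.94  (a 9-vs-5 split is nearly maximally uncertain)
print(I(7, 7))             # -> 1.0   (maximum uncertainty at a 50/50 split)
print(I(4, 0))             # -> 0     (a pure set needs no further information)
```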

Information Gain in Decision Tree Building
Assume that, using attribute A, the set S will be partitioned into subsets {S1, S2, …, Sv}
If Si contains pi examples of P and ni examples of N, the entropy, or the expected information needed to classify objects across all subsets Si, is

E(A) = Σ (i = 1 to v) ((pi + ni)/(p + n)) · I(pi, ni)

The encoding information that would be gained by branching on A is

Gain(A) = I(p, n) − E(A)
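Both quantities can be evaluated directly from the per-subset counts. A sketch where each subset Si is passed as a (pi, ni) pair (this counts-only interface is my own choice for brevity):

```python
from math import log2

def I(p, n):
    """I(p, n) as defined on the previous slide."""
    total = p + n
    return sum(-c / total * log2(c / total) for c in (p, n) if c)

def E(subsets):
    """E(A): weighted entropy over the subsets {S1, ..., Sv} induced by A."""
    total = sum(p + n for p, n in subsets)
    return sum((p + n) / total * I(p, n) for p, n in subsets)

def gain(subsets):
    """Gain(A) = I(p, n) - E(A), with p and n summed over all subsets."""
    p = sum(pi for pi, ni in subsets)
    n = sum(ni for pi, ni in subsets)
    return I(p, n) - E(subsets)

# A mildly informative split of 8 examples into {S1 = (3,1), S2 = (1,3)}:
print(round(gain([(3, 1), (1, 3)]), 3))   # -> 0.189
# A perfectly separating split gains all the information:
print(gain([(2, 0), (0, 2)]))             # -> 1.0
```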

Attribute Selection by Information Gain Computation
Class P: buys_computer = "yes" (9 examples)
Class N: buys_computer = "no" (5 examples)
I(p, n) = I(9, 5) = 0.940
Compute the entropy for age:
E(age) = (5/14) I(2, 3) + (4/14) I(4, 0) + (5/14) I(3, 2) = 0.694
Hence Gain(age) = I(9, 5) − E(age) = 0.940 − 0.694 = 0.246
Similarly, Gain(income) = 0.029, Gain(student) = 0.151, Gain(credit_rating) = 0.048
Since age yields the highest information gain, it is selected as the first splitting attribute.
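These figures can be recomputed from the classic 14-example training set this example is based on (assuming Han & Kamber's AllElectronics data; the slide itself shows only the counts):

```python
from collections import Counter
from math import log2

# Assumed 14-example training set: age attribute and buys_computer labels.
age    = ["<=30", "<=30", "31..40", ">40", ">40", ">40", "31..40",
          "<=30", "<=30", ">40", "<=30", "31..40", "31..40", ">40"]
labels = ["no", "no", "yes", "yes", "yes", "no", "yes",
          "no", "yes", "yes", "yes", "yes", "yes", "no"]

def info(counts):
    """I(...) over a list of class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

def gain(attr, labels):
    """Gain(A) = I(p, n) - E(A) for one categorical attribute column."""
    n = len(labels)
    e = 0.0
    for v in set(attr):                              # each subset S_i
        sub = [l for a, l in zip(attr, labels) if a == v]
        e += len(sub) / n * info(list(Counter(sub).values()))
    return info(list(Counter(labels).values())) - e

print(round(info([9, 5]), 3))       # I(9, 5)   -> 0.94
print(round(gain(age, labels), 3))  # Gain(age) -> 0.247 (the slide's 0.246
                                    # comes from rounding 0.940 - 0.694)
```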

Source: www.taodocs.com (please credit the source when reposting). 26 pages, 232 KB, uploaded by mkjafow, 2022-08-07.