Data Mining Courseware: Data Mining 02.pdf
Data Mining:
Concepts and Techniques
— Chapter 2 —
Jiawei Han
Department of Computer Science
University of Illinois at Urbana-Champaign
/~hanj
©2006 Jiawei Han and Micheline Kamber, All rights reserved
January 19, 2011 Data Mining: Concepts and Techniques 1
Chapter 2: Data Preprocessing
• Why preprocess the data?
• Descriptive data summarization
• Data cleaning
• Data integration and transformation
• Data reduction
• Discretization and concept hierarchy generation
• Summary
Why Data Preprocessing?
• Data in the real world is dirty
  • incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data
    • e.g., occupation=“”
  • noisy: containing errors or outliers
    • e.g., Salary=“-10”
  • inconsistent: containing discrepancies in codes or names
    • e.g., Age=“42” Birthday=“03/07/1997”
    • e.g., Was rating “1,2,3”, now rating “A, B, C”
    • e.g., discrepancy between duplicate records
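
The three kinds of dirty data above can be checked mechanically. The following is a minimal sketch, not a method from the slides: the record layout, the function name `find_problems`, and the reference date are assumptions made for illustration, and the birthday “03/07/1997” is read here as March 7, 1997.

```python
from datetime import date

def find_problems(record, today):
    """Return a list of data-quality problems found in one record (hypothetical helper)."""
    problems = []
    # Incomplete: an attribute of interest has no value.
    if not record.get("occupation"):
        problems.append("incomplete: occupation is missing")
    # Noisy: a value that cannot be right, such as a negative salary.
    if record.get("salary") is not None and record["salary"] < 0:
        problems.append("noisy: salary is negative")
    # Inconsistent: the stated age disagrees with the age implied by the birthday.
    birthday = record.get("birthday")
    if birthday is not None and record.get("age") is not None:
        had_birthday = (today.month, today.day) >= (birthday.month, birthday.day)
        derived_age = today.year - birthday.year - (0 if had_birthday else 1)
        if derived_age != record["age"]:
            problems.append(
                f"inconsistent: age {record['age']} but birthday implies {derived_age}"
            )
    return problems

# Values taken from the slide's examples; the reference date is an assumption.
record = {"occupation": "", "salary": -10, "age": 42, "birthday": date(1997, 3, 7)}
problems = find_problems(record, today=date(2011, 1, 19))
for problem in problems:
    print(problem)
```

Each check only flags the record. Deciding how to repair it (fill in, correct, or drop) is a separate data cleaning step.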
Why Is Data Dirty?
• Incomplete data may come from
  • “Not applicable” data value when collected
  • Different considerations between the time when the data was collected and when it is analyzed
  • Human/hardware/software problems
• Noisy data (incorrect values) may come from
  • Faulty data collection instruments
  • Human or computer error at data entry
  • Errors in data transmission
• Inconsistent data may come from
  • Different data sources
  • Functional dependency violation (e.g., modify some linked data)
• Duplicate records also need data cleaning
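
Two of the causes above, functional dependency violations and duplicate records, lend themselves to a short illustration. This is a sketch under stated assumptions: the helper names `fd_violations` and `duplicate_rows`, the sample rows, and the dependency zip -> city are all hypothetical and chosen only to show the idea.

```python
from collections import defaultdict

def fd_violations(rows, lhs, rhs):
    """Return lhs values that map to more than one rhs value, violating lhs -> rhs."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[lhs]].add(row[rhs])
    return {key: values for key, values in seen.items() if len(values) > 1}

def duplicate_rows(rows):
    """Return rows that exactly repeat an earlier row."""
    seen, duplicates = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates.append(row)
        seen.add(key)
    return duplicates

# Illustrative data only: one linked value was modified, one row was entered twice.
rows = [
    {"zip": "61801", "city": "Urbana"},
    {"zip": "61801", "city": "Champaign"},  # violates the assumed zip -> city dependency
    {"zip": "61820", "city": "Champaign"},
    {"zip": "61820", "city": "Champaign"},  # exact duplicate of the previous row
]
violations = fd_violations(rows, "zip", "city")
duplicates = duplicate_rows(rows)
print(violations)
print(duplicates)
```

Exact matching is the simplest possible duplicate test. Real duplicates often differ in spelling or formatting, which this sketch does not attempt to handle.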
Why Is Data Preprocessing Important?
• No quality data, no quality mining results!
  • Quality decisions must be based on quality data
    • e.g., duplicate or missing data may cause incorrect or even misleading statistics.
  • Data warehouse
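
The point that duplicate or missing data may cause misleading statistics can be seen with a small calculation. The salary figures below are invented for illustration, and coding a missing value as 0 is an assumed (and common) data-entry convention, not something the slides specify.

```python
from statistics import mean

# Hypothetical salaries for three employees.
clean = [30_000, 40_000, 50_000]

# The same data with one record entered twice.
with_duplicate = clean + [50_000]

# The same data with one missing salary stored as 0.
missing_as_zero = [30_000, 0, 50_000]

clean_mean = mean(clean)
duplicate_mean = mean(with_duplicate)
missing_mean = mean(missing_as_zero)

print(clean_mean, duplicate_mean, missing_mean)
```

The duplicate pulls the mean up and the zero pulls it down, even though the underlying population has not changed.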