Evaluation of Techniques for Classifying Biological Sequences
Authors: Mukund Deshpande and George Karypis
Speaker: Sarah Chan
CSIS DB Seminar
May 31, 2002
Presentation Outline
Introduction
Traditional Approaches (kNN, Markov Models) to Sequence Classification
Feature Based Sequence Classification
Experimental Evaluation
Conclusions
Introduction
The number of biological sequences available in public databases is increasing exponentially
GenBank: 16 billion DNA base-pairs
PIR: over 230,000 protein sequences
Strong sequence similarity often translates to functional and structural relations
Classification algorithms applied to sequence data can provide valuable insights into the functions and relationships of sequences
E.g., assigning a protein sequence to a protein family
Introduction
K-nearest neighbor, Markov models, and hidden Markov models have been used extensively
They take into account the sequential constraints present in the datasets
Motivation: there have been few attempts to use traditional machine learning classification algorithms, such as decision trees and support vector machines
They were thought to be unable to model the sequential nature of the datasets
Focus of This Paper
To evaluate some widely used sequence classification algorithms
K-nearest neighbor
Markov models
To develop a framework to model sequences such that traditional machine learning algorithms can be easily applied
Represent each sequence as a vector in a derived feature space, and then use SVMs to build a sequence classifier
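One simple way to build such a derived feature space is to count contiguous length-k subsequences (k-mers). The sketch below assumes that feature choice; the function name `kmer_vector` and the default `k = 2` are illustrative, and the paper's actual feature construction may differ.

```python
from collections import Counter
from itertools import product


def kmer_vector(seq: str, alphabet: str, k: int = 2) -> list[int]:
    """Map a sequence to the counts of every possible length-k contiguous subsequence.

    Assumption for illustration: features are contiguous k-mers over a fixed alphabet.
    """
    # Fixed feature order: all len(alphabet)**k k-mers, in the alphabet's order
    features = ["".join(p) for p in product(alphabet, repeat=k)]
    # Slide a window of width k over the sequence and count each k-mer seen
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts[f] for f in features]


vec = kmer_vector("ACGTAC", "ACGT", k=2)  # AC twice; CG, GT, TA once each
```

The resulting fixed-length vectors could then be passed to an off-the-shelf SVM implementation (for example scikit-learn's `LinearSVC`). The vector length grows as N^k, so larger k quickly yields very sparse vectors.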
Problem Definition - Sequence Classification
A sequence Sr = {x1, x2, x3, ..., xl} is an ordered list of symbols
The alphabet  for symbols: known in advance and of fixed size N
Each sequence Sr has a class label Cr
Assumption: Two class labels only (C+, C-)
Goal: To correctly assign a class label to a test sequence
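As a concrete illustration of this problem setting, a labeled training example can be represented as below. The 20-letter amino-acid alphabet, the +1/-1 encoding of C+ and C-, and the names `LabeledSequence`, `POSITIVE`, and `NEGATIVE` are hypothetical choices made for the sketch.

```python
from dataclasses import dataclass

# Hypothetical alphabet for illustration: the 20 standard amino acids (N = 20)
ALPHABET = frozenset("ACDEFGHIKLMNPQRSTVWY")
POSITIVE, NEGATIVE = +1, -1  # stand-ins for the class labels C+ and C-


@dataclass(frozen=True)
class LabeledSequence:
    symbols: str  # the ordered symbols x1 ... xl
    label: int    # POSITIVE (C+) or NEGATIVE (C-)

    def __post_init__(self) -> None:
        # Every symbol must come from the fixed, known-in-advance alphabet
        unknown = set(self.symbols) - ALPHABET
        if unknown:
            raise ValueError(f"symbols outside the alphabet: {sorted(unknown)}")
        # Binary setting: only the two class labels are allowed
        if self.label not in (POSITIVE, NEGATIVE):
            raise ValueError("label must be POSITIVE (C+) or NEGATIVE (C-)")


train = [LabeledSequence("MKVLA", POSITIVE), LabeledSequence("GGSWT", NEGATIVE)]
```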
Approach 1: K Nearest Neighbor (KNN) Classifiers
To classify a test sequence Sr
Locate the K training sequences most similar to Sr
Assign to Sr the class label which occurs most frequently among these K sequences
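The KNN steps above can be sketched as follows. This part of the slides does not state which similarity measure is used, so the sketch assumes a global (Needleman-Wunsch) alignment score with hypothetical scoring parameters (match +1, mismatch -1, gap -1) and a simple majority vote over the K neighbors; `alignment_score` and `knn_classify` are illustrative names. Practical protein classifiers usually score alignments with a substitution matrix such as BLOSUM62 instead.

```python
from collections import Counter


def alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Global (Needleman-Wunsch) alignment score with a linear gap penalty.

    The scoring parameters are hypothetical defaults chosen for illustration.
    """
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: b aligned against gaps
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: align a[i-1] with b[j-1], gap in b, or gap in a
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def knn_classify(test: str, train: list[tuple[str, int]], k: int = 3) -> int:
    # Rank training sequences by similarity to the test sequence (highest first)
    ranked = sorted(train, key=lambda item: alignment_score(test, item[0]), reverse=True)
    # Majority vote among the labels of the K most similar sequences
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


train = [("ACGTACGT", 1), ("ACGTACGA", 1), ("ACGAACGT", 1),
         ("TTTTGGGG", -1), ("TTTGGGGG", -1), ("TTTTGGGC", -1)]
prediction = knn_classify("ACGTACGG", train, k=3)  # → 1
```

With two classes, an odd K avoids voting ties. Each prediction aligns the test sequence against every training sequence, so its cost grows with the training-set size times the product of the sequence lengths.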
