Multiple Testing and Prediction and Variable Selection
Class web site: http://statison/teaching/Microarrays/
Statistics for Microarrays

cDNA gene expression data
Data on G genes for n samples (rows: genes; columns: mRNA samples).
Gene expression level of gene i in mRNA sample j = (normalized) log(Red intensity / Green intensity).
[Table: genes 1, 2, 3, 4, 5, … in rows and sample1, sample2, sample3, sample4, sample5, … in columns; the log-ratio values did not survive extraction.]

Multiple Testing Problem
Simultaneously test G null hypotheses, one for each gene j:
Hj: no association between the expression level of gene j and the covariate or response.
Because microarray experiments monitor the expression levels of thousands of genes simultaneously, there is a large multiplicity problem.
We would like some sense of how ‘surprising’ the observed results are.

Hypothesis Truth vs. Decision
(Here m = G, one hypothesis per gene. Rows give the truth, columns the decision.)

                 # not rejected   # rejected   totals
# true H         U                V (F+)       m0
# non-true H     T                S            m1
totals           m - R            R            m

V counts false positives (Type I errors), T counts false negatives (Type II errors), and S counts correct rejections.

Type I (False Positive) Error Rates
Per-family error rate:      PFER = E(V)
Per-comparison error rate:  PCER = E(V)/m
Family-wise error rate:     FWER = P(V ≥ 1)
False discovery rate:       FDR = E(Q), where Q = V/R if R > 0 and Q = 0 if R = 0

Strong vs. Weak Control
All probabilities are conditional on which hypotheses are true.
Strong control refers to control of the Type I error rate under any combination of true and false nulls.
Weak control refers to control of the Type I error rate only under the complete null hypothesis (i.e. all nulls true).
In general, weak control without other safeguards is unsatisfactory.

Comparison of Type I Error Rates
In general, for a given multiple testing procedure, PCER ≤ FWER ≤ PFER and FDR ≤ FWER, with FDR = FWER under the complete null.

Adjusted p-values (p*)
If interest is in controlling, e.g., the FWER, the adjusted p-value for hypothesis Hj is
pj* = inf {α : Hj is rejected at FWER α}.
Hypothesis Hj is rejected at FWER α if pj* ≤ α.
Adju
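To make the expression measure concrete, here is a minimal sketch. It assumes base-2 logarithms (a common choice for two-colour arrays; the slide only says "Log") and uses median-centring of each array as a stand-in for normalization, which is an assumption rather than the method used in the course. The intensities are made-up numbers and `log_ratio_matrix` is a hypothetical helper.

```python
import math
from statistics import median


def log_ratio_matrix(red, green, center=True):
    """Return M[i][j] = log2(red[i][j] / green[i][j]) for gene i, sample j.

    If center is True, each sample (column) is median-centred, a simple
    placeholder for normalization (an assumption, not the course's method).
    """
    n_genes, n_samples = len(red), len(red[0])
    m = [[math.log2(red[i][j] / green[i][j]) for j in range(n_samples)]
         for i in range(n_genes)]
    if center:
        for j in range(n_samples):
            # Compute the column median before modifying the column.
            col_median = median(m[i][j] for i in range(n_genes))
            for i in range(n_genes):
                m[i][j] -= col_median
    return m


# Hypothetical intensities: 3 genes (rows) on 2 arrays (columns).
red = [[1000.0, 800.0], [500.0, 400.0], [2000.0, 1600.0]]
green = [[1000.0, 400.0], [1000.0, 400.0], [500.0, 800.0]]
print(log_ratio_matrix(red, green, center=False))  # [[0.0, 1.0], [-1.0, 0.0], [2.0, 1.0]]
```

Median-centring only removes a constant dye bias per array; intensity-dependent methods such as loess normalization are also widely used, so treat this step as a placeholder.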
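The size of the multiplicity problem can be seen with a little arithmetic. If all m null hypotheses are true and each is tested at an unadjusted level α, then E(V) = mα by linearity of expectation. If the tests are additionally assumed independent (an assumption; expression levels of different genes are often correlated), then P(V ≥ 1) = 1 - (1 - α)^m. A short sketch, with `multiplicity_summary` a hypothetical name:

```python
def multiplicity_summary(m, alpha):
    """For m true nulls, each tested at unadjusted level alpha, return
    (E(V), P(V >= 1)). E(V) needs no independence; P(V >= 1) assumes it."""
    expected_false_positives = m * alpha          # linearity of expectation
    prob_at_least_one = 1.0 - (1.0 - alpha) ** m  # independent tests assumed
    return expected_false_positives, prob_at_least_one


for m in (1, 10, 100, 10000):
    ev, fwer = multiplicity_summary(m, 0.05)
    print(f"m={m:>5}: E(V)={ev:8.2f}  P(V>=1)={fwer:.4f}")
```

With α = 0.05 this gives P(V ≥ 1) of about 0.40 for m = 10 and above 0.99 for m = 100, while E(V) reaches 500 false positives for m = 10,000.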
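The counts in the truth-versus-decision table can be tabulated directly once the truth is known, which in practice is only possible in simulation. The sketch below is illustrative: `decision_table` is a hypothetical name, and Q is the realized false discovery proportion whose expectation defines the FDR.

```python
def decision_table(is_true_null, rejected):
    """Tabulate U, V, T, S and the margins of the truth-vs-decision table.

    Both arguments are equal-length sequences of bools, one entry per hypothesis.
    """
    if len(is_true_null) != len(rejected):
        raise ValueError("sequences must have the same length")
    pairs = list(zip(is_true_null, rejected))
    U = sum(1 for null, rej in pairs if null and not rej)      # true nulls not rejected
    V = sum(1 for null, rej in pairs if null and rej)          # false positives
    T = sum(1 for null, rej in pairs if not null and not rej)  # false negatives
    S = sum(1 for null, rej in pairs if not null and rej)      # correct rejections
    R = V + S
    Q = V / R if R > 0 else 0.0  # Q = V/R if R > 0, else 0
    return {"U": U, "V": V, "T": T, "S": S, "m0": U + V, "m1": T + S,
            "R": R, "m": len(pairs), "Q": Q}


truth = [True, True, True, False, False]      # first three nulls are true
decision = [False, True, False, True, False]  # which hypotheses were rejected
print(decision_table(truth, decision))
```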
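The ordering of the Type I error rates follows from pointwise inequalities between the random counts, which are preserved when taking expectations. A short derivation using only the definitions of PFER, PCER, FWER and FDR given above:

```latex
% V is an integer with 0 \le V \le m, so V/m \le 1 whenever V \ge 1:
\frac{V}{m} \;\le\; \mathbf{1}\{V \ge 1\} \;\le\; V
\quad\Longrightarrow\quad
\mathrm{PCER} = \frac{E(V)}{m} \;\le\; \mathrm{FWER} = P(V \ge 1) \;\le\; \mathrm{PFER} = E(V).

% Q = 0 when V = 0, and Q = V/R \le 1 when V \ge 1 (then R \ge V \ge 1), so Q \le \mathbf{1}\{V \ge 1\}:
\mathrm{FDR} = E(Q) \;\le\; P(V \ge 1) = \mathrm{FWER}.

% Under the complete null every rejection is false, so V = R and Q = \mathbf{1}\{V \ge 1\}:
\mathrm{FDR} = \mathrm{FWER}.
```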
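The error rates can also be estimated by simulation. This sketch tests every hypothesis at the unadjusted level α (no multiplicity adjustment) and averages V, 1{V ≥ 1} and Q over replicates. The p-value model is an assumption chosen for illustration: Uniform(0, 1) for true nulls and u^4 with u ~ Uniform(0, 1) (stochastically smaller) for false nulls; `estimate_error_rates` is a hypothetical name. Because the inequalities between the error rates hold replicate by replicate, they also hold for the Monte Carlo averages.

```python
import random


def estimate_error_rates(m, m0, alpha, n_rep=2000, seed=0):
    """Monte Carlo estimates of PFER, PCER, FWER and FDR when each of m
    hypotheses is tested at unadjusted level alpha; the first m0 are true nulls."""
    rng = random.Random(seed)
    sum_v = 0      # running total of V
    sum_any = 0    # number of replicates with V >= 1
    sum_q = 0.0    # running total of Q
    for _ in range(n_rep):
        # True nulls: Uniform(0, 1) p-values.
        pvals = [rng.random() for _ in range(m0)]
        # False nulls (hypothetical alternative): u**4 is pushed towards 0.
        pvals += [rng.random() ** 4 for _ in range(m - m0)]
        rejected = [p <= alpha for p in pvals]  # per-comparison tests
        v = sum(rejected[:m0])                  # V: true nulls rejected
        r = sum(rejected)                       # R: total rejections
        sum_v += v
        sum_any += int(v >= 1)
        sum_q += v / r if r > 0 else 0.0
    pfer = sum_v / n_rep
    return {"PFER": pfer, "PCER": pfer / m,
            "FWER": sum_any / n_rep, "FDR": sum_q / n_rep}


rates = estimate_error_rates(m=100, m0=90, alpha=0.05)
print(rates)
```

With these settings the estimated PFER should land near m0·α = 4.5 and the estimated FWER close to 1, which is the multiplicity problem in numbers. Setting m0 = m reproduces the complete null, where the FDR and FWER estimates coincide.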
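The adjusted p-value definition can be made concrete with two standard FWER-controlling procedures that the slide itself does not name, so treat them as illustrative choices: Bonferroni and Holm's step-down method. The sketch computes their adjusted p-values in closed form and, as a cross-check, recovers Holm's adjusted p-values straight from the definition pj* = inf {α : Hj is rejected at FWER α} by bisection over α. All function names are hypothetical, and the bisection assumes the procedure is monotone in α (rejected at α implies rejected at any larger α).

```python
def bonferroni_adjusted(pvals):
    """Bonferroni rejects H_j at FWER alpha iff m * p_j <= alpha,
    so p_j* = m * p_j, capped at 1 by convention."""
    m = len(pvals)
    return [min(m * p, 1.0) for p in pvals]


def holm_adjusted(pvals):
    """Holm step-down: p*_(i) = max_{k <= i} min((m - k + 1) * p_(k), 1)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda j: pvals[j])
    adjusted = [0.0] * m
    running_max = 0.0
    for k, j in enumerate(order):  # k is 0-based, so (m - k) = m - (k+1) + 1
        running_max = max(running_max, min((m - k) * pvals[j], 1.0))
        adjusted[j] = running_max
    return adjusted


def holm_rejects(pvals, alpha):
    """Holm's step-down test at FWER alpha, applied directly to the p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda j: pvals[j])
    rejected = [False] * m
    for k, j in enumerate(order):
        if pvals[j] > alpha / (m - k):
            break  # stop at the first non-rejection
        rejected[j] = True
    return rejected


def adjusted_by_definition(pvals, j, rejects, tol=1e-10):
    """p_j* = inf {alpha : H_j rejected at level alpha}, found by bisection."""
    if not rejects(pvals, 1.0)[j]:
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if rejects(pvals, mid)[j]:
            hi = mid
        else:
            lo = mid
    return hi


def rejected_at(adjusted, alpha):
    """H_j is rejected at FWER alpha iff p_j* <= alpha."""
    return [p_star <= alpha for p_star in adjusted]


p = [0.01, 0.04, 0.03, 0.005]
print(bonferroni_adjusted(p))
print(holm_adjusted(p))
print(rejected_at(holm_adjusted(p), 0.05))
```

Both procedures control the FWER in the strong sense, and Holm's adjusted p-values are never larger than Bonferroni's, so Holm rejects at least as many hypotheses at any given α.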