A decision-theory approach to interpretable set analysis for high-dimensional data.

Title: A decision-theory approach to interpretable set analysis for high-dimensional data.
Publication Type: Journal Article
Year of Publication: 2013
Authors: Boca SM, Corrada Bravo H, Caffo B, Leek JT, Parmigiani G
Journal: Biometrics
Volume: 69
Issue: 3
Pagination: 614-23
Date Published: 2013 Sep
ISSN: 1541-0420
Keywords: Algorithms; Bayes Theorem; Biometry; Brain; Computer Simulation; Data Interpretation, Statistical; Decision Theory; Functional Neuroimaging; Gene Expression Profiling; Genomics; Humans; Magnetic Resonance Imaging; Models, Statistical; Oligonucleotide Array Sequence Analysis
Abstract

A key problem in high-dimensional significance analysis is to find pre-defined sets that show enrichment for a statistical signal of interest; the classic example is the enrichment of gene sets for differentially expressed genes. Here, we propose a new decision-theory approach to the analysis of gene sets which focuses on estimating the fraction of non-null variables in a set. We introduce the idea of "atoms," non-overlapping sets based on the original pre-defined set annotations. Our approach focuses on finding the union of atoms that minimizes a weighted average of the number of false discoveries and missed discoveries. We introduce a new false discovery rate for sets, called the atomic false discovery rate (afdr), and prove that the optimal estimator in our decision-theory framework is to threshold the afdr. These results provide a coherent and interpretable framework for the analysis of sets that addresses the key issues of overlapping annotations and difficulty in interpreting p values in both competitive and self-contained tests. We illustrate our method and compare it to a popular existing method using simulated examples, as well as gene-set and brain ROI data analyses.
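
The abstract's two central ideas, atoms and thresholding, can be illustrated with a small sketch. This is not the authors' implementation. It assumes that an atom is the group of variables sharing exactly the same combination of set memberships, that each variable comes with an estimated probability of being null, and that an atom's score is the mean of those probabilities (a stand-in for the afdr, whose exact definition is given in the paper, not here). Under a loss of `weight` times the false discoveries plus `1 - weight` times the missed discoveries, including an atom lowers the expected loss exactly when its score is below `1 - weight`. The names `build_atoms`, `select_atoms`, `null_prob`, and `weight` are hypothetical.

```python
from collections import defaultdict


def build_atoms(set_membership):
    """Split overlapping sets into non-overlapping atoms.

    Assumption: an atom is the group of variables that belong to exactly
    the same combination of the original sets.
    """
    signature = defaultdict(set)
    for set_name, members in set_membership.items():
        for var in members:
            signature[var].add(set_name)

    atoms = defaultdict(list)
    for var, sets in signature.items():
        atoms[frozenset(sets)].append(var)
    return {key: sorted(members) for key, members in atoms.items()}


def select_atoms(atoms, null_prob, weight):
    """Return the atoms whose mean null probability is below 1 - weight.

    Assumption: the loss is weight * (false discoveries)
    + (1 - weight) * (missed discoveries), and the mean null probability
    stands in for the atom-level false discovery rate.
    """
    selected = {}
    for key, members in atoms.items():
        score = sum(null_prob[v] for v in members) / len(members)
        if score < 1.0 - weight:
            selected[key] = score
    return selected


# Toy data (made up for illustration): two overlapping sets.
example_sets = {"A": ["g1", "g2", "g3"], "B": ["g3", "g4"]}
example_null_prob = {"g1": 0.1, "g2": 0.2, "g3": 0.9, "g4": 0.6}

example_atoms = build_atoms(example_sets)
example_selected = select_atoms(example_atoms, example_null_prob, weight=0.5)
```

In this toy example the overlapping sets A and B split into three atoms (A only, B only, and the overlap of A and B), and only the A-only atom falls below the threshold of 0.5. Raising `weight` penalizes false discoveries more heavily and so lowers the threshold.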

DOI: 10.1111/biom.12060
Alternate Journal: Biometrics
PubMed ID: 23909925
PubMed Central ID: PMC3927844
Grant List:
  • 3T32GM074906-04S1 / GM / NIGMS NIH HHS / United States
  • R01 EB012547 / EB / NIBIB NIH HHS / United States
  • ZIA CP010181-12 / CP / NCI NIH HHS / United States