Robust Prediction of Stenosis from Protein Expression Data
In this talk, Dr. David Kepplinger will describe the detrimental effects of "data artifacts," specifically as they relate to biomarker discovery and related feature selection techniques. He will also present a novel method for reliably identifying relevant biomarkers in the presence of such artifacts. The method extracts as much information as possible from the data and does not require prior specification of the form or source of the artifacts. According to Dr. Kepplinger, it is proving to be more accurate than methods currently in use. He will illustrate the method with its application in a proteomic biomarker discovery study.
Increasingly affordable high-throughput proteomics and genome sequencing have led to an abundance of data, which, in turn, has resulted in numerous studies seeking new biomarkers for disease. Extracting meaningful results can be challenging, however. Many biomarker studies feature small sample sizes drawn from often heterogeneous populations, with measurements on hundreds or even thousands of genes. This not only produces a very large pool of candidate biomarkers but also introduces a high risk of outliers and other forms of contamination that can lead to spurious discoveries.
Dr. David Kepplinger is an assistant professor in the Department of Statistics at George Mason University. His research centers on developing robust statistical methods that can be translated into practical applications in biomedical science. In particular, Dr. Kepplinger is examining new ways of minimizing the adverse effects of contamination, such as outliers in the data, to improve predictive models of disease.